Mikoshi SafeGuard

Runtime safety verification for AI systems
Most AI safety tooling checks what a model says. SafeGuard checks how it thinks, using geometry to verify the mathematical structure of its reasoning.
pip install mikoshi-safeguard
Geometric Tri-Guard
Three independent mathematical checks that verify whether AI reasoning is safe at the structural level.
Why Geometric?
Safety properties become provable when expressed as geometric constraints on a mathematical manifold.
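As a toy illustration of what "safety as geometry" can mean, the three predicates below express the benchmark's threat categories as geometric constraints: sign cancellations, budget excess, and non-closing loops. The function names, thresholds, and formulas are ours, sketched for illustration; they are not SafeGuard's internals.

```python
import numpy as np

# Toy geometric safety checks. Names, thresholds, and formulas are
# hypothetical stand-ins mirroring the benchmark's threat categories,
# not the library's actual implementation.

def sign_consistency(ig, shap):
    """Sign cancellations: two attribution methods should agree on the
    direction of every feature's contribution."""
    return bool(np.all(np.sign(ig) == np.sign(shap)))

def within_budget(params, budget=1.0):
    """Capability escape: parameters must stay inside a fixed norm ball."""
    return float(np.linalg.norm(params)) <= budget

def loop_closes(steps, tol=1e-6):
    """Non-closing loop: the step vectors of a closed reasoning loop
    should sum to (approximately) zero."""
    return float(np.linalg.norm(np.asarray(steps).sum(axis=0))) <= tol
```

Each check is a boolean geometric predicate, which is what makes it verifiable rather than judged: a Tri-Guard-style score can then aggregate how far each constraint is from being violated.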
Six Improvements
Beyond the core Tri-Guard — advanced safety features built on the geometric foundation.
01
Deep Attribution
Multi-method cross-referencing — Integrated Gradients (IG), SHAP, and attention combined for robust explanation verification.
02
Adversarial Stress Testing
Probes designed to break the guards — red-teaming built into the verification pipeline.
03
Temporal Drift Detection
Catches the "boiling frog" problem — slow, incremental safety degradation over time.
04
Representation Monitoring
Detects mesa-optimization — internal representations that diverge from stated objectives.
05
ROABP Bridge
Transformer attention as an approximate ROABP (read-once oblivious algebraic branching program) — connecting neural attention to algebraic branching programs.
06
Two-Layer Safety
Sentinel checks actions, SafeGuard checks reasoning — complete coverage at both layers.
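The "boiling frog" failure mode from improvement 03 is hard to catch with per-step thresholds, because every individual score still passes while the trend steadily worsens. One minimal way to surface it is to compare a recent window of safety scores against a baseline window; the sketch below uses an illustrative window size and drop threshold, not SafeGuard's actual detector.

```python
import numpy as np

# Illustrative drift check: flags slow, incremental degradation that a
# per-step safety threshold would miss. Window and drop values are
# hypothetical, not SafeGuard's.

def detect_drift(scores, window=5, drop=0.1):
    """Return True when the mean of the most recent `window` safety
    scores has fallen more than `drop` below the baseline window."""
    scores = np.asarray(scores, dtype=float)
    if scores.size < 2 * window:
        return False  # not enough history to compare windows
    baseline = scores[:window].mean()
    recent = scores[-window:].mean()
    return bool(baseline - recent > drop)
```

A windowed comparison like this trades latency for sensitivity: it only alarms after enough history accumulates, which is consistent with the "partial" result on the gradual-drift benchmark below.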
Benchmark Results
How SafeGuard performs against known threat scenarios.
| Scenario | Threat | Detected | Score |
|---|---|---|---|
| Honest model | None | ✓ Safe | 0.89 |
| Deceptive attributions | Sign cancellations | ✓ Blocked | 0.63 |
| Capability escape | Budget exceeded | ✓ Blocked | 0.70 |
| Reward hacking | Non-closing loop | ✓ Blocked | 0.76 |
| Gradual drift | Boiling frog | ⚠ Partial | 0.43 |
SafeGuard vs Sentinel
Two complementary approaches to AI safety — together they provide complete coverage.
🛂
Sentinel
Actions
Checks what the AI does — like an airport security scanner. Deterministic action verification.
🧠
SafeGuard
Reasoning
Checks how the AI thinks — like a polygraph test. Geometric reasoning verification.
⚔️
Together
Two-Layer Safety
Actions verified by Sentinel, reasoning verified by SafeGuard. Complete safety coverage.
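The two-layer pattern can be sketched as a pair of independent gates that must both pass before an output ships. The allowlist, threshold, and function names below are hypothetical illustrations of the pattern, not the Sentinel or SafeGuard API.

```python
# Hypothetical sketch of two-layer safety: an action gate (Sentinel-style)
# and a reasoning-score gate (SafeGuard-style), combined with AND.

def action_allowed(action, allowlist=("read", "summarize")):
    """Sentinel layer: deterministic check on what the AI does."""
    return action in allowlist

def reasoning_safe(score, threshold=0.5):
    """SafeGuard layer: gate on the geometric reasoning score."""
    return score >= threshold

def release(action, score):
    """An output is released only when both layers pass."""
    return action_allowed(action) and reasoning_safe(score)
```

Keeping the two gates independent is the point: a safe-looking action with unsafe reasoning is blocked, and so is safe reasoning that proposes a disallowed action.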
Quick Start
A few lines to verify AI reasoning safety.
```python
from mikoshi_safeguard.tri_guard import TriGuard
import numpy as np

guard = TriGuard()
attributions = np.array([[0.8, 0.3], [0.2, 0.9]])
params = np.array([0.1, 0.2, 0.3])
result = guard.check(attributions, params)
print(f"Safe: {guard.is_safe()}, Score: {guard.score():.2f}")
```
157 tests passing · 12 modules · 4/5 threats caught · 0.39s test runtime