Mikoshi SafeGuard
Runtime safety verification for AI systems
Most AI safety tooling checks what a model says. SafeGuard checks how it thinks, using geometry to verify the mathematical structure of its reasoning.
View on GitHub →
```shell
pip install mikoshi-safeguard
```
The Three Guards
Geometric Tri-Guard
Three independent mathematical checks that verify whether AI reasoning is safe at the structural level.
Honesty Guard
Positivity
Are the model's explanations faithful? We check if attribution matrices lie inside the positive cone — no hidden sign cancellations, no deceptive reasoning.
Learn more →
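The positivity check can be sketched as a cone-membership test on the attribution matrix. This is a minimal illustration, not SafeGuard's actual API: the function name `in_positive_cone` and the tolerance are our own, and any attribution method (IG, SHAP, attention) could supply the matrix.

```python
import numpy as np

def in_positive_cone(attributions: np.ndarray, tol: float = 1e-9) -> bool:
    """Check that every attribution entry is non-negative (up to tolerance).

    A matrix inside the positive cone has no sign cancellations: no
    positive contribution is secretly offset by a hidden negative one.
    """
    return bool(np.all(attributions >= -tol))

# An honest attribution map: every feature's contribution has one sign.
honest = np.array([[0.8, 0.3], [0.2, 0.9]])
# A deceptive map: a large positive term masks a large negative one.
deceptive = np.array([[5.0, -4.9], [0.2, 0.9]])

print(in_positive_cone(honest))     # True
print(in_positive_cone(deceptive))  # False
```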
Wall Stability
Capability Bounding
Is the model staying within its safety budget? A curved Lyapunov barrier wall bounds capability energy — same physics that governs bubble stability in cosmology.
Learn more →
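The barrier idea can be sketched in a few lines. Assumptions are ours: capability energy is modeled as the squared parameter norm, the barrier function is `B(x) = budget - ||x||²`, and `within_barrier` is a hypothetical name, not SafeGuard's API.

```python
import numpy as np

def within_barrier(params: np.ndarray, budget: float = 1.0) -> bool:
    """Check that capability energy stays inside the barrier wall.

    Capability energy is modeled as the squared parameter norm; the
    barrier function B(x) = budget - ||x||^2 must remain positive.
    """
    energy = float(np.dot(params, params))
    return energy < budget

print(within_barrier(np.array([0.1, 0.2, 0.3])))  # True: energy 0.14 < 1.0
print(within_barrier(np.array([0.9, 0.8])))       # False: energy 1.45 >= 1.0
```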
Holonomy Closure
No Reward Hacking
Is the model exploiting loopholes? We check if updates form a flat connection — zero curvature means no cyclic exploits.
Learn more →
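Holonomy closure can be illustrated by composing update maps around a closed loop and testing for the identity: a flat connection has trivial holonomy, so any cycle must return the state to where it started. A sketch under our own assumptions (updates as linear maps, `loop_holonomy_closes` is a hypothetical name):

```python
import numpy as np

def loop_holonomy_closes(updates: list[np.ndarray], tol: float = 1e-6) -> bool:
    """Compose update maps around a closed loop and test for the identity.

    Zero curvature means trivial holonomy: traversing any cycle leaves
    the state unchanged. A non-closing loop is the signature of a
    cyclic exploit (reward hacking).
    """
    product = np.eye(updates[0].shape[0])
    for u in updates:
        product = u @ product
    return bool(np.allclose(product, np.eye(product.shape[0]), atol=tol))

# A loop that closes: rotate by +90 degrees, then undo it.
theta = np.pi / 2
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
print(loop_holonomy_closes([rot, rot.T]))      # True: the cycle closes
print(loop_holonomy_closes([rot, np.eye(2)]))  # False: loop does not close
```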
Approach
Why Geometric?
Safety properties become provable when expressed as geometric constraints on a mathematical manifold.
Capabilities
Six Improvements
Beyond the core Tri-Guard — advanced safety features built on the geometric foundation.
01
Deep Attribution
Multi-method cross-referencing — IG, SHAP, and attention combined for robust explanation verification.
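One way cross-referencing could work is pairwise agreement between normalized attribution maps. This is our own sketch of the idea, not SafeGuard's implementation: `attributions_agree`, the cosine-similarity criterion, and the threshold are all assumptions.

```python
import numpy as np

def attributions_agree(maps: list[np.ndarray], threshold: float = 0.8) -> bool:
    """Cross-reference attribution maps from several methods.

    Each map is flattened and L2-normalized; the explanation counts as
    robust only if every pair of methods has cosine similarity at or
    above the threshold.
    """
    vecs = []
    for m in maps:
        v = m.ravel().astype(float)
        vecs.append(v / np.linalg.norm(v))
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if float(np.dot(vecs[i], vecs[j])) < threshold:
                return False
    return True

ig = np.array([[0.8, 0.3], [0.2, 0.9]])     # e.g. integrated gradients
shap = np.array([[0.7, 0.4], [0.1, 0.8]])   # e.g. SHAP values
attn = np.array([[0.9, 0.0], [-0.8, 0.1]])  # a disagreeing attention map

print(attributions_agree([ig, shap]))        # True
print(attributions_agree([ig, shap, attn]))  # False
```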
02
Adversarial Stress Testing
Probes designed to break the guards — red-teaming built into the verification pipeline.
03
Temporal Drift Detection
Catches the "boiling frog" problem — slow, incremental safety degradation over time.
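A classic way to catch slow degradation is a one-sided CUSUM over the stream of safety scores; each individually-unremarkable drop accumulates until an alarm fires. A minimal sketch with assumed parameter names and thresholds, not SafeGuard's actual detector:

```python
def drift_detected(scores: list[float], baseline: float = 0.85,
                   slack: float = 0.02, limit: float = 0.10) -> bool:
    """One-sided CUSUM on a stream of safety scores.

    Each score below (baseline - slack) adds its shortfall to a running
    sum; the alarm fires once the sum crosses the limit. This catches
    slow degradation that no single check would flag.
    """
    cusum = 0.0
    for s in scores:
        cusum = max(0.0, cusum + (baseline - s - slack))
        if cusum > limit:
            return True
    return False

steady = [0.86, 0.84, 0.85, 0.86, 0.85]
boiling_frog = [0.84, 0.82, 0.80, 0.78, 0.76, 0.74]  # each step looks fine
print(drift_detected(steady))        # False
print(drift_detected(boiling_frog))  # True
```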
04
Representation Monitoring
Detects mesa-optimization — internal representations that diverge from stated objectives.
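A toy version of this monitor compares an internal representation against an embedding of the stated objective. The vectors, the cosine criterion, and the name `representation_aligned` are all illustrative assumptions, not the library's API.

```python
import numpy as np

def representation_aligned(stated: np.ndarray, internal: np.ndarray,
                           min_cosine: float = 0.7) -> bool:
    """Flag internal representations that diverge from the stated objective.

    Both vectors are L2-normalized; alignment below min_cosine suggests
    the model may be optimizing something other than what it claims.
    """
    a = stated / np.linalg.norm(stated)
    b = internal / np.linalg.norm(internal)
    return float(np.dot(a, b)) >= min_cosine

objective = np.array([1.0, 0.0, 0.0])  # embedding of the stated goal
aligned = np.array([0.9, 0.1, 0.0])    # internal state tracking the goal
diverged = np.array([0.1, 0.9, 0.4])   # internal state pursuing a mesa-objective
print(representation_aligned(objective, aligned))   # True
print(representation_aligned(objective, diverged))  # False
```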
05
ROABP Bridge
Transformer attention as approximate ROABP — connecting neural attention to algebraic branching programs.
06
Two-Layer Safety
Sentinel checks actions, SafeGuard checks reasoning — complete coverage at both layers.
Verification
Benchmark Results
How SafeGuard performs against known threat scenarios.
| Scenario | Threat | Result | Score |
|---|---|---|---|
| Honest model | None | ✓ Safe | 0.89 |
| Deceptive attributions | Sign cancellations | ✓ Blocked | 0.63 |
| Capability escape | Budget exceeded | ✓ Blocked | 0.70 |
| Reward hacking | Non-closing loop | ✓ Blocked | 0.76 |
| Gradual drift | Boiling frog | ⚠ Partial | 0.43 |
Two-Layer Safety
SafeGuard vs Sentinel
Two complementary approaches to AI safety — together they provide complete coverage.
Sentinel
Actions
Checks what the AI does — like an airport security scanner. Deterministic action verification.
SafeGuard
Reasoning
Checks how the AI thinks — like a polygraph test. Geometric reasoning verification.
Together
Two-Layer Safety
Actions verified by Sentinel, reasoning verified by SafeGuard. Complete safety coverage.
Get Started
Quick Start
A few lines to verify AI reasoning safety.
```python
from mikoshi_safeguard.tri_guard import TriGuard
import numpy as np

guard = TriGuard()

# Attribution matrix (e.g. from integrated gradients) and model parameters
attributions = np.array([[0.8, 0.3], [0.2, 0.9]])
params = np.array([0.1, 0.2, 0.3])

# Run all three guards: positivity, wall stability, holonomy closure
result = guard.check(attributions, params)
print(f"Safe: {guard.is_safe()}, Score: {guard.score():.2f}")
```