Mikoshi SafeGuard
Runtime safety verification for AI systems
Most AI safety tooling checks what a model says. SafeGuard checks how it thinks, using geometry to verify the mathematical structure of its reasoning.
View on GitHub →
```shell
pip install mikoshi-safeguard
```
The Three Guards
Geometric Tri-Guard
Three independent mathematical checks that verify whether AI reasoning is safe at the structural level.
Honesty Guard
Positivity
Are the model's explanations faithful? We check if attribution matrices lie inside the positive cone — no hidden sign cancellations, no deceptive reasoning.
Learn more →
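The positivity check can be sketched as a cone-membership test on the attribution matrix. This is a minimal illustration, not SafeGuard's actual API: the function name `in_positive_cone` and the tolerance are our own, and any attribution method (IG, SHAP, attention) could supply the matrix.

```python
import numpy as np

def in_positive_cone(attributions: np.ndarray, tol: float = 1e-9) -> bool:
    """Check that every attribution entry is non-negative (up to tolerance).

    A matrix inside the positive cone has no sign cancellations: no
    positive contribution is secretly offset by a hidden negative one.
    """
    return bool(np.all(attributions >= -tol))

# An honest attribution map: every feature's contribution has one sign.
honest = np.array([[0.8, 0.3], [0.2, 0.9]])
# A deceptive map: a large positive term masks a large negative one.
deceptive = np.array([[5.0, -4.9], [0.2, 0.9]])

print(in_positive_cone(honest))     # True
print(in_positive_cone(deceptive))  # False
```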
Wall Stability
Capability Bounding
Is the model staying within its safety budget? A curved Lyapunov barrier wall bounds capability energy — same physics that governs bubble stability in cosmology.
Learn more →
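The barrier idea can be sketched in a few lines. Assumptions are ours: capability energy is modeled as the squared parameter norm, the barrier function is `B(x) = budget - ||x||²`, and `within_barrier` is a hypothetical name, not SafeGuard's API.

```python
import numpy as np

def within_barrier(params: np.ndarray, budget: float = 1.0) -> bool:
    """Check that capability energy stays inside the barrier wall.

    Capability energy is modeled as the squared parameter norm; the
    barrier function B(x) = budget - ||x||^2 must remain positive.
    """
    energy = float(np.dot(params, params))
    return energy < budget

print(within_barrier(np.array([0.1, 0.2, 0.3])))  # True: energy 0.14 < 1.0
print(within_barrier(np.array([0.9, 0.8])))       # False: energy 1.45 >= 1.0
```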
Holonomy Closure
No Reward Hacking
Is the model exploiting loopholes? We check if updates form a flat connection — zero curvature means no cyclic exploits.
Learn more →
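Holonomy closure can be illustrated by composing update maps around a closed loop and testing for the identity: a flat connection has trivial holonomy, so any cycle must return the state to where it started. A sketch under our own assumptions (updates as linear maps, `loop_holonomy_closes` is a hypothetical name):

```python
import numpy as np

def loop_holonomy_closes(updates: list[np.ndarray], tol: float = 1e-6) -> bool:
    """Compose update maps around a closed loop and test for the identity.

    Zero curvature means trivial holonomy: traversing any cycle leaves
    the state unchanged. A non-closing loop is the signature of a
    cyclic exploit (reward hacking).
    """
    product = np.eye(updates[0].shape[0])
    for u in updates:
        product = u @ product
    return bool(np.allclose(product, np.eye(product.shape[0]), atol=tol))

# A loop that closes: rotate by +90 degrees, then undo it.
theta = np.pi / 2
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
print(loop_holonomy_closes([rot, rot.T]))      # True: the cycle closes
print(loop_holonomy_closes([rot, np.eye(2)]))  # False: loop does not close
```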
Approach
Why Geometric?
Safety properties become provable when expressed as geometric constraints on a mathematical manifold.
Capabilities
Six Improvements
Beyond the core Tri-Guard — advanced safety features built on the geometric foundation.
01
Deep Attribution
Multi-method cross-referencing — IG, SHAP, and attention combined for robust explanation verification.
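One way cross-referencing could work is pairwise agreement between normalized attribution maps. This is our own sketch of the idea, not SafeGuard's implementation: `attributions_agree`, the cosine-similarity criterion, and the threshold are all assumptions.

```python
import numpy as np

def attributions_agree(maps: list[np.ndarray], threshold: float = 0.8) -> bool:
    """Cross-reference attribution maps from several methods.

    Each map is flattened and L2-normalized; the explanation counts as
    robust only if every pair of methods has cosine similarity at or
    above the threshold.
    """
    vecs = []
    for m in maps:
        v = m.ravel().astype(float)
        vecs.append(v / np.linalg.norm(v))
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if float(np.dot(vecs[i], vecs[j])) < threshold:
                return False
    return True

ig = np.array([[0.8, 0.3], [0.2, 0.9]])     # e.g. integrated gradients
shap = np.array([[0.7, 0.4], [0.1, 0.8]])   # e.g. SHAP values
attn = np.array([[0.9, 0.0], [-0.8, 0.1]])  # a disagreeing attention map

print(attributions_agree([ig, shap]))        # True
print(attributions_agree([ig, shap, attn]))  # False
```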
02
Adversarial Stress Testing
Probes designed to break the guards — red-teaming built into the verification pipeline.
03
Temporal Drift Detection
Catches the "boiling frog" problem — slow, incremental safety degradation over time.
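A classic way to catch slow degradation is a one-sided CUSUM over the stream of safety scores; each individually-unremarkable drop accumulates until an alarm fires. A minimal sketch with assumed parameter names and thresholds, not SafeGuard's actual detector:

```python
def drift_detected(scores: list[float], baseline: float = 0.85,
                   slack: float = 0.02, limit: float = 0.10) -> bool:
    """One-sided CUSUM on a stream of safety scores.

    Each score below (baseline - slack) adds its shortfall to a running
    sum; the alarm fires once the sum crosses the limit. This catches
    slow degradation that no single check would flag.
    """
    cusum = 0.0
    for s in scores:
        cusum = max(0.0, cusum + (baseline - s - slack))
        if cusum > limit:
            return True
    return False

steady = [0.86, 0.84, 0.85, 0.86, 0.85]
boiling_frog = [0.84, 0.82, 0.80, 0.78, 0.76, 0.74]  # each step looks fine
print(drift_detected(steady))        # False
print(drift_detected(boiling_frog))  # True
```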
04
Representation Monitoring
Detects mesa-optimization — internal representations that diverge from stated objectives.
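A toy version of this monitor compares an internal representation against an embedding of the stated objective. The vectors, the cosine criterion, and the name `representation_aligned` are all illustrative assumptions, not the library's API.

```python
import numpy as np

def representation_aligned(stated: np.ndarray, internal: np.ndarray,
                           min_cosine: float = 0.7) -> bool:
    """Flag internal representations that diverge from the stated objective.

    Both vectors are L2-normalized; alignment below min_cosine suggests
    the model may be optimizing something other than what it claims.
    """
    a = stated / np.linalg.norm(stated)
    b = internal / np.linalg.norm(internal)
    return float(np.dot(a, b)) >= min_cosine

objective = np.array([1.0, 0.0, 0.0])  # embedding of the stated goal
aligned = np.array([0.9, 0.1, 0.0])    # internal state tracking the goal
diverged = np.array([0.1, 0.9, 0.4])   # internal state pursuing a mesa-objective
print(representation_aligned(objective, aligned))   # True
print(representation_aligned(objective, diverged))  # False
```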
05
ROABP Bridge
Transformer attention as approximate ROABP — connecting neural attention to algebraic branching programs.
06
Two-Layer Safety
Sentinel checks actions, SafeGuard checks reasoning — complete coverage at both layers.
Verification
Benchmark Results
How SafeGuard performs against known threat scenarios.
| Scenario | Threat | Result | Score |
|---|---|---|---|
| Honest model | None | ✓ Safe | 0.89 |
| Deceptive attributions | Sign cancellations | ✓ Blocked | 0.63 |
| Capability escape | Budget exceeded | ✓ Blocked | 0.70 |
| Reward hacking | Non-closing loop | ✓ Blocked | 0.76 |
| Gradual drift | Boiling frog | ⚠ Partial | 0.43 |
Two-Layer Safety
SafeGuard vs Sentinel
Two complementary approaches to AI safety — together they provide complete coverage.
Sentinel
Actions
Checks what the AI does — like an airport security scanner. Deterministic action verification.
SafeGuard
Reasoning
Checks how the AI thinks — like a polygraph test. Geometric reasoning verification.
Together
Two-Layer Safety
Actions verified by Sentinel, reasoning verified by SafeGuard. Complete safety coverage.
Get Started
Quick Start
A few lines to verify AI reasoning safety.
```python
from mikoshi_safeguard.tri_guard import TriGuard
import numpy as np

guard = TriGuard()

# Attribution matrix (e.g. from integrated gradients) and model parameters
attributions = np.array([[0.8, 0.3], [0.2, 0.9]])
params = np.array([0.1, 0.2, 0.3])

# Run all three guards: positivity, wall stability, holonomy closure
result = guard.check(attributions, params)
print(f"Safe: {guard.is_safe()}, Score: {guard.score():.2f}")
```