🧱 Wall Stability Guard

Capability Bounding via Lyapunov Barriers

The Problem

An AI system's capabilities can be measured as energy in parameter space. During training or fine-tuning, this energy can grow — the model becomes more capable.

But capability without bounds is dangerous. If a model's capability exceeds its safety budget, it can do things it wasn't designed for. The question becomes: how do you build an impenetrable wall around capability?

The Physics: Israel Junction Conditions

In cosmology, a bubble of vacuum can be stable or unstable depending on the tension of its boundary wall. The Israel thin-wall junction conditions describe when such a bubble remains stable.

We use the same mathematics: the AI's capability is the "interior," the safety budget is the "wall tension," and the Lyapunov barrier is the curved surface that prevents escape.

Capability Energy:
E(θ) = ||θ||₂ — L2 norm of model parameters

Safety Tension:
T(B, E) = B − E(θ) — positive means safe, zero means at the wall

Barrier-Lyapunov Function:
V(E, B) = −ln(1 − E/B) — diverges to ∞ as E → B
This creates an impenetrable wall: the barrier becomes infinite at the budget boundary.
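The three quantities above can be computed directly from their definitions. A minimal sketch (the function names here are illustrative, not the library's API):

```python
import numpy as np

def capability_energy(theta):
    """E(theta) = ||theta||_2, the L2 norm of the parameter vector."""
    return float(np.linalg.norm(theta))

def safety_tension(budget, energy):
    """T(B, E) = B - E; positive means safe, zero means at the wall."""
    return budget - energy

def barrier(energy, budget):
    """V(E, B) = -ln(1 - E/B); diverges to infinity as E approaches B."""
    if energy >= budget:
        return float("inf")
    return -np.log(1.0 - energy / budget)

theta = np.array([0.1, 0.2, 0.3])
E = capability_energy(theta)   # ~0.374
T = safety_tension(1.0, E)     # ~0.626
V = barrier(E, 1.0)            # ~0.469
```

Note how V is already positive well inside the budget: the barrier is felt before the wall is reached, not only at it.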

Israel Junction Analogue:
Stability requires: κ_interior < κ_exterior + σ
Interior curvature must be less than exterior curvature adjusted by the wall tension σ.

The key insight: a logarithmic barrier diverges to infinity as the model approaches its budget. No finite gradient step can cross it — the wall is mathematically impenetrable.
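The divergence is easy to verify numerically: as E closes in on the budget, both the barrier value and its gradient dV/dE = 1/(B − E) grow without bound, so any gradient-based step is pushed back ever harder. A short demonstration (budget value chosen for illustration):

```python
import numpy as np

budget = 1.0
for energy in [0.9, 0.99, 0.999, 0.9999]:
    V = -np.log(1.0 - energy / budget)   # barrier value
    grad = 1.0 / (budget - energy)       # dV/dE, the restoring gradient
    print(f"E={energy:<7} V={V:5.2f}  dV/dE={grad:8.1f}")
```

Each factor-of-ten approach to the wall adds a constant to V but multiplies the restoring gradient by ten.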

How It Works — Step by Step

1. Compute capability energy from model parameters: the L2 norm of the parameter vector gives a scalar measure of total capability.
2. Compare against the safety budget: compute tension T = B − E. Positive tension means the model is within bounds.
3. Evaluate the Lyapunov barrier: if V(E, B) is climbing toward infinity, the model is approaching the wall. The rate of climb indicates urgency.
4. Check trajectory curvature: is the model accelerating toward the boundary? Second-order analysis reveals whether the model is slowing down (safe) or speeding up (dangerous).
5. Combine these into a stability score in [0, 1], based on normalised tension and barrier proximity.

Code Example

from mikoshi_safeguard.stability import WallStabilityGuard
import numpy as np

guard = WallStabilityGuard(budget=1.0)

# Safe parameters
safe = np.array([0.1, 0.2, 0.3])
result = guard.check(safe)
print(f"Score: {result['score']:.2f}, Energy: {result['energy']:.2f}")
# Score: 0.63, Energy: 0.37

# Escaped parameters
escaped = np.array([5.0, 8.0, 10.0])
result = guard.check(escaped)
print(f"Score: {result['score']:.2f}, Energy: {result['energy']:.2f}")
# Score: 0.00, Energy: 13.75 (13.7x over budget!)

What It Catches

Capability Escape: Models that grow more powerful during training than their safety budget allows.
Parameter Drift: Gradual, imperceptible growth in capability that slowly exceeds safety bounds over many training steps.
Unintended Power: Models that become capable of tasks they were never designed for; emergent capabilities beyond specification.
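Parameter drift is the easiest case to simulate: even a small per-step growth rate compounds until the energy crosses the budget. A hypothetical scenario (the 2% growth rate and starting parameters are made up for illustration):

```python
import numpy as np

budget = 1.0
theta = np.array([0.3, 0.4, 0.5])  # starting energy ~0.707, well within budget

for step in range(60):
    theta = theta * 1.02  # imperceptible 2% growth per training step
    energy = float(np.linalg.norm(theta))
    if energy >= budget:
        print(f"Drift exceeded budget at step {step}: E={energy:.3f}")
        break
# Drift exceeded budget at step 17: E=1.010
```

No single step looks alarming, which is why the guard tracks the trajectory rather than individual updates.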

Limitations