🔬 Core Innovation

Code Verification

The revolutionary idea: don't trust AI output — generate test code and prove the answer is correct.

The Innovation

Don't Trust — Test

Standard AI gives you an answer and says "trust me." Turbo says "let me prove it."

For each candidate answer, Turbo asks the AI to write verification code — a JavaScript program that independently tests whether the answer is correct. This code runs in a secure sandbox, and its output determines whether the candidate passes or fails.

🔬 The Scientific Method for AI

This is the same approach scientists use: form a hypothesis (the candidate answer), design an experiment (the verification code), run the experiment (sandbox execution), and observe the result (pass/fail). It's the scientific method applied to every AI response.
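
The four steps map onto a small loop. The sketch below is an illustration only: `runExperiment` is a hypothetical stand-in for the sandbox (it uses `new Function`, which provides no isolation), and the verification code is hard-coded here rather than generated by a model.

```javascript
// Hypothetical stand-in for the sandbox: evaluates one expression and reports its value.
// Illustration only: `new Function` provides NO isolation from the host program.
function runExperiment(code) {
  try {
    const output = new Function(`"use strict"; return (${code});`)();
    return { ok: true, output };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}

// Hypothesis -> experiment -> run -> observe, once per candidate.
function verifyCandidates(candidates) {
  return candidates.map(({ answer, verificationCode }) => {
    const result = runExperiment(verificationCode);     // run the experiment
    const passed = result.ok && result.output === true; // observe the result
    return { answer, passed };
  });
}

const verdicts = verifyCandidates([
  { answer: "3,901", verificationCode: "47 * 83 === 3901" },
  { answer: "3,891", verificationCode: "47 * 83 === 3891" },
]);
console.log(verdicts);
```

Only a strict `true` counts as a pass in this sketch, so a verifier that throws or returns a non-boolean value is treated as a failed experiment.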

How It Works

The Verification Pipeline

Here's a complete walkthrough of how verification works for a math question:

Question

"What is 47 × 83?"

↓

Candidate Answer

"The answer is 3,901"

↓

AI-Generated Verification Code

const result = 47 * 83;
const expected = 3901;
result === expected; // → true

↓

Sandbox Execution Result

✅ PASSED — code returned true, answer verified

Deep Dive

How Verification Code Is Generated

Turbo sends a specialised prompt to the AI asking it to write verification code. The prompt instructs the model to:

Extract the testable claim from the candidate answer
Write pure JavaScript that independently computes or checks the answer
Return a boolean — true if verified, false if not
Be self-contained — no external dependencies, no network calls

// The verification prompt (simplified)
const verifyPrompt = `
  Given this question: "${question}"
  And this answer: "${candidateAnswer}"

  Write JavaScript that verifies whether the answer is correct.
  The code must:
  - Be self-contained (no imports, no fetch, no require)
  - Return true if the answer is correct, false otherwise
  - Use only pure computation
  - Complete within 5 seconds

  Return ONLY the JavaScript code, nothing else.
`;

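// `llm` and `sandbox` are placeholders for the model client and the sandbox
// runtime; neither is defined in this simplified snippet.
const verificationCode = await llm.generate(verifyPrompt);
const result = sandbox.execute(verificationCode);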
// result.output === true → candidate passes
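
As a rough sketch of what a `sandbox.execute` step might look like, the snippet below runs a verification string with a time limit using Node.js's built-in `vm` module. This is an assumption for illustration, not Turbo's actual sandbox: the Node.js documentation states that `vm` is not a security mechanism, so a production system would need stronger isolation, such as a separate process or container. The `executeVerification` name and the shape of its return value are hypothetical.

```javascript
const vm = require("node:vm");

// Hypothetical helper: runs verification code in a fresh context with a timeout.
// NOTE: node:vm is NOT a security boundary; this only illustrates the control flow.
function executeVerification(code, timeoutMs = 5000) {
  try {
    // The empty context object exposes no require, fetch, or process to the script.
    const output = vm.runInNewContext(code, {}, { timeout: timeoutMs });
    return { output, passed: output === true };
  } catch (err) {
    // Syntax errors, runtime errors, and timeouts all count as a failed check.
    return { output: undefined, passed: false, error: String(err) };
  }
}

const passing = executeVerification("const result = 47 * 83; result === 3901;");
const failing = executeVerification("const result = 47 * 83; result === 3891;");
const runaway = executeVerification("while (true) {}", 50);

console.log(passing.passed, failing.passed, runaway.passed); // true false false
```

This sketch assumes the verification code ends with a boolean expression, as in the walkthrough examples. Code that uses a top-level `return` would have to be wrapped in a function before it could run this way.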
      

Capabilities

What Can Be Verified

Verification works best when the answer can be independently computed or checked with code:

🔢

Mathematics

Arithmetic, algebra, calculus — compute and compare

🧩

Logic

Boolean logic, set theory, deductive reasoning

💻

Code Correctness

Run the code, check output against expected results

📊

Data Transforms

Sorting, filtering, aggregation — test with sample data

🔤

String Operations

Regex, parsing, formatting — pattern matching

📐

Conversions

Units, bases, encodings — deterministic transforms
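
To make these categories concrete, here are a few hand-written examples of what verification code for them might look like. They are illustrative assumptions, not output captured from Turbo. Each one follows the same rule as the pipeline above and ends in a boolean.

```javascript
// Data transform: the claim is "sorting [5, 2, 9, 1] ascending gives [1, 2, 5, 9]".
const claimedSort = [1, 2, 5, 9];
const actualSort = [5, 2, 9, 1].sort((x, y) => x - y);
const sortVerified =
  actualSort.length === claimedSort.length &&
  actualSort.every((value, i) => value === claimedSort[i]);

// Conversion: the claim is "255 in hexadecimal is ff".
const hexVerified = (255).toString(16) === "ff";

// String operation: the claim is "2024-01-15 matches the YYYY-MM-DD pattern".
const dateVerified = /^\d{4}-\d{2}-\d{2}$/.test("2024-01-15");

// Logic: the claim is De Morgan's law, !(p && q) === (!p || !q), for every input.
const deMorganVerified = [true, false].every((p) =>
  [true, false].every((q) => !(p && q) === (!p || !q))
);

console.log({ sortVerified, hexVerified, dateVerified, deMorganVerified });
```

The logic check can enumerate every input because the domain is tiny. For claims over large or infinite domains, a verifier can only sample cases, which gives weaker evidence.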

⚠️ Limitations

Verification works best for objectively testable claims. Subjective questions ("Is this poem good?"), creative writing, and opinion-based answers rely more on the fuzzy scoring system than on code verification.

Real Example

Verification in Action

Here's a real verification scenario showing both a passing and a failing candidate:

✅ CANDIDATE A — PASSES

Answer: "3,901"

// Verification code
const a = 47;
const b = 83;
const expected = 3901;
a * b === expected;
// → true ✅
          

❌ CANDIDATE C — FAILS

Answer: "3,891"

// Verification code
const a = 47;
const b = 83;
const expected = 3891;
a * b === expected;
// → false ❌
          

The verification code is generated independently of the candidate's reasoning. It doesn't just compare strings: it computes the answer from scratch and compares the result with the claim. So even if the AI's reasoning was flawed, the verification code can catch the error.
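
The difference between string comparison and independent computation can be sketched as follows. The `extractNumber` helper is hypothetical and handles only simple cases (commas as thousands separators). It is meant to illustrate the idea of extracting a testable claim, not to describe how Turbo parses answers.

```javascript
// Hypothetical helper: pull the first number out of a free-text answer.
function extractNumber(answerText) {
  const match = answerText.match(/-?\d[\d,]*(\.\d+)?/);
  return match ? Number(match[0].replace(/,/g, "")) : NaN;
}

// Independent check: recompute the product instead of comparing strings.
function verifyProduct(a, b, answerText) {
  return a * b === extractNumber(answerText);
}

console.log(verifyProduct(47, 83, "The answer is 3,901")); // true
console.log(verifyProduct(47, 83, "3901"));                // true
console.log(verifyProduct(47, 83, "3,891"));               // false
console.log("The answer is 3,901" === "3901");             // false
```

The last line shows why plain string equality is a poor test: two differently formatted statements of the same correct answer do not match as strings, yet both pass the computed check.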

