Your AI Passed the Math Test by Proving a Different Theorem

Tue, 05 May 2026 12:34:29 +0000

There’s a result from Harmonic’s Aristotle model that I keep coming back to. The system generated compiler-verified Lean proofs on 97.6% of problems — and was mathematically wrong on roughly a third of them. Both of those things are true at the same time. The proofs checked out. The theorems were wrong. The machine passed the test by proving something else.

The Difference Between a Correct Proof and a Correct Answer

If you haven’t worked with formal verification, this might sound like a contradiction. It isn’t. A proof checker like Lean verifies that your logical steps are valid: that each step follows from what came before, that every term type-checks, that nothing slips through a definitional crack. What it doesn’t check is whether the statement you formalized is the one the problem actually asked for.
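
Here is a minimal Lean 4 sketch of what that gap can look like. The theorem names and the "intended" claims in the comments are made up for illustration; they are not taken from Aristotle's output. The sketch assumes only core Lean 4, with no Mathlib.

```lean
-- Intended claim (informal): every natural number has a strictly larger one.
-- What got stated: every natural number has one that is at least as large.
-- Lean accepts the proof, but `m = n` already satisfies it, so it says far less.
theorem exists_larger (n : Nat) : ∃ m, n ≤ m :=
  ⟨n, Nat.le_refl n⟩

-- The statement that was actually wanted needs a different witness.
theorem exists_strictly_larger (n : Nat) : ∃ m, n < m :=
  ⟨n + 1, Nat.lt_succ_self n⟩

-- A vacuous variant: the hypothesis `n < 0` can never hold for a natural
-- number, so any conclusion follows from it, including an absurd one.
theorem vacuous (n : Nat) (h : n < 0) : n = 42 :=
  absurd h (Nat.not_lt_zero n)
```

All three should compile. Only the second says what the informal problem asked for, and nothing in the checker's output distinguishes it from the other two. Telling them apart takes a person, or a separate check, comparing the formal statement against the original question.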
