The Computer That Cracked a 50-Year Math Puzzle

A 50-Year Puzzle That Stumped Everyone

After decades of failed attempts, a machine finally solved the Robbins algebra mystery.

In 1997, a computer screen flickered with lines of logical code. A program called EQP, written by researcher William McCune, was chewing through a problem that had resisted the world’s best mathematicians for more than half a century. The question was whether all Robbins algebras are actually Boolean algebras — two abstract ways of describing how certain operations, like “and” and “or,” work. The famous logician Alfred Tarski (1901–1983) and his students had tried and failed. EQP proved it in minutes.

EQP is an automated reasoning program — a computer tool that discovers proofs by itself. It doesn’t just calculate numbers; it manipulates ideas using the rules of logic. The program was given three simple equations (the axioms of a Robbins algebra) and asked to derive a key formula that would settle the question. With a clever built‑in trick for handling commutative and associative patterns, EQP found a proof that had eluded humans for decades. This stunning result showed that computers could do more than crunch sums — they could become genuine thinking partners.

What Does It Mean to Prove Something?

A proof builds a chain of ideas that must follow strictly from rules.

Automated reasoning starts with a set of assumptions — facts you accept as true. It also has a conclusion, the claim you want to test. The prover’s job is to prove that if the assumptions are true, the conclusion must be true too. In school, you might prove that two triangles are congruent by applying geometry rules step by step. A theorem prover does the same thing, but it uses symbols instead of pictures and applies precise rules of deduction like modus ponens — “if A implies B, and A is true, then B is true.”

To a computer, solving a problem means building a proof tree where every branch follows from the rules. It can search forward from the assumptions or backward from the conclusion. But unlike a human, the machine can’t “see” why something is true — it has to methodically combine pieces. That’s why automated reasoning is both powerful and difficult: the search space can be unimaginably vast.

Speaking the Computer’s Language: Clauses and Resolution

Unification matches patterns like puzzle pieces, letting a program cancel opposite literals.

To make reasoning manageable, most automated provers convert every statement into a special form called clausal logic. A literal is a basic claim like “F(x)” (meaning “x is F”) or its negation “¬F(x)”. A clause is a disjunction of literals — for example, “P ∨ Q” (P or Q). The prover tries to derive an empty clause ( [ ] ), which signals a contradiction, showing that the negation of the conclusion can’t be true — so the conclusion must hold.

The most celebrated rule for working with clauses is resolution. Suppose you have two clauses: “P ∨ Q” and “¬Q ∨ R.” If you notice that Q in the first and ¬Q in the second are complementary (one is the opposite of the other), you can cancel them and combine the rest to get “P ∨ R.” This is like saying, “Either it’s raining or the sprinkler is on; but it’s not the sprinkler, so it must be raining.”

The real magic comes with unification. Often literals contain variables, and you need to find a substitution that makes them match. For example, one clause might be “Likes(x, icecream) ∨ IsHappy(x).” Another is “¬Likes(Tom, y).” A unifier sets x = Tom and y = icecream, turning both into “Likes(Tom, icecream).” The complementary literals cancel, leaving “IsHappy(Tom).” Unification was introduced by J. Alan Robinson in 1965 and is the engine behind most modern provers.

Smart Shortcuts to Beat the Explosion

Without clever strategies, a prover would get lost in endless possibilities.

Even with resolution, a program can get buried in an explosion of possible combinations. That’s why researchers invented search strategies to prune the search space. Set‑of‑support strategy says: never resolve two clauses both from the original assumptions if they are already consistent; only involve the negated conclusion. Linear resolution always resolves the most recent result with something from the original set. This keeps the proof simple and short — the empty clause can often be found in a handful of steps instead of dozens.

Another powerful family of tools are SAT solvers, which treat a logic problem as a gigantic puzzle of true/false assignments. They ask: is there any way to make this giant formula true? The famous DPLL algorithm (Davis–Putnam–Logemann–Loveland, 1960s) systematically guesses truth values and backtracks when it hits a conflict. Modern SAT solvers use conflict‑driven learning to remember dead ends and avoid them. These techniques can handle problems with tens of thousands of variables — something unimaginable by brute force. In 2016, a SAT‑based supercomputer proof settled the Boolean Pythagorean Triples Problem, a long‑standing question in Ramsey theory, producing a 200‑terabyte proof that no human could ever check by hand.

Real‑World Superpowers: From Chips to God

When a famous chip had a math bug, formal verification became a lifesaver for industry.

Automated reasoning isn’t just for abstract math. In 1994, Intel’s Pentium processor shipped with a tiny error in its floating‑point division table, costing the company nearly $500 million. Today, companies use theorem provers like ACL2 and HOL to formally verify that chip designs meet their specifications, proving correctness mathematically instead of just testing. Similarly, software controlling nuclear reactors or airplane flight systems must be verified to avoid disasters — automated provers check that safety properties hold under every possible input.

Even philosophy has felt the impact. Kurt Gödel (1906–1978) famously outlined an ontological argument for the existence of God using higher‑order modal logic. In 2014, researchers automated the proof with the help of provers like LEO‑II and Satallax, discovering that Gödel’s original assumptions were actually inconsistent and needed a small fix. The computer‑assisted proof showed that a weaker logic system was enough to establish God’s necessity, a result that both vindicated and corrected the philosophical argument. This computational metaphysics echoes Leibniz’s dream of a “calculus of reason” where disputes could be settled by calculation.

The math world has benefitted too. Thomas Hales’ proof of the Kepler Conjecture (the densest way to pack spheres) spanned 300 pages and relied on computer calculations. To be absolutely sure, Hales led the Flyspeck project. Using HOL Light and Isabelle, the proof was formally verified. Automated reasoning assistants now help mathematicians fill gaps, check proofs, and even discover new ones. Many researchers hope that one day provers will reason like humans do, not just as massive search engines.

Why You Should Care

You might one day rely on a computer partner to check your own reasoning, like a spell‑checker for logic.

Automated reasoning may sound like an abstract branch of computer science, but it touches your life already. Every time you use a credit card online, protocols verified by these methods help keep your information secure. The microcontroller in your game console has been designed with formal verification. And as artificial intelligence grows, making sure an AI’s reasoning is logically sound will become critical. You might even use a mini‑prover as a personal “logical spell‑checker” someday, catching hidden contradictions in your arguments or essays.

The story of EQP isn’t just about a machine beating humans at a puzzle. It’s about teaching a computer to follow the same rules of deduction that you use when you solve a mystery or convince a friend. The discipline of laying out assumptions and proving a conclusion, with no room for error, is at the heart of philosophy. Now machines can be your partners in that quest.

Think about it

If a computer can prove a math theorem, does that mean the computer “understands” math in the same way you do? Why or why not?
Suppose a SAT solver finds a proof that no human can check because it’s too long. Should we still trust the result? What would it take for you to be confident?
If you had a personal reasoning assistant that could point out logical mistakes in your arguments, would you use it? How might that change the way you discuss things with friends or family?

Email

The Computer That Cracked a 50-Year Math Puzzle

A 50-Year Puzzle That Stumped Everyone

What Does It Mean to Prove Something?

Speaking the Computer’s Language: Clauses and Resolution

Smart Shortcuts to Beat the Explosion

Real‑World Superpowers: From Chips to God

Why You Should Care

Think about it

Can a Math Proof Also Be a Computer Program?

Are There Math Problems No Computer Can Ever Solve?

Why Can't Computers Solve Every Problem?

A 50-Year Puzzle That Stumped Everyone

What Does It Mean to Prove Something?

Speaking the Computer’s Language: Clauses and Resolution

Smart Shortcuts to Beat the Explosion

Real‑World Superpowers: From Chips to God

Why You Should Care

Think about it

Keep exploring

Can a Math Proof Also Be a Computer Program?

Are There Math Problems No Computer Can Ever Solve?

Why Can't Computers Solve Every Problem?