Skip to content
Philosophy for Kids

What Does Your Code Really Mean? The Hidden Puzzle of Programs

The Man With the Twin Programs

Robin Milner realized that programs that act the same might still be different inside.

In 1975, Robin Milner (1934–2010) sat in his office with a puzzle. He had two scraps of code. They looked different. But no matter what numbers he typed in, the computer printed exactly the same output. Were those two programs truly the same, or just good at pretending? This was not a trick question for programmers. It was a deep problem about meaning itself. What does a computer program really mean? And can mathematics capture it perfectly? Milner’s question launched a search that blended logic, mathematics, and a new idea: that computing is like a conversation.

Two Ways a Program Can Mean Something

Denotational meaning is like the perfect recipe; operational meaning is the real, sometimes messy process.

When you learn to code, you usually think of a program as instructions: do this, then that. But philosophers of computer science found two very different ways to talk about what a program means.

The first way is called denotational semantics. This approach, pioneered by Christopher Strachey (1916–1975) and Dana Scott (born 1932), gives each program a mathematical object—a function that maps inputs to outputs. It is like a recipe card: “If you give me a number, I will give you twice that number.” The meaning is the abstract rule, not the messy steps the computer takes.

The second way is operational semantics. This meaning comes from actually running the program and watching what happens. It is the real, step-by-step process: fetch a value, add one, store it back. If the program never stops, its operational meaning is simply “diverges”—no output, no value. It is the difference between reading a cake recipe (denotational) and actually baking the cake (operational). You might follow the recipe, but the real cake could flop.

Philosophers wanted more than two separate stories. They wanted a single account where the mathematical meaning and the behavioral meaning always line up perfectly. That dream is called full abstraction.

The Dream: When Meaning and Behavior Match

Full abstraction aims for a perfect fit—the mathematical meaning and the real-world behavior are indistinguishable.

To get meaning and behavior to match, you need a rule called compositionality. It says the meaning of a whole program comes from the meanings of its parts, put together in a predictable way. Think of building a sentence: the meaning of “The cat sleeps” comes from the meanings of “the cat” and “sleeps.” Programs work the same way. A bigger program is built from smaller phrases, and its meaning should be built from theirs.

There is also a test from the world of behavior. Two phrases are observationally equivalent if no program context can ever tell them apart. Imagine two puzzle pieces that fit into the exact same slot, no matter what board you try. If you can never make them give different results, they are observationally the same.

Full abstraction happens when denotational equivalence (same mathematical object) exactly equals observational equivalence (cannot be told apart). If that holds, your mathematical model of the language is perfectly faithful. It captures everything that matters for behavior—nothing more, nothing less.

For many simple languages, full abstraction works. But in the 1970s, researchers found a tiny language where it failed spectacularly. That language was called PCF, and its ghost would haunt them for years.

The Ghost in the Machine: Parallel-Or

Parallel-or produces 'true' the moment either input is true, without waiting for the other—something no sequential program can do.

PCF (Programming with Computable Functions) was a toy language. It had numbers, booleans (true/false), and the ability to create functions that take other functions as input. It also allowed a program to loop forever—a crucial feature of real computing.

The standard mathematical model for PCF used domains, sets of partially ordered pieces of information. A domain is like a ladder of facts: at the bottom is “no information” (a program that has not produced an answer). Above that are actual results. The model allowed functions that work with this ladder.

Then Gordon Plotkin (born 1946) made a startling discovery. There is a function called parallel-or that exists in the domain model but cannot be written in PCF. Parallel-or works like this: it takes two boolean arguments. It returns true if either argument is true, even if the other argument is still computing (stuck in an infinite loop). In PCF, a function must evaluate its arguments one at a time, left to right. If the left argument loops forever, the whole program hangs. Parallel-or would need to check both arguments at once—like having two processors running in parallel. Because PCF is strictly sequential, no PCF program can implement parallel-or.

That meant the domain model had “extra” meanings—functions that matched no real program. Denotational equivalence did not match observational equivalence. Full abstraction was broken. The supposed perfect match was a mirage.

Programs as Conversations: Game Semantics

In game semantics, a program is like a player in a dialogue, answering only in a well-bracketed, innocent way.

The failure sent researchers hunting for a better model. How could they capture exactly what sequential programs can do, and nothing more? The answer came from thinking of computation as a game.

In game semantics, a program is not a static object. It is a player in a two-person dialogue. The environment (the Opponent) asks a question. The program (the Proponent) answers. Often the program needs more information, so it asks its own question, and the environment answers. They take turns, like a conversation. A program’s meaning is its strategy: a rule that tells it how to respond based on the history so far.

To capture PCF exactly, strategies must be innocent and well-bracketed. “Innocent” means the program’s response depends only on the part of the conversation that is logically relevant—it cannot peek at extra details. “Well-bracketed” means questions and answers are nested like properly paired parentheses; you answer the most recent pending question before moving to an older one. These restrictions block parallel-or and other non-sequential tricks.

When researchers applied these game rules to PCF, every allowed strategy corresponded to a real PCF program, and every PCF program had a strategy. The match was perfect. Intensional full abstraction—where every piece of the model is defined by a program—had been reached. The ghost was caught.

Why It Still Matters

You might not write programs in PCF. But the puzzle of full abstraction is still alive every time you use code. When you call a function someone else wrote, you trust that you can swap it with another function that behaves identically and your program will still work. That trust depends on a notion of observational equivalence. Full abstraction guarantees that mathematical reasoning about programs will not betray you—if two pieces of code are equal in the model, they are truly equal in every possible situation.

The story also teaches a bigger lesson about meaning. Sometimes the best way to understand something is not to examine its inner parts, but to see how it interacts with the world. A program is not just a frozen block of math; it is an ongoing conversation. That idea, born from a frustrating mismatch in a toy language, now shapes how computer scientists design reliable systems and how philosophers think about communication itself. Next time you debug a stubborn program, remember: you are not just fixing steps—you are tuning a dialogue.

Think about it

  1. If two programs always produce the same output but one runs much faster, are they really the same? What does “same” mean here?
  2. Could there be a program whose behavior is correct but whose inner meaning is completely different from what the programmer intended? How would we ever know?
  3. If an alien civilization designed computers that always computed in parallel, would their idea of “program equality” be different from ours? Why?