Philosophy for Kids

How Much Surprise Hides in a Text? The Quest to Measure Information

A Message That Makes Your Heart Race

Which envelope holds more information? Bulk alone doesn’t tell you the story.

Last Tuesday, Maya checked her phone. A message from her friend Alex: “Hi.” She yawned. A few minutes later, another buzz: “I just got a pet snake.” Maya’s eyes flew open. Same friend, same device — so why did the second message feel so much bigger? Because it was a complete surprise. It carried more information.

We use the word information all the time — for news, texts, videos, whispers. But what is it, exactly? In the 1940s, engineers were racing to squeeze more voice calls and telegraph clicks through crowded wires. They needed a way to measure information the way a ruler measures length, so they could talk about it precisely. That turned out to be a deeply philosophical puzzle — and cracking it changed everything, from your phone to how we think about knowledge itself.

Shannon’s Answer: Count the Bits of Surprise

When you pull the red candy from a sea of green, you get a three‑bit surprise.

Claude Shannon (1916–2001), a mathematician at Bell Labs, solved the first half of the puzzle. He noticed that a message’s information depends on how likely it was. If the sky is cloudy, the message “It’s still cloudy” tells you almost nothing — you already knew it. But if you hear “A kangaroo just hopped onto your driveway,” your brain lights up, because the chance of that was tiny.

Shannon used probability to put a number on surprise. He defined the information in a single message as the negative logarithm of its probability: I(A) = −log₂ P(A). Don’t let the math scare you. The idea is beautiful. A fair coin toss has two equally likely outcomes; the chance of “heads” is 1/2. Shannon’s formula gives −log₂(1/2) = 1 bit. One bit is the amount of information that answers a perfectly balanced yes‑or‑no question. If you pull a red candy from a bag with seven green and one red, the chance is 1/8, so the message “You got the red candy!” contains 3 bits. The rarer the event, the more bits you get.
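If you’d like to try the formula yourself, here is a small Python sketch. The name `self_information` is our own choice, not a standard library function. It computes −log₂ P(A), written as log₂(1/P(A)) so that a certain event prints as 0.0 instead of −0.0.

```python
import math

def self_information(p: float) -> float:
    """Shannon's surprise in bits: log2(1/p), which equals -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / p)

print(self_information(1 / 2))  # fair coin lands heads: 1.0
print(self_information(1 / 8))  # red candy from 7 green + 1 red: 3.0
print(self_information(1.0))    # something certain: 0.0 bits
```

The last line is the “completely predictable” case: a message you were sure to get carries no surprise at all.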

Shannon also captured the additive nature of information. If you toss a coin and then roll a fair four‑sided die, the total information is 1 bit from the coin plus 2 bits from the die, 3 bits altogether. Apart from the choice of unit (using base 2 is what makes the unit a bit), the logarithm is essentially the only function that makes that additivity work while giving bigger numbers to bigger surprises. This measure is called Shannon entropy when you average it over a whole source of possible messages. Shannon’s theory gave us the bit as a universal unit, but it deliberately ignored what the messages mean. The sentence “I ate toast” and a random string of letters might carry the same number of bits under Shannon’s formula. The content doesn’t count.
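Here is a rough Python sketch of both ideas in this paragraph. The first part adds up the surprises of independent events. The second part averages surprise over a whole source to get entropy. The helper names `self_information` and `entropy` are illustrative, and the sketch assumes the coin and the die don’t influence each other.

```python
import math

def self_information(p: float) -> float:
    """Surprise in bits for one outcome with probability p."""
    return math.log2(1 / p)

# Independent events: "heads" (1/2) and "rolled a 3" on a four-sided die (1/4).
coin, die = 1 / 2, 1 / 4
print(self_information(coin) + self_information(die))  # 1.0 + 2.0 = 3.0
print(self_information(coin * die))                    # joint chance 1/8: also 3.0

def entropy(probs: list[float]) -> float:
    """Average surprise over a source: sum of p * log2(1/p)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([1 / 2, 1 / 2]))  # fair coin: 1.0 bit per toss
print(entropy([7 / 8, 1 / 8]))  # candy bag: about 0.544 bits per draw
```

The red candy is worth 3 bits when it shows up, but the bag as a whole averages only about half a bit per draw. That is because most draws are the unsurprising green.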

Kolmogorov’s Twist: The Shortest Computer Program

The tidy pattern on the left can be described in a few words; the messy one can’t — it contains more information.

If you type the string “ABCABCABC” on your screen, you can describe it in a few words: “repeat ABC three times.” But a random jumble like “QZFXBROTW” can’t be compressed; you have to spell it out character by character. Which one holds more information? If every letter is equally likely, Shannon’s approach treats both as equally probable strings of the same length, so it misses the fact that one has a tidy pattern.
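To see why Shannon’s formula can’t tell these two strings apart, suppose that each of the 26 capital letters is equally likely and picked independently. That is an assumption made for this example. Under it, every 9‑letter string has the same probability, (1/26)⁹, and this short Python sketch gives both strings the same bit count.

```python
import math

ALPHABET_SIZE = 26  # assumption: every capital letter equally likely, picked independently

def uniform_bits(text: str) -> float:
    """Bits for a string under the equally-likely-letters model: len * log2(26)."""
    return len(text) * math.log2(ALPHABET_SIZE)

for s in ["ABCABCABC", "QZFXBROTW"]:
    print(s, round(uniform_bits(s), 1))  # both print 42.3
```
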

Andrey Kolmogorov (1903–1987) and other mathematicians of the 1960s offered a new measure: algorithmic complexity, often called Kolmogorov complexity. It defines the information in a string as the length of the shortest computer program that produces that string when it runs on a universal Turing machine — an imaginary, all‑purpose computing device that can follow any set of instructions. For “ABCABCABC,” a tiny program does the job, so its Kolmogorov complexity is low. The random jumble needs a program that essentially lists every letter, so its complexity is high.

This measure captures something deep: information is about structure, not just surprise. It explains why a pattern‑filled string feels less informative than a chaotic one, since the pattern can be compressed. There is a catch, though: Kolmogorov complexity is uncomputable. No algorithm can look at every possible string and find the absolute shortest program for it. The reason is that some candidate programs run forever, and you can never be sure they won’t eventually produce the string. Still, many mathematicians see it as a deeper measure in one respect: it describes the information inside a single object, not just its probability in a crowd.
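Because true Kolmogorov complexity can’t be computed, people often use a real compressor as a rough stand‑in. The Python sketch below swaps in zlib, a standard compression library. It is not Kolmogorov’s definition, only a practical approximation of it. The sketch also uses longer strings than the examples above, because very short strings are swamped by the compressor’s fixed overhead. A compressed size can only overestimate the shortest description (give or take the fixed size of the decompressor), so treat the numbers as an upper‑bound hint, not the real complexity.

```python
import random
import string
import zlib

def compressed_size(text: str) -> int:
    """Bytes after zlib compression: a computable upper-bound stand-in for
    Kolmogorov complexity, which itself cannot be computed."""
    return len(zlib.compress(text.encode("ascii"), 9))

patterned = "ABC" * 300  # 900 characters following one simple rule

rng = random.Random(42)  # fixed seed so the run is repeatable
jumble = "".join(rng.choice(string.ascii_uppercase) for _ in range(900))

print(len(patterned), compressed_size(patterned))
print(len(jumble), compressed_size(jumble))
```

The patterned string shrinks to a handful of bytes. The random jumble stays in the hundreds of bytes, because zlib can only trim the waste of storing 26 letters in 8‑bit bytes; there is no pattern for it to find.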

The Meaning Problem: Does Truth Count?

True, meaningful information clicks together; falsehood stays in pieces.

So far, we have two ways to measure information that don’t care whether a statement is true. “The moon is made of green cheese” is an improbable sentence — under Shannon it would carry a heap of bits. Yet nobody would call it information worth knowing. This is where a third idea steps in: semantic information.

Philosophers like Luciano Floridi (born 1964) argue that real information must be well‑formed (it follows the rules of language), meaningful (the words point to something), and truthful. If I tell you “Paris is the capital of France,” that counts as semantic information. If I say “Paris is the capital of Belgium,” it is false — so, according to this view, it is not information at all, just misinformation. This fits our everyday intuition: when you learn something true about the world, you’ve gained information; when you’re fooled, you haven’t.

Shannon’s bit‑counting was designed for communication channels, not for verifying truth. Kolmogorov’s complexity looks at pattern, not reality. The philosophical debate is alive: can one definition cover both the measurable stuff that flows through wires and the meaningful truths we build knowledge from? Some philosophers think we need two separate categories; others are hunting for a single framework that ties them together.

Why You Should Care: The Invisible Scaffolding

Every flicker in this room runs on the same bit‑counting logic that Shannon and Kolmogorov explored.

This isn’t just a textbook argument. When you squeeze a folder of files into a ZIP archive, your device uses pattern‑hunting ideas that echo Kolmogorov complexity. When your video call stays smooth even though the signal is weak, error‑correcting codes are at work, and they lean on Shannon’s theory of how much redundancy a noisy channel needs. The locks that protect your passwords rely on the fact that a truly random key has sky‑high algorithmic complexity, making it practically impossible to guess.
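Real error‑correcting codes are far cleverer, but the simplest one shows the core trick: add redundancy so the receiver can undo noise. This Python sketch is a toy triple‑repetition code with majority‑vote decoding. The function names `encode` and `decode` are hypothetical. Shannon’s theory is what tells engineers the best rate any code could reach on a given noisy channel, and smarter codes get much closer to it than this one.

```python
from collections import Counter

def encode(bits: str, r: int = 3) -> str:
    """Repetition code: send each bit r times (r should be odd)."""
    return "".join(b * r for b in bits)

def decode(received: str, r: int = 3) -> str:
    """Majority vote over each block of r copies."""
    blocks = [received[i:i + r] for i in range(0, len(received), r)]
    return "".join(Counter(block).most_common(1)[0][0] for block in blocks)

message = "1011"
sent = encode(message)             # "111000111111"
noisy = sent[:4] + "1" + sent[5:]  # weak signal flips the character at index 4 (0 -> 1)
print(decode(noisy) == message)    # True: the single error is corrected
```

The price is that the message becomes three times longer. A single flipped copy in each block gets outvoted, but two flips in the same block would fool this code.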

And the questions reach far beyond gadgets. Physicists now wonder whether the universe itself is a gigantic information processor — black holes might store information on their surfaces, and quantum computers are rewriting the rules of what a bit even is. Meanwhile, philosophers ask: could a machine ever understand meaning, or will it always only count bits? If a lie carries just as many bits as the truth, how should we design social media platforms that value truth?

The hunt for a perfect measure of information is still open. Shannon’s bits, Kolmogorov’s programs, and Floridi’s truthful data each capture a piece of the puzzle. Piecing them together is one of the great adventures of twenty‑first‑century thought — and it’s happening inside the very device you’re holding right now.

Think about it

  1. Every morning your friend texts “Good morning.” That message is completely predictable — Shannon says it contains zero bits of information. Does that mean it’s worthless? Could it carry something that bits can’t measure?
  2. A lie can be just as surprising as the truth. Should a philosophy of information treat a false but shocking statement as information? What would be lost if we said yes?
  3. You have a compressed ZIP file that, when opened, reveals a perfectly random‑looking secret code. Is the ZIP file richer in information, or is the code? What might Shannon say, and what might Kolmogorov say?