How Do We Know an Experiment Isn’t Fooling Us?

The Morning You Built a Volcano (and the Problem of Trust)

Even a kitchen experiment sparks the big question — how do you know it wasn’t just luck?

You mix baking soda and vinegar, step back, and watch the foam spill over your cardboard mountain. It worked! But how do you really know the eruption wasn’t just a fluke? You might repeat the experiment three more times. You might use warm vinegar instead of cold to see if the foam rises faster. Without even thinking about it, you’re doing what scientists have done for centuries: you’re looking for ways to check your result.

The question of when to trust an experiment has been a knotty problem ever since natural philosophers first started poking and prodding nature. In the 1600s, Francis Bacon (1561–1626) believed a single “crucial experiment” could settle a debate between two rival theories all by itself. Thomas Hobbes (1588–1679) pushed back. He argued that human reason guides every step—choosing what to measure, interpreting the data—so we should study reasoning, not just lab technique. Their disagreement echoes down to today. Modern philosophers of science ask: can a cleverly designed experiment force nature to give a straight answer, or are we always seeing what we expect to see?

The Toolbox: How Scientists Check Their Work

Hacking argued that actively intervening — tweaking what you observe — helps you trust the image.

When the philosopher Ian Hacking (1936–2023) asked “Do we see through a microscope?”, he wasn’t doubting our eyesight. He was wondering how we know a complex instrument isn’t creating a fake picture. After all, a microscope is built using plenty of theory about light and lenses. If that theory were wrong, might the image be an artifact—a trick of the machine?

Hacking’s answer was that scientists don’t just stare; they intervene. If you inject a dye into a cell and the shape changes exactly as you predicted, that strengthens your faith in both the microscope and the observation. This is the first tool in a whole toolbox of epistemological strategies — ways of checking that our knowledge is solid.

Another tool is independent confirmation. When the same pattern of dense bodies inside a cell shows up under several different kinds of microscope (optical, electron, phase-contrast), it would be an absurd coincidence if the pattern were just an artifact. Different instruments have different quirks, so a shared artifact is wildly unlikely. Calibration helps too: before you use a new spectrometer, you point it at something well-known, like the spectral lines of hydrogen. If it reproduces the familiar Balmer series, you’ve bought some confidence.

Sometimes the result itself is so richly structured that it argues for its own correctness. Galileo’s first telescope was crude. Yet he saw not just four dots near Jupiter, but dots that performed eclipses and, as later analysis showed, obeyed Kepler’s Third Law: a miniature solar system. It’s nearly impossible that a flawed lens would invent such an orderly dance. Robert Millikan (1868–1953) made a similar argument when he measured the charge of a single electron: in more than a thousand observed changes in the charge on his oil drops, not one was anything other than a whole-number multiple of one tiny unit. A sloppy apparatus would not have produced such neatness.
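To see why such an orderly pattern is hard to fake, here is a minimal Python sketch of the Kepler check. Kepler’s Third Law says that for bodies orbiting the same planet, the square of the orbital period divided by the cube of the orbital radius comes out the same for every body. The figures below are approximate modern values for the four moons, supplied here as an assumption; they are not Galileo’s own measurements, and the function names are made up for this illustration.

```python
# Approximate modern orbital data for Jupiter's four Galilean moons.
# These are assumed reference values, NOT Galileo's own measurements.
# Each entry: name -> (orbital radius in km, orbital period in days)
GALILEAN_MOONS = {
    "Io": (421_700, 1.769),
    "Europa": (671_100, 3.551),
    "Ganymede": (1_070_400, 7.155),
    "Callisto": (1_882_700, 16.689),
}


def kepler_ratio(radius_km, period_days):
    """Return T^2 / a^3, with a in millions of km and T in days."""
    a = radius_km / 1_000_000
    return period_days ** 2 / a ** 3


def relative_spread(values):
    """Return (max - min) / mean: a crude measure of how constant the values are."""
    values = list(values)
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


ratios = {name: kepler_ratio(a, t) for name, (a, t) in GALILEAN_MOONS.items()}
for name, ratio in ratios.items():
    print(f"{name:9s} T^2/a^3 = {ratio:.2f}")
print(f"relative spread: {relative_spread(ratios.values()):.4f}")
```

With these inputs the four ratios agree to well within one percent, even though the periods differ by almost a factor of ten. A lens defect would have no reason to produce four specks whose motions line up this way.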

None of these strategies is a guarantee. Science is fallible. But they give scientists good reasons — epistemic reasons — to believe that what they’re seeing is real, not noise.

The Experimenters’ Regress: When Checking Gets Circular

When multiple experiments disagree, how do you decide which one is right?

Yet not everyone is convinced. The sociologist Harry Collins (born 1943) pointed out a trap he calls the experimenters’ regress. To know whether a result is correct, you need a properly functioning apparatus. But how do you know the apparatus is working properly? Because it gives you the correct result. That’s a circle — there’s no outside judge.

Collins studied the bitter dispute over gravitational waves in the 1970s. Physicist Joseph Weber claimed his detectors had caught gravitational radiation. Six other teams, using similar equipment, found nothing. According to Collins, the community couldn’t settle the argument with pure logic. Weber’s critics had calibrated their instruments by injecting known acoustic pulses, shared data and analysis programs, and even run Weber’s own algorithms on their data, and they still saw no signal. But Collins insisted that since the phenomenon was brand new, there was no independent standard to say whose apparatus was truly working. The decision, he argued, was ultimately a social negotiation, not a clean verdict from nature.

Philosophers such as Allan Franklin (born 1938) strongly disagree. Franklin argued that the critics’ work was simply more trustworthy: they had performed independent confirmation, eliminated plausible sources of error, and calibrated with known signals. Weber, by contrast, could not detect those calibration pulses and had programming errors that generated spurious coincidences. The community, Franklin said, made a reasoned judgment. This tension between reasoned evidence and social process remains one of the liveliest debates in the philosophy of science.

The Dance of Agency: Is Science Built or Found?

Discovering a new particle like the Higgs boson requires aligning machine, theory, and data — a delicate dance.

The philosopher Andrew Pickering (born 1948) took the social picture even further. He thinks of science as a dance of agency between human experimenters and the material world. An apparatus rarely works perfectly the first time. Scientists spend weeks or months tinkering — tweaking the setup, revising their understanding of the machine, and rethinking the theory of what they’re trying to observe. When everything finally clicks, Pickering calls it a stabilization — a moment where the apparatus, the theory of the apparatus, and the theory of the phenomenon all come to hold each other up.

Consider physicist Giorgio Morpurgo in his search for quarks with fractional electric charge. At first, his modern Millikan-style apparatus showed a continuous smear of charge values — which matched neither the theory of integer charge nor the theory of fractional charge. So he altered the apparatus, increasing the separation between plates, and eventually got clean integer values. Pickering sees this as a story not of discovery but of forging a stable fit between human choices and material resistance.

Critics worry that this view makes nature seem like soft clay. Where is the hard fact that fractional charges either exist or they don’t? Franklin and others insist that the very need to reproduce known phenomena, such as integer charge, gives the natural world a stubborn say. Different experimenters can achieve different fits, but the scientific community eventually weeds out the ones that don’t survive more testing. Even Pickering acknowledges that “the outcomes depend on how the world is.” The question is how much weight that “is” carries.

Big Science and the Look-Elsewhere Effect

In mega-labs, automated triggers and statistical thresholds shape what scientists are even allowed to see.

In today’s giant experiments — like the Large Hadron Collider — the dance becomes even more complicated. Detectors are so vast and collisions so many that automated triggers have to decide, in a millionth of a second, which events are worth recording. What if the trigger, pre-loaded with expectations from current theory, throws out a signal that would have revealed something new? Physicist Wolfgang Panofsky (1919–2007) worried about exactly that: “What is the extent to which we are negating the discovery potential of very-high-energy proton machines by the necessity of rejecting, a priori, the events we cannot afford to record?”

Then there’s the look-elsewhere effect. Suppose you search for a predicted particle in a wide energy range. You spot a juicy signal in one narrow bin. But the probability that a random fluctuation produces a bump that size somewhere in the whole range is much higher than the probability that it lands in that particular bin. How much should that discount the signal before you declare a discovery? When experimenters hunted for the Higgs boson, theorists favored a relaxed standard, believing prior theory already pointed where to look. Experimentalists insisted on a tough five-sigma threshold, roughly a 1-in-3.5-million chance that noise alone would produce so strong a signal, wanting the data to shout over any random blip. The disagreement reveals that even the most mathematical part of science, statistics, still rests on judgments about how much to trust theory.
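The look-elsewhere effect can be put into rough numbers. The Python sketch below assumes the search range is split into a number of independent bins and that noise in each bin follows a bell curve. Both are simplifying assumptions made for this illustration, as are the function names and the choice of 100 bins; real collider analyses use more careful corrections. Under those assumptions, if one bin has probability p of fluctuating that far by chance, the probability that at least one of N bins does so is 1 - (1 - p)^N.

```python
import math


def local_p_value(sigma):
    """One-sided chance that pure noise in a single bin exceeds `sigma`
    standard deviations, assuming bell-curve (Gaussian) noise."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))


def global_p_value(local_p, n_bins):
    """Chance that at least one of `n_bins` independent bins fluctuates that far.

    Equivalent to 1 - (1 - local_p) ** n_bins, written with log1p/expm1
    so that very small probabilities are not lost to rounding.
    """
    return -math.expm1(n_bins * math.log1p(-local_p))


N_BINS = 100  # hypothetical number of independent places to look

for sigma in (3, 5):
    p_local = local_p_value(sigma)
    p_global = global_p_value(p_local, N_BINS)
    print(f"{sigma} sigma: local p = {p_local:.2e}, global p = {p_global:.2e}")
```

Under these assumptions, a three-sigma bump that looks like roughly a 1-in-740 fluke in its own bin has about a 1-in-8 chance of turning up somewhere among 100 bins. A five-sigma bump stays very unlikely even after the correction, which is one way to understand why experimentalists wanted the tougher threshold.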

Why Trust Experiments at All? (And What It Means for You)

It can feel unsettling. If even the pros argue about when an experiment is convincing, how can you trust that the medicine you take or the weather forecast you read is based on solid ground? This is where the toolbox matters — not because it removes all doubt, but because it builds a web of reasons that is hard to dismiss.

The same epistemological strategies used in physics show up in biology too. In the 1950s, H. B. D. Kettlewell wanted to test why darker peppered moths were becoming more common in polluted industrial areas. He used human observers to judge camouflage, released moths into enclosures with birds, and then performed large mark-release-recapture experiments in the wild, checking each step against possible errors. He even filmed birds preying on the moths to nail down that conspicuousness was the real factor. His toolbox was the same as a physicist’s: calibration, independent confirmation, and elimination of alternatives.

So, the next time you build a volcano for a science fair — or watch a news story about a new vaccine — remember: the confidence you feel rests on centuries of learning how to check work, argue openly, and, when necessary, change your mind. Scientific facts are not handed down from a mountaintop; they are built, scrutinized, and sometimes knocked down and rebuilt. That’s exactly what makes them strong.

Think about it

If two honest experimenters both use all the checking strategies and still get opposite results, what should the rest of us believe — and why?
Can you think of a situation in your own life where you were fooled by something that seemed like a clear pattern but was really a coincidence? How did you figure it out?
Should we trust an experiment more if it matches a beautiful theory, or should that make us more suspicious?
