How Good Are Your Guesses? The Math That Judges Your Beliefs
The Cookie Heist Detective

Imagine you’re a detective investigating a cookie heist. You’ve narrowed it down to three suspects: the cat, your little brother, and your mom. At first, you’re 70% sure it was your brother. But later you find a trail of crumbs leading to the cat’s bed. How should your 70% change? And is 70% even a good number to have?
Philosophers and mathematicians have built a whole toolkit for answering these questions. They treat your beliefs — and especially your credences, the numbers that measure how confident you are — like guesses in a game. They give those guesses a score based on how close they get to the truth, and then use the rules of decision-making to figure out what kinds of guesses you should make and how you should change them when you learn something new. This way of thinking is called epistemic utility theory.
How Do We Score a Guess?

To judge a guess, we need a scoring system. Suppose you’re trying to guess how many marbles are in a jar. The correct number is the truth. Your guess stands in for your credence: it is your best shot at the truth, squeezed into a single number. If you guess “exactly 50” and the truth is 50, you get full points. If you guess “50” and there are only 20, your score should be lower because you were far off. One simple way to score is to take the square of how far off you were and then make that number negative, so a bigger miss gives you a much worse score.
The philosopher James Joyce (born 1958) and others use a similar idea to score not just single guesses but whole sets of credences about many claims. The most famous scoring rule is the Brier score, named after the meteorologist Glenn Brier. For each claim, it compares your credence (say, 0.7) to the ideal: 1 if the claim is true, 0 if it’s false. Your score for that claim is the negative of the squared difference, and your total score is the sum over all your claims. So if you’re 70% sure it was your brother, and he didn’t do it, your score for that claim is −(0.7 − 0)² = −0.49. A perfect score on a claim is 0, and the worst possible is −1.
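Here is a minimal sketch of that scoring in Python. The function names and the three-suspect numbers (other than the 70% for the brother) are made up for illustration, and the sketch assumes the total score is just the sum of the per-claim scores.

```python
def brier_score(credence: float, is_true: bool) -> float:
    """Negative squared distance between a credence and the ideal (1 if true, 0 if false)."""
    ideal = 1.0 if is_true else 0.0
    return -((credence - ideal) ** 2)


def total_brier_score(credences: dict[str, float], truths: dict[str, bool]) -> float:
    """Add up the per-claim scores for every claim the detective has a credence in."""
    return sum(brier_score(credences[claim], truths[claim]) for claim in credences)


# Hypothetical detective: 70% brother, 20% cat, 10% mom. Suppose the cat did it.
credences = {"brother": 0.7, "cat": 0.2, "mom": 0.1}
truths = {"brother": False, "cat": True, "mom": False}

print(round(brier_score(0.7, False), 2))  # the example from the text: -0.49
print(round(total_brier_score(credences, truths), 2))
```

Scores closer to 0 are better, so a detective who had put more confidence on the cat would have earned a higher total.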
Why use squares? Because scoring by squares has a special feature: it makes the Brier score what philosophers call a strictly proper scoring rule. A scoring rule is strictly proper if every probability expects itself to get the best score. In other words, if your credence is 0.7, then from your own point of view, no other guess would get a higher expected score. Think of it like target practice where the bullseye is exactly where you think the truth is. It would be weird if your own aim told you to aim somewhere else.
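One way to see strict propriety at work is to check it numerically. The sketch below is only an illustration (the function names and the grid search are assumptions, not part of any standard library): someone with credence 0.7 works out the expected Brier score of every candidate guess and finds that 0.7 itself comes out on top.

```python
def expected_brier(own_credence: float, reported: float) -> float:
    """Expected Brier score of `reported`, judged by someone whose credence is `own_credence`."""
    score_if_true = -((reported - 1.0) ** 2)
    score_if_false = -((reported - 0.0) ** 2)
    return own_credence * score_if_true + (1.0 - own_credence) * score_if_false


def best_report(own_credence: float, steps: int = 100) -> float:
    """Search a grid of candidate credences for the one with the highest expected score."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda q: expected_brier(own_credence, q))


print(best_report(0.7))  # 0.7
```

A grid search only checks finitely many guesses, so this is a sanity check rather than a proof, but it matches what the algebra says for the Brier score.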
Why Your Certainties Must Add Up to One

Now we can use scoring rules to argue for a key rule: Probabilism. Probabilism says that the numbers you assign to different claims should behave like probabilities. If you’re 70% sure your brother did it, you should be 30% sure he didn’t; those two numbers must sum to 1. If a claim is logically certain, your credence in it should be 1; if it’s impossible, 0. And your credence for “A or B” (when A and B can’t both happen) should be the sum of your credences in A and in B.
Why must you follow these rules? The argument uses a principle from decision theory: Undominated Dominance. Suppose you have two possible options, like two sets of credences. If option A would get you a better score than option B no matter how the world turns out, and no other option beats A in that same way, then choosing B is irrational. That’s like picking a plan that’s guaranteed to do worse than one you could have picked instead.
Joyce (1998) and others proved a striking result using the Brier score and similar measures: any set of credences that violates the probability laws is dominated. That is, there is some other set, which follows the laws, that is more accurate in every possible state of the world. Meanwhile, no set of credences that follows the laws is ever dominated. So if you want to avoid a guaranteed loss in the accuracy game, your credences must add up properly.
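A small worked case makes the dominance result concrete. The numbers below are invented for illustration: a detective who is 70% sure the brother did it and also 60% sure he didn’t (so the two credences sum to 1.3). The sketch repairs those credences by removing half of the excess from each, which is one simple way to find a coherent alternative, and then compares Brier scores in both possible worlds.

```python
def brier(credences: tuple[float, float], truth: tuple[int, int]) -> float:
    """Total Brier score over the two claims: negative sum of squared misses."""
    return -sum((c - t) ** 2 for c, t in zip(credences, truth))


# Claims, in order: "brother did it", "brother did not do it".
WORLDS = {"brother guilty": (1, 0), "brother innocent": (0, 1)}

incoherent = (0.7, 0.6)  # hypothetical credences that sum to 1.3
excess = sum(incoherent) - 1.0
# Take half of the excess off each credence so the pair sums to 1 (about 0.55 and 0.45).
coherent = (incoherent[0] - excess / 2, incoherent[1] - excess / 2)


def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if `a` scores strictly better than `b` in every possible world."""
    return all(brier(a, truth) > brier(b, truth) for truth in WORLDS.values())


for name, truth in WORLDS.items():
    print(name, round(brier(incoherent, truth), 3), round(brier(coherent, truth), 3))
print(dominates(coherent, incoherent))  # True
```

Whether the brother is guilty or innocent, the repaired credences score better, so the original pair was a guaranteed loser.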
How to Change Your Mind When You Learn Something New

So far we’ve talked about your beliefs at one moment. But life is full of new evidence — like a trail of crumbs. How should you update your credences?
The rule with the strongest mathematical backing is Conditionalization, sometimes called Bayes’ Rule. It works like this. Before you see the crumbs, you have a prior credence about each suspect. Then you learn for sure that a trail of crumbs leads to the cat’s bed. Conditionalization says: your new credence that the cat did it should be your old credence that the cat did it and the crumbs would lead to its bed, divided by your old credence that the crumbs would lead there at all.
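The rule can be sketched in a few lines of Python. Only the 70% prior for the brother comes from the story; the other priors and the chances of finding crumbs by the cat’s bed under each suspect are hypothetical numbers chosen for illustration.

```python
def conditionalize(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Return new credences after learning the evidence for sure.

    prior[h]      : old credence that suspect h did it
    likelihood[h] : old credence that the evidence would show up if h did it
    """
    # Old credence in "h did it AND the evidence shows up", for each suspect.
    joint = {h: prior[h] * likelihood[h] for h in prior}
    # Old credence that the evidence would show up at all.
    evidence_prob = sum(joint.values())
    if evidence_prob == 0:
        raise ValueError("cannot conditionalize on evidence with prior credence 0")
    return {h: joint[h] / evidence_prob for h in joint}


prior = {"brother": 0.7, "cat": 0.2, "mom": 0.1}             # cat and mom values are assumed
crumb_likelihood = {"brother": 0.2, "cat": 0.9, "mom": 0.1}  # assumed for illustration

posterior = conditionalize(prior, crumb_likelihood)
for suspect, credence in posterior.items():
    print(f"{suspect}: {credence:.2f}")
```

With these made-up numbers the cat overtakes the brother as the top suspect, even though the brother started out far ahead.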
Why is that a good rule? The philosopher Hilary Greaves (born 1978) and the physicist-philosopher David Wallace (born 1970s) showed that if you plan how to update your credences in advance, before you know what the evidence will be, and your accuracy is scored by a strictly proper rule, then the plan that maximizes your expected epistemic utility is exactly the one that conditionalizes on whatever evidence you receive. In other words, among all possible ways of changing your mind, the one your earlier self expects to get closest to the truth is Conditionalization. It’s like picking a strategy for adjusting your aim after each new clue, and the math says the best strategy is to rescale your old credences in exactly that ratio.
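The idea of comparing update plans in advance can be tried out on the cookie case. The sketch below is an illustration under assumed numbers (every prior except the brother’s 70%, and all the crumb chances, are hypothetical), and the two rival plans are invented for comparison. It scores each plan by its expected Brier score, as judged from the prior.

```python
SUSPECTS = ["brother", "cat", "mom"]
PRIOR = {"brother": 0.7, "cat": 0.2, "mom": 0.1}     # cat and mom values are assumed
P_CRUMBS = {"brother": 0.2, "cat": 0.9, "mom": 0.1}  # assumed chance of crumbs by the cat's bed


def joint(suspect: str, crumbs: bool) -> float:
    """Prior credence that `suspect` did it and the crumbs do (or do not) appear."""
    p = P_CRUMBS[suspect]
    return PRIOR[suspect] * (p if crumbs else 1.0 - p)


def brier(credences: dict[str, float], culprit: str) -> float:
    """Total Brier score of a set of credences when `culprit` is the true thief."""
    return -sum((credences[s] - (1.0 if s == culprit else 0.0)) ** 2 for s in SUSPECTS)


def conditionalize_plan(crumbs: bool) -> dict[str, float]:
    total = sum(joint(s, crumbs) for s in SUSPECTS)
    return {s: joint(s, crumbs) / total for s in SUSPECTS}


def ignore_plan(crumbs: bool) -> dict[str, float]:
    """Hypothetical rival: keep the prior whatever the evidence says."""
    return dict(PRIOR)


def overreact_plan(crumbs: bool) -> dict[str, float]:
    """Hypothetical rival: become certain of the cat if crumbs appear, else of the brother."""
    favourite = "cat" if crumbs else "brother"
    return {s: (1.0 if s == favourite else 0.0) for s in SUSPECTS}


def expected_score(plan) -> float:
    """Prior-weighted average score of the credences the plan would produce."""
    return sum(joint(s, e) * brier(plan(e), s) for s in SUSPECTS for e in (True, False))


for plan in (conditionalize_plan, ignore_plan, overreact_plan):
    print(plan.__name__, round(expected_score(plan), 3))
```

With these numbers the conditionalizing plan gets the highest expected score of the three. Checking two rivals is not a proof, but it shows the kind of comparison the theorem covers.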
Is There Only One Right Way to Believe?

The scoring approach also helps us think about a big question: does one body of evidence force everyone to arrive at the same beliefs? The view that it does is called the Uniqueness Thesis. Its opposite is Epistemic Permissivism, which says that sometimes more than one set of credences can be rational given the same evidence.
The epistemic utility story offers an interesting twist. Suppose we’re not just talking about credences (strengths of confidence) but also about outright beliefs, the things you’d say out loud, like “I believe my brother took the cookies.” There’s a rule called the Lockean Thesis (inspired by the philosopher John Locke, 1632–1704) that says you should believe a claim if your credence in it is above a threshold, and suspend judgment if it’s below. The exact threshold depends on how you weigh the goodness of believing a truth against the badness of believing a falsehood.
Now, imagine you and a friend both have exactly the same credence, say 0.65, that it was your brother. But you think believing a falsehood is twice as bad as believing a truth is good (so your threshold is 2/3, about 0.67), while your friend thinks the two weigh equally (threshold 0.5). Since 0.65 is below 0.67 but above 0.5, you’ll suspend judgment and your friend will believe it. Both of you are following the Lockean rule, just with different values. So even if the Uniqueness Thesis is true for credences, permissivism can still pop up in what you outright believe, because reasonable people can weigh the risk of being wrong differently.
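The thresholds 0.67 and 0.5 can be recovered from a simple model. As an assumption for this sketch, say that believing a truth earns you some gain, believing a falsehood costs you some loss, and suspending judgment scores 0. Then believing has a positive expected payoff exactly when your credence is above loss / (gain + loss). The function names are made up for illustration.

```python
def lockean_threshold(gain_if_true: float, loss_if_false: float) -> float:
    """Credence above which believing beats suspending (which scores 0 in this model).

    Believing pays off on average when
        credence * gain_if_true - (1 - credence) * loss_if_false > 0,
    which rearranges to credence > loss_if_false / (gain_if_true + loss_if_false).
    """
    return loss_if_false / (gain_if_true + loss_if_false)


def attitude(credence: float, threshold: float) -> str:
    return "believe" if credence > threshold else "suspend judgment"


credence = 0.65  # shared by you and your friend

your_threshold = lockean_threshold(gain_if_true=1.0, loss_if_false=2.0)    # 2/3
friend_threshold = lockean_threshold(gain_if_true=1.0, loss_if_false=1.0)  # 0.5

print("you:", attitude(credence, your_threshold))
print("friend:", attitude(credence, friend_threshold))
```

The same credence produces two different attitudes, and the only thing that differs is how heavily each person weighs the cost of a false belief.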
For credences themselves, the debate is still fierce. The philosopher Sophie Horowitz (born 1980s) has pointed out that if a scoring rule is strictly proper, each rational credence expects itself to be best, which can make it hard to see why someone would treat an alternative as equally okay. This tension means philosophers are still arguing over whether the math pushes us toward uniqueness or leaves room for genuine, reasonable disagreement.
Why Putting a Score on Beliefs Matters

You might wonder: does any of this math matter outside philosophy classrooms? It does. Every time a doctor weighs how likely a diagnosis is after a new test, every time a jury considers a new piece of evidence, and every time you decide whether to trust a news headline, you’re doing a bit of this probability reasoning — even if you don’t use numbers.
The epistemic utility approach shows that good thinking has a shape: your confidence should be coherent (add up properly), and when new information arrives, you should update by taking into account how it fits with everything else you knew before. It also helps explain why two careful thinkers might still disagree, and why sometimes you should change your mind even when you feel certain.
Next time you’re the detective of a cookie heist — or just trying to figure out which friend is telling the truth — remember that the quality of your guesses can be measured, and the rules for changing them aren’t just gut feelings. They’re part of a deeper mathematical story about what it means to be a good truth-seeker.
Think about it
- If two people both have good evidence but end up with different levels of confidence, should they both change their minds to meet in the middle? Why or why not?
- Should you always adjust your belief when you get a new fact, or are there times you trust your gut and ignore the new evidence?
- If a friend uses a different scoring system to decide what to believe — for example, they care a lot more about never being wrong — can you both be rational even when you disagree?