The Idea of Subjective Probability

I’ve been deep in Bayesian analysis recently, and I want to discuss some of the philosophical foundations.

The background here is that there are roughly two camps of statistical thought: the Frequentists and the Bayesians.  They represent very different ways of thinking about the world.  I fall squarely on the Bayesian side. The purpose of this post isn’t to construct some grand argument. I just want to introduce a simple idea: Subjective Probability.

Just like the world of statistics is divided between the Frequentists and the Bayesians, the interpretation of probability is divided into objective and subjective. Objective probability is associated with the frequentists and subjective with the Bayesians.

The prime example of objective probability is a coin flip. Suppose that this is a fair coin and it produces heads on half of all flips. It is an objective property of the coin that it produces heads one out of every 2 times.

Let’s look at another example: a deck of cards. Draw the top card from a well-shuffled standard deck and you’ll get a heart a quarter of the time and a picture card 3/13 of the time. Again, it’s helpful to think of the deck as yielding an objective probability – but this way of thinking is limiting.

For example, suppose you have a deck of cards on the table and you again want to assign a probability of seeing a picture card. You know that it’s 3/13, but you keep staring at the top card in that deck. You see the back of that card. You know it’s either a picture or it’s not. “What are you?” you say. As soon as you turn over that card, the probability either goes to 0 (it’s not a picture) or it goes to 1 (it is a picture).

What if you shuffled the deck and you happened to get a peek at the card on the bottom? You’d then change your expectations of what the top card is going to be. What if you caught a glimpse of that card, but you’re not exactly sure what it was?

The probability now isn’t some inherent property of the deck. It’s a number in your mind that represents your expectation that the top card is a picture card. This number can take into account the inherent properties of the deck of course, but it can also take into account any other information you have as well as your experience. For example, maybe you suspect the deck is rigged. Your belief about the deck might be different from someone else’s.
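To make the card example concrete, here’s a small Python sketch of how the number changes with what you know. It assumes a standard 52-card deck with 12 picture cards, and it assumes that an unclear glimpse only tells you something about whether the bottom card is a picture card. The function names are my own, made up for illustration.

```python
from fractions import Fraction

DECK_SIZE = 52
PICTURE_CARDS = 12  # jack, queen and king in each of the four suits


def p_top_is_picture(bottom_is_picture=None):
    """Probability that the top card is a picture card.

    bottom_is_picture is None if you never saw the bottom card,
    or True/False if you saw it clearly.
    """
    if bottom_is_picture is None:
        return Fraction(PICTURE_CARDS, DECK_SIZE)
    # One card is accounted for, so 51 candidates remain for the top.
    remaining_pictures = PICTURE_CARDS - (1 if bottom_is_picture else 0)
    return Fraction(remaining_pictures, DECK_SIZE - 1)


def p_top_is_picture_uncertain(p_bottom_is_picture):
    """Blend the two clear-sighted cases when the glimpse was unclear."""
    return (p_bottom_is_picture * p_top_is_picture(True)
            + (1 - p_bottom_is_picture) * p_top_is_picture(False))


print(p_top_is_picture())        # 3/13
print(p_top_is_picture(True))    # 11/51
print(p_top_is_picture(False))   # 4/17
```

If your glimpse tells you nothing at all, so that your belief about the bottom card is still 3/13, the blended answer comes back to 3/13, which is a nice sanity check.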

Subjective probability applies much better in real-world forecasting situations. Let’s say you want to assign a probability to a particular candidate winning an election. In the end, they’ll either win or they’ll lose – but the probability you assign is an expectation of that event. You don’t need to be well informed to have a subjective expectation – but you want to set yourself up to have more accurate expectations as you gather more information.

Sometimes we assign binary expectations to an event. For example, if I am absolutely sure something will occur I will assign it a 1. If I believe it is impossible, I’ll assign it a 0. And then I make decisions based on that belief.

But it turns out that we can make better decisions by hedging. If I see on my phone that there’s a 30% chance of rain, maybe I won’t bring my umbrella but I’ll wear clothes that I don’t mind getting wet.

What does it mean to have a degree-of-belief of 30% rain? It’s not like we’re living in a frequentist world where that particular day can be repeated over and over again to get a fraction. This is a difficult concept to define, but another way to think about it is a ratio of expectations. If there’s a 30% chance of rain, that means that there’s a 70% chance of no-rain, and the ratio of expectations is 3:7. It’s related to the amount of risk we’re willing to take on a certain outcome.
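One way to see the link between the 3:7 ratio and risk is to treat it as betting odds. The sketch below is only an illustration, with function names I made up: if you really believe in a 30% chance of rain, then staking 3 to win 7 on rain is a bet you expect to break even on.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Convert a probability into odds in favor of the event."""
    p = Fraction(p)
    return p / (1 - p)


def expected_gain(p, stake, payout):
    """Expected gain from risking `stake` to win `payout` if the event occurs."""
    return p * payout - (1 - p) * stake


p_rain = Fraction(3, 10)
print(odds_in_favor(p_rain))                     # 3/7
print(expected_gain(p_rain, stake=3, payout=7))  # 0
```

Any better payout than 7 makes the bet attractive to you, and any worse payout makes it unattractive.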

When the event finally occurs, we can quantify how surprising it was by taking the negative logarithm of the probability we assigned to it. For the example above, if it rains the surprise is -ln(0.3), or roughly 1.2. If it doesn’t rain, it’s -ln(0.7), or roughly 0.36.
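Here’s the same calculation as a minimal Python sketch. The function name `surprise` is my own choice, and the natural logarithm means the result is measured in nats.

```python
import math


def surprise(p):
    """Surprise, in nats, of an outcome that was assigned probability p."""
    return -math.log(p)


p_rain = 0.3
print(round(surprise(p_rain), 2))      # 1.2
print(round(surprise(1 - p_rain), 2))  # 0.36
```

Notice that an outcome you were certain of (p = 1) has zero surprise, and the surprise grows without bound as p gets closer to 0.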

Just because you’re very surprised doesn’t mean you were wrong to assign the probabilities that you did. It could be that your forecasting was really good given the information at hand, and a rare event occurred. But it’s generally true that if you are less surprised on average after adjusting your methods for assigning probabilities, your new methods are probably better. In complex systems, there’s no optimal method – you can always add more data and computation. In simple games, there’s usually an optimal method – and its probabilities can be thought of as objective probabilities.
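To compare two ways of assigning probabilities, you can average the surprise over many events. The sketch below uses numbers I invented purely for illustration (five rain/no-rain days and two hypothetical forecasters); they are not real data.

```python
import math


def average_surprise(forecasts, outcomes):
    """Mean of -ln(probability assigned to what actually happened)."""
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        # p is the forecast probability that the event happens.
        total += -math.log(p if happened else 1 - p)
    return total / len(outcomes)


# Invented example: did it rain on each of five days?
outcomes = [True, False, False, True, False]

always_half = [0.5, 0.5, 0.5, 0.5, 0.5]  # forecaster who never commits
sharper = [0.8, 0.2, 0.3, 0.7, 0.1]      # forecaster who uses more information

print(average_surprise(always_half, outcomes))
print(average_surprise(sharper, outcomes))
```

On this made-up record the second forecaster is less surprised on average, which is the sense in which their method looks better. With only five events that could easily be luck, so you’d want a much longer record before drawing conclusions.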

Anyone can assign a subjective probability to an event. You’ll often hear in casual conversations remarks like “there’s a 20% chance we’ll be on time”. These probabilistic assignments are often made before any thought has been put into them. If you want to assign better probabilities, a good start is to follow some basic logic. For example, if X always leads to Y, the probability of Y must be greater than or equal to the probability of X. There’s also the indifference principle: if you have no information distinguishing two mutually exclusive events, then you should assign them equal probabilities.

And finally, there’s Bayes’ rule. This tells us how to update our beliefs when we are exposed to new information. This most important rule is how the idea of subjective probability gives rise to Bayesian inference.
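As a rough sketch of what such an update looks like, take the suspicion that the deck is rigged. All the numbers here are assumptions I picked for illustration: a prior belief of 10% that the deck is rigged, and a rigged deck that shows a picture card on top 90% of the time, against 3/13 for a fair deck. The function name is also my own.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a hypothesis after seeing one piece of evidence.

    prior: belief in the hypothesis before the evidence
    likelihood_if_true: probability of the evidence if the hypothesis holds
    likelihood_if_false: probability of the evidence if it does not
    """
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Assumed numbers, for illustration only.
prior_rigged = 0.10
p_picture_if_rigged = 0.90
p_picture_if_fair = 3 / 13

# You turn over the top card and it is a picture card.
posterior_rigged = bayes_update(prior_rigged, p_picture_if_rigged, p_picture_if_fair)
print(round(posterior_rigged, 2))  # 0.3
```

Under these assumptions, a single picture card moves the belief that the deck is rigged from 10% to about 30%. It’s evidence, but far from proof, and you can feed the posterior back in as the prior for the next card.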