Jan 06 2022
Gambler’s Fallacy and the Regression to the Mean
Humans overall suck at logic. We have the capacity for logic, but it is only one of many algorithms running in our brains, and often gets lost in the noise. Further, we have many intuitions, biases, and cognitive flaws that degrade our ability to think logically. Fortunately, however, we also have the ability for metacognition, the ability to think about our own thinking. We can therefore learn logic and how to think more clearly, filtering out the biases and flaws. It is impossible to do this perfectly, so it is best to think of metacognition as a life-long project of incremental self-improvement. Further, our biases can be so powerful that when we learn how to think about thinking, we often just make our logical fallacies more and more subtle, rather than eliminating them entirely.
Some cognitive flaws are evolutionarily baked into our thinking, likely resulting from heuristics that are practical mental shortcuts but not strictly logically valid. There also appear to be some cognitive abilities that were not prioritized in our evolutionary history, and so our finite brain resources were simply not allocated to them. This is where most math- and statistics-related fallacies come from. We do not deal well with large numbers, and we have terrible intuitions regarding statistics and probability. We have developed elaborate formal systems for dealing with math and probability, essentially to replace or at least augment our intuitive thinking, and often these systems produce results that are counterintuitive.
Perhaps the most famous example of counterintuitive statistics is the Monty Hall problem. You are given a choice of three doors; behind one is a prize. You can choose one door. The host of this game, who knows where the prize is, then opens one door without a prize (again – they know where the prize is and deliberately choose one of the unchosen doors without a prize), and then asks if you want to change your choice to the other unopened door. If you change your choice, your odds of winning go up from 1/3 to 2/3. If you have not encountered this problem before, this may seem counterintuitive, but it is absolutely correct.
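If the 2/3 figure still feels wrong, a quick simulation can make it concrete. The following is a minimal Python sketch, not a formal proof; the function names (play_monty_hall, win_rate) and the fixed random seed are made up for illustration. It plays the game many times with and without switching and reports how often each strategy wins.

```python
import random

def play_monty_hall(switch, rng):
    """Play one round; return True if the final pick wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the prize is and opens an unchosen, empty door.
    host_options = [d for d in doors if d != pick and d != prize]
    opened = rng.choice(host_options)
    if switch:
        # Move to the one door that is neither the first pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won over many simulated games (seed is arbitrary)."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

The stay strategy should land close to 1/3 and the switch strategy close to 2/3, because switching wins exactly when the first pick was wrong.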
Another common statistical fallacy is the gambler’s fallacy. This fallacy derives from an incorrect intuitive feeling that past results somehow magically affect future results, in a system in which each event is supposed to be independent. It’s called the gambler’s fallacy because games of chance are a perfect setup for this error in thinking. Let’s consider a roulette table, where you spin the wheel and bet on where the tiny ball will land. You can bet on a number, or a group, or on red or black (half the numbers are red, half are black, except for the zeros where the house wins all). Let’s say that a red number has come up on the last 10 spins. How does this affect the probability of the next spin coming up red or black? If the game is fair, then the answer is – not at all. Each spin is supposed to be a completely independent random event, like flipping a fair coin. The table is not on a red streak, nor is black “due” because it has not come up in a while.
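Independence is easy to check by simulation. The sketch below assumes an American-style wheel with 18 red, 18 black, and 2 green pockets (an assumption on my part, based on the mention of "zeros"), and the function name red_rate_after_streak is invented for this example. It looks only at spins that immediately follow a run of reds and measures how often those spins come up red. A streak length of 5 is used instead of 10 so that enough streaks occur in a modest number of spins; the logic is the same for longer streaks.

```python
import random

# Assumed American wheel: 18 red pockets out of 38 (18 black, 2 green).
P_RED = 18 / 38

def red_rate_after_streak(streak_len=5, spins=500_000, seed=1):
    """Return (share of reds right after a red streak, number of such spins)."""
    rng = random.Random(seed)
    run = 0          # current count of consecutive reds
    followups = 0    # spins that came right after a streak of reds
    reds_after = 0   # how many of those follow-up spins were red
    for _ in range(spins):
        is_red = rng.random() < P_RED
        if run >= streak_len:
            followups += 1
            reds_after += is_red
        run = run + 1 if is_red else 0
    return reds_after / followups, followups

if __name__ == "__main__":
    rate, n = red_rate_after_streak()
    print(f"red after a streak: {rate:.3f} over {n} cases (baseline {P_RED:.3f})")
```

The share of reds after a streak should sit close to the baseline 18/38, which is what independence predicts: the streak neither continues nor gets "corrected."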
Yet the illusion of streaks or certain outcomes being “due” is powerful, and such thinking is almost ubiquitous among gamblers. It derives from our tendency toward pattern recognition (apophenia), seeing illusory patterns in random noise. As Carl Sagan said, randomness is clumpy. This is another statistical bias – our intuitive sense of what random looks like is flawed. A random pattern, like stars in the sky, is more uneven and “clumpy” than our intuition. If asked to draw a random pattern, most people will create a pattern that is decidedly not random. The patterns will tend to be too uniform and evenly distributed, for example. So when we see apparent streaks, we don’t see randomness, we see a pattern, and use that to predict future outcomes.
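The clumpiness of randomness can also be demonstrated directly. This is a small Python sketch (the helper names longest_run and average_longest_run are hypothetical, chosen for this example) that flips a simulated fair coin 100 times and records the longest streak of identical outcomes, then averages that over many sequences.

```python
import random

def longest_run(flips):
    """Length of the longest streak of identical values in a non-empty list."""
    best = current = 1
    for prev, cur in zip(flips, flips[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best

def average_longest_run(n_flips=100, trials=5_000, seed=2):
    """Average longest streak across many simulated fair-coin sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        total += longest_run(flips)
    return total / trials

if __name__ == "__main__":
    print("average longest streak in 100 flips:", average_longest_run())
```

A typical sequence of 100 fair flips should contain a streak of about six or seven identical outcomes, which is longer than most people would dare to write when asked to fake a random sequence by hand.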
Seeing patterns is useful, when they are real, and so it makes sense that our brains would evolve this capacity. Our brains also have the capacity to determine which patterns are real and which are not, but the balance here does not tend to be optimal. We tend to massively overcall patterns as being real. This may result from evolutionary pressures – the negative consequences of undercalling patterns are likely greater than those of overcalling them. Also, seeing alleged patterns gives us the feeling of control, and we like that feeling. So we think we can use our amazing powers of pattern recognition to determine that black is “due” and use that power to win big. Casinos love this delusion, because they know that math wins out in the end.
Recently I was asked about the gambler’s fallacy and its relationship to the regression to the mean. I thought this was a good example of how subtle statistical logical fallacies can be. They wrote:
“I know that the fact that the roulette wheel has come up red 10 times in a row tells me NOTHING about spin #11. On the other hand, I know that over time, there will be just as many black spins as red spins, so at least intuitively, a black spin seems at least a little more likely to come up next in order to push that ratio back towards 50/50. Are these two principles actually in tension with each other? If not, how do we resolve the apparent tension? If yes, isn’t it the case that there must be “some” validity to the gambler’s ‘fallacy’?”
That’s a great question, and the answer is a definite no – they are not in conflict. Again, the pressure to think that the past influences future independent events is powerful. Regression to the mean is not a power in the universe that ensures that statistics work out in the end, it is purely a probability. Unlikely events are unlikely, whereas likely events are likely. When an unlikely event happens, is it more likely to be followed by another unlikely event or a likely event? Obviously, a likely event, because likely events are always more likely to happen than unlikely events. This may seem obvious when I state it in such general terms, but we don’t always back away from the details to see the situation as raw probability. Regression to the mean simply means that an outlier (unlikely) event is likely to be followed by a more probable average event (because they are inherently more likely). That’s it.
So when a professional athlete has the best year or game of their career, they are very likely to have a more average year or game next time. This doesn’t mean anything, they are not “choking” or cursed or whatever. They are just experiencing probability. When your variable disease is expressing its worst symptoms, you are likely to feel better in the future, because outlier symptoms are likely to be followed by more average symptoms. This is regression to the mean.
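Regression to the mean can be seen in a toy model. The sketch below makes a deliberately strong assumption: every player's season score is an independent draw from the same bell curve (mean 100, standard deviation 15, both numbers arbitrary), so there is no skill difference at all. Real athletes do differ in ability, so in reality the regression would be partial rather than complete. The function name regression_demo is invented for illustration.

```python
import random

def regression_demo(players=200_000, mean=100.0, sd=15.0, cutoff=130.0, seed=3):
    """Compare first and second seasons for players with an outlier first season.

    Returns (average first season, average second season, number selected).
    """
    rng = random.Random(seed)
    first_total = second_total = 0.0
    count = 0
    for _ in range(players):
        first = rng.gauss(mean, sd)    # season 1: pure luck in this toy model
        second = rng.gauss(mean, sd)   # season 2: independent of season 1
        if first > cutoff:             # keep only the "career year" players
            first_total += first
            second_total += second
            count += 1
    return first_total / count, second_total / count, count

if __name__ == "__main__":
    first_avg, second_avg, n = regression_demo()
    print(f"{n} outlier seasons: first {first_avg:.1f}, next {second_avg:.1f}")
```

The selected players average well above 130 in the first season and close to 100 in the second. Nothing pulled them back down; the second season is simply an ordinary draw, and ordinary draws are mostly average.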
The apparent tension of regression to the mean following a run of red numbers is just another gambler’s fallacy illusion. This is produced partly by starting your counting after a statistical fluke, and thinking that, starting from that selected point, black and red need to balance out over time. But they don’t, because you are including data that you already know and that is biased in one direction. This fallacy can creep easily into research, which is why as a general rule a study should not include prior data, meaning observations that were already known before the study began. This is another really subtle statistical bias that creeps into our thinking – failure to consider that the method of our observation may be non-random. This is a major source of confirmation bias.
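A little arithmetic shows how the ratio can drift back toward 50/50 without black ever being "due." The sketch below ignores the zeros and treats red and black as a fair coin (a simplifying assumption), and the function name red_share is hypothetical. It computes the expected overall share of reds once a 10-red streak is followed by some number of ordinary spins.

```python
def red_share(later_spins, streak=10, p_red=0.5):
    """Expected overall share of reds after a red streak plus later fair spins.

    Assumes each later spin is red with probability p_red, unaffected by
    the streak (zeros are ignored for simplicity).
    """
    expected_reds = streak + p_red * later_spins
    return expected_reds / (streak + later_spins)

if __name__ == "__main__":
    for n in (0, 10, 100, 1_000, 10_000):
        print(n, round(red_share(n), 4))
```

With 10 later spins the expected share is 15 reds out of 20, or 0.75; with 100 it is 60 out of 110, about 0.545; and it keeps shrinking toward 0.5 as spins accumulate. The streak is never cancelled by extra blacks. It is simply diluted by a growing pile of ordinary 50/50 results.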
I can keep going down this rabbit hole of cognitive biases, because there is a lot of interaction among them. But I hope the point is made. Some of you may still be stuck on the Monty Hall problem above. The important lesson to be taken away from all this is that we inherently suck at logic and probability, but we can remedy our deficits by thinking carefully and engaging with others who are also thinking carefully. Humans also have a powerful tool – we can write things down. We can accumulate knowledge across centuries, and engage in collective cumulative metacognition. We should avail ourselves of this incredible power.