Jun 13 2011

Global Warming and Statistical Artifacts

The 1936 Literary Digest poll was a mail-in straw poll attempting to predict the outcome of the 1936 presidential race between Franklin Roosevelt and Alf Landon. The poll is infamous for predicting a huge victory for Landon, when in fact Roosevelt won in a landslide. Conventional wisdom is that the poll's mailing lists, drawn largely from telephone directories and automobile registrations, were biased toward the affluent, who disproportionately supported Landon; in other words, the problem was an unrepresentative sample. However, later analysis shows that the low response rate was also a contributing factor.

This episode is now the textbook example of the broader concept that data may contain spurious patterns or results, depending on the methods used to gather that data. Humans are great at detecting patterns, and researchers will often mine large pools of data looking for connections. We also do this automatically in our everyday lives, mining the vast stream of our daily experience for patterns and then often responding as if these patterns are real and meaningful.
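The two failure modes described above, an unrepresentative sampling frame and unequal response rates, can be made concrete with a little arithmetic. The sketch below uses entirely hypothetical numbers (they are not the actual 1936 figures) and hypothetical helper names: an electorate split into an affluent group and everyone else, a mailing list that over-represents the affluent, and a higher response rate among supporters of candidate B than of candidate A.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One hypothetical slice of the electorate (all numbers are made up)."""
    pop_share: float    # fraction of the actual electorate
    frame_share: float  # fraction of the poll's mailing list
    support: float      # fraction of the group backing candidate A
    resp_a: float       # response rate among A supporters
    resp_b: float       # response rate among B supporters


def true_and_polled_share(groups):
    """Hypothetical helper: return (true share for A, expected share for A
    among returned ballots), given sampling-frame and response biases."""
    true_a = sum(g.pop_share * g.support for g in groups)
    ballots_a = sum(g.frame_share * g.support * g.resp_a for g in groups)
    ballots_b = sum(g.frame_share * (1 - g.support) * g.resp_b for g in groups)
    return true_a, ballots_a / (ballots_a + ballots_b)


# Hypothetical inputs: A plays the role of Roosevelt, B of Landon.
affluent = Group(pop_share=0.3, frame_share=0.7, support=0.4, resp_a=0.2, resp_b=0.3)
others = Group(pop_share=0.7, frame_share=0.3, support=0.7, resp_a=0.2, resp_b=0.3)
true_a, polled_a = true_and_polled_share([affluent, others])
print(f"true share {true_a:.2f}, polled share {polled_a:.2f}")
# prints: true share 0.61, polled share 0.39
```

With these made-up inputs, the skewed mailing list alone (equal response rates in both camps) would pull the estimate from 0.61 down to 0.49, and the unequal response rates drag it further, to about 0.39.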

There are many kinds of false patterns in data other than sampling bias, and it often takes an expert to know how to interpret a complex data set. Meanwhile, complex data can be presented to the public in a partial or deceptive way in order to create a false impression. The global warming controversy is now the poster child for this phenomenon. The notion that the planet is slowly warming, and that human activity is playing a significant role, is based upon large sets of data that have to be analyzed in very complex and subtle statistical ways. Both sides of the controversy point to biases or errors in the data that, they argue, falsely make it look as if the Earth is or is not warming.

I am not suggesting equivalency here, just that the fight is largely taking place over massive, horrifically complex data sets. For the record, I find the argument for anthropogenic global warming to be compelling. I would not say that it is certain, but it is probable enough that it is reasonable to think about how we can keep such effects from continuing unrestrained into the future. This is one of those areas of research where scientific certainty will likely not be achieved until long after it is too late to do anything about it, so we have to act based upon probability.

One of the many challenges of looking at planetary temperature data is that we need to look at trends over a long period of time. By definition, this takes a long time. (It is similar to asking about the long-term effects of some medical intervention: if you want to know the risks vs benefits over 20 years, that will take at least 20 years to research.) What this means practically is that recent trends are difficult to analyze statistically, because by definition recent trends are short term.
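To see why short windows are so hard, consider the simplest textbook case. This assumes year-to-year noise that is independent with a constant standard deviation σ; real temperature series are autocorrelated, which makes the problem worse. Fitting a straight line by least squares to n equally spaced annual values y_t gives a slope estimate and standard error of

\[
\hat{\beta} = \frac{\sum_{t=1}^{n} (t-\bar{t})(y_t-\bar{y})}{\sum_{t=1}^{n} (t-\bar{t})^2},
\qquad
\mathrm{SE}(\hat{\beta}) = \frac{\sigma}{\sqrt{\sum_{t=1}^{n} (t-\bar{t})^2}} = \sigma\sqrt{\frac{12}{n(n^2-1)}}
\]

So the uncertainty in a trend shrinks roughly as n^{-3/2}: with the same noise, going from 15 to 16 years cuts it by about 9%, and doubling the window from 15 to 30 years cuts it by about 65%. A trend sitting just short of a significance threshold can therefore cross it with a single added year.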

As a result, until very recently the warming trend since 1995 was not statistically significant. Global warming dissidents have used this fact to argue that global warming is not happening: whatever warming occurred in the latter half of the 20th century is now over, and it was all part of the natural cycle of temperature fluctuation.

But as I stated, it will always be true that when we look at the trend over the last 10 years we have only 10 years of data, and that may not be enough to reach statistical significance. So dissidents will always be able to argue that there has been no significant warming in the most recent decade.

Professor Phil Jones (yes, the same Jones who was caught up in the “ClimateGate” scandal, which, btw, never turned up any evidence of scientific misconduct) was often quoted as saying that the data from 1995-2009 did not show significant warming. The data did show warming, and it was statistically significant at the 90% confidence level, but not at the 95% level that is the conventional cutoff. Well, after adding in the data for 2010, the warming trend since 1995 is now, according to Jones, significant at the 95% level. He is quoted by the BBC:

“Basically what’s changed is one more year [of data]. That period 1995-2009 was just 15 years – and because of the uncertainty in estimating trends over short periods, an extra year has made that trend significant at the 95% level which is the traditional threshold that statisticians have used for many years.”

Jones argues that 20-30 years is the time period we really should be looking at. But of course, as I stated, this means we will always be 20-30 years behind the times in our knowledge of recent climate change.
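The effect of window length on significance can also be checked numerically. The sketch below is an illustration under simplifying assumptions, not Jones's actual analysis: it fits an ordinary least-squares line to equally spaced annual values, computes the slope's standard error assuming independent noise (real temperature series are autocorrelated, which widens the uncertainty), and gets a two-sided p-value from Student's t distribution with n − 2 degrees of freedom by numerically integrating its density, so only the Python standard library is needed. The helper names are hypothetical, and the series at the bottom is synthetic (a fixed trend plus a made-up alternating wiggle), not real temperature data.

```python
import math


def ols_trend(values):
    """Hypothetical helper: slope per time step and its standard error.

    Assumes independent, equal-variance residuals; the autocorrelation
    that real temperature series have is ignored in this sketch.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 points")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    rss = sum((y - intercept - slope * t) ** 2 for t, y in enumerate(values))
    return slope, math.sqrt(rss / (n - 2) / s_tt)


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t_stat| with Simpson's rule
    (steps must be even) and returns 1 - 2 * area, clamped to [0, 1].
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_stat)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, max(0.0, 1 - 2 * total * h / 3))


def trend_p_value(values):
    """Return (slope, standard error, p-value for the 'no trend' hypothesis)."""
    slope, se = ols_trend(values)
    if se == 0:
        return slope, se, (1.0 if slope == 0 else 0.0)
    return slope, se, t_two_sided_p(slope / se, len(values) - 2)


def synthetic(n):
    """Made-up series: 0.02 per year of trend plus an alternating +/-0.1 wiggle."""
    return [0.02 * t + (0.1 if t % 2 == 0 else -0.1) for t in range(n)]


for years in (10, 15, 16, 30):
    slope, se, p = trend_p_value(synthetic(years))
    print(f"{years:2d} years: slope={slope:.4f} se={se:.4f} p={p:.4f}")
```

The underlying trend and noise are identical in every window; only the number of years changes, and the standard error and p-value fall as the window grows.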


There are many sources of potential artifacts in the climate data. Where are the temperature stations located? Have cities grown up around them over the years, producing spurious warming? There are also artifacts arising from the time it takes stations to report their data to central repositories, which then have to crunch the data. And the methods of temperature measurement have changed over the years.

In addition to artifacts in the gathering and reporting of the data, there are numerous overlapping trends in the data itself. There are multiple natural climate cycles, as well as short-term anomalies (like volcanic eruptions), that need to be taken into account.
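One common way to take such factors into account is to include them as extra predictors in the trend regression. The sketch below is a toy illustration of that idea only, not the method used by Jones or any particular research group, and real analyses involve far more than this. It assumes a hypothetical cycle index x_t that is known exactly and fits y_t = a + b·t + c·x_t by ordinary least squares, solving the two-predictor normal equations on centered variables; the function names and the synthetic series are made up for illustration.

```python
import math


def trend_with_covariate(y, x):
    """Hypothetical helper: least-squares fit of y_t = a + b*t + c*x_t, t = 0..n-1.

    Returns (b, c). Centering removes the intercept; the remaining 2x2
    normal equations are solved by Cramer's rule.
    """
    n = len(y)
    t_mean, x_mean, y_mean = (n - 1) / 2, sum(x) / n, sum(y) / n
    t_c = [t - t_mean for t in range(n)]
    x_c = [xi - x_mean for xi in x]
    y_c = [yi - y_mean for yi in y]
    s_tt = sum(u * u for u in t_c)
    s_xx = sum(u * u for u in x_c)
    s_tx = sum(u * v for u, v in zip(t_c, x_c))
    s_ty = sum(u * v for u, v in zip(t_c, y_c))
    s_xy = sum(u * v for u, v in zip(x_c, y_c))
    det = s_tt * s_xx - s_tx ** 2
    if abs(det) < 1e-12:
        raise ValueError("covariate is (nearly) collinear with time")
    b = (s_ty * s_xx - s_tx * s_xy) / det
    c = (s_tt * s_xy - s_tx * s_ty) / det
    return b, c


def naive_trend(y):
    """Hypothetical helper: slope of y against t alone, ignoring any covariate."""
    n = len(y)
    t_mean, y_mean = (n - 1) / 2, sum(y) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    return sum((t - t_mean) * (yi - y_mean) for t, yi in enumerate(y)) / s_tt


# Synthetic series: a steady trend of 0.02 per year plus a made-up cycle
# index with a 7-year period. No real temperature data is used.
n = 15
cycle = [math.sin(2 * math.pi * t / 7) for t in range(n)]
series = [0.02 * t + 0.15 * c for t, c in enumerate(cycle)]

print("naive trend:   ", naive_trend(series))
print("adjusted trend:", trend_with_covariate(series, cycle)[0])
```

In this toy case the naive fit over 15 years absorbs part of the cycle and reports a trend lower than the built-in 0.02 per year, while the fit that includes the cycle index recovers it.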

This is why sorting through all of this noise in the climate data is not a job for the amateur. Of course, now that climate change is a politically charged issue, the internet is full of exactly that: amateur analysis of the data. This is an area where substituting one’s own analysis for the consensus of scientific opinion is probably not a good idea.
