Evidence in Medicine: Correlation and Causation

Nov 18 2009

Published by Steven Novella under Science and Medicine

The following was cross-posted at ScienceBasedMedicine.

There are two general approaches to subverting science-based medicine (SBM): anti-science and pseudoscience. Anti-scientific approaches are any that seek to undermine science as the determinant of the standard of care, often overtly advocating for spiritual or subjectively-based standards. Some attack the validity of science itself, usually with post-modernist philosophy.

Pseudoscientific proponents, on the other hand, praise science; they just do it wrong. In reality there is a continuum from complete pseudoscience to pristine science, with no clear demarcation in the middle. Individual studies vary along this spectrum as well – there are different kinds of evidence, each with its own strengths and weaknesses, and there are no perfect studies. Further, when evaluating any question in medicine, the literature (the totality of all those individual studies) rarely points uniformly to a single answer.

These multiple overlapping continua of scientific quality create the potential to make just about any claim seem scientific simply by how the evidence is interpreted. Even a modest bias can lead to emphasizing certain pieces of evidence over others, leading to conclusions which seem scientific but are unreliable. Proponents can also easily begin with a desired conclusion and then backfill the evidence to suit their needs (rather than allowing the evidence to lead them to a conclusion).

For example, the anti-vaccine movement systematically endorses any piece of evidence that seems to support the conclusion that there is some correlation between vaccines and neurological injury. Meanwhile, they find ways to dismiss any evidence which fails to show such a connection. They, of course, accuse the scientific community of doing the same thing, and each side cites biases and conflicts in the other to explain the discrepancy. It is no wonder the public is confused.

How, then, do we use the evidence to arrive at reliable scientific conclusions? That is what I will be discussing in this series of posts, beginning with a discussion of correlation and causation, but here is a quick overview: SBM is achieved through a consideration of scientific plausibility and a systematic review of the clinical evidence. In other words – all scientific evidence is considered in a fair and thorough manner, including basic science and clinical evidence, and placed in the context of what we know about how the world works. Further, we do not rely upon any individual’s systematic review of the evidence, but on the consensus of analysis among experts and institutions – so that any biases are likely to average out.

This leads us to the final continuum – the consensus of expert opinion based upon systematic reviews can result in a solid and confident unanimous opinion, a reliable opinion with serious minority objections, a genuine controversy with no objective resolution, or simply the conclusion that we currently lack sufficient evidence and do not know the answer. It can also lead, of course, to a solid consensus of expert opinion combined with a fake controversy manufactured by a group driven by ideology or greed and not science. The tobacco industry’s campaign of doubt against the conclusion that smoking is a risk factor for lung cancer is one example. The anti-vaccine movement’s fear-mongering about vaccines and autism is another.

Correlation and Causation

Much of scientific evidence is based upon a correlation of variables – they tend to occur together. Scientists are careful to point out that correlation does not necessarily mean causation. The assumption that A causes B simply because A correlates with B is a logical fallacy – it is not a legitimate form of argument. However, sometimes people commit the opposite fallacy – dismissing correlation entirely, as if it could never be evidence of causation. This would dismiss a large swath of important scientific evidence.

For example, the tobacco industry exploited this reasoning, arguing that the mere fact that smoking correlates with lung cancer does not mean that smoking causes lung cancer. A simple correlation alone is not enough to arrive at a conclusion of causation, but multiple correlations all triangulating on the conclusion that smoking causes lung cancer, combined with biological plausibility, are.

Correlation must always be put into perspective. There are two basic kinds of clinical scientific studies that may provide evidence of correlation – observational and experimental. Experimental studies are ones in which some intervention is given to a study population. In experimental studies it is possible to control for many variables, and even reasonably isolate the variable of interest, and so correlation in a well-designed experimental study is very powerful, and we generally can assume cause and effect. If active treatment vs placebo correlates with a better outcome, then we interpret that as the treatment causing the improved outcome. (“Well-designed” is the key here – a subject of a future post.)

In observational studies populations are observed in the real world, but no intervention is being given. Observational studies can be very powerful, because they can look at extremely large numbers of subjects (more than is practical in an experimental study), but the weakness is that not all variables can be controlled for. Researchers can account for known variables (race, age, and sex are common), but it is always the unknown variables that can confound such studies.

In observational studies lack of correlation is easier to interpret than a positive correlation – if there is no correlation between A and B then we can pretty much rule out a causal relationship. The only caveat is that a correlation can be obscured by a factor that was not accounted for. When a correlation is found in observational studies – that is when the assumption of cause and effect must be avoided, and more thorough analysis is required.

If A correlates with B, then A may cause B, B may cause A, A and B may be caused by a common variable C, or the correlation may be a statistical fluke and not “real”. Further studies are then required to confirm the correlation and any specific causal hypothesis.
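The "common variable C" case above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the probabilities, the sample size, and the function names (`pearson`, `simulate_confounded`) are hypothetical and chosen for illustration, not drawn from any real study. In the simulated data, A and B each depend only on a hidden factor C and never on each other.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confounded(n=20000, seed=0):
    """Generate (c, a, b) rows where C drives both A and B.

    All probabilities are hypothetical, picked only to make the
    effect easy to see. A never influences B, and B never influences A.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                   # hidden common cause C
        a = rng.random() < (0.8 if c else 0.2)   # A depends only on C
        b = rng.random() < (0.8 if c else 0.2)   # B depends only on C
        rows.append((int(c), int(a), int(b)))
    return rows


rows = simulate_confounded()

# Correlation between A and B when C is ignored.
overall = pearson([a for _, a, _ in rows], [b for _, _, b in rows])

# Correlation between A and B within each level of C (stratified analysis).
within = {
    level: pearson(
        [a for c, a, _ in rows if c == level],
        [b for c, _, b in rows if c == level],
    )
    for level in (0, 1)
}

print(f"A-B correlation ignoring C: {overall:.2f}")
for level, r in within.items():
    print(f"A-B correlation within C={level}: {r:.2f}")
```

With these invented probabilities the expected overall correlation is about 0.36, yet once the data are split by C the correlation within each group sits near zero. That is the pattern one would expect when a third variable, rather than A or B, accounts for the association.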

To use the smoking example again – the hypothesis that smoking causes cancer, offered as the causal explanation for the correlation, generates several predictions, all later confirmed. Longer duration of smoking increases the risk of cancer (a dose-response relationship), stopping smoking reduces the risk of cancer, greater intensity of smoking increases risk, and smoking unfiltered vs filtered cigarettes is associated with higher risk. These various correlations only make sense if smoking causes lung cancer. Further, tobacco smoke contains substances demonstrated to be carcinogens – so there is biological plausibility.

The greatest abuse of the correlation equals causation fallacy is the assumption of cause and effect from a single anecdotal case. Here we are not talking about an observational study where statistics are brought to bear on hundreds or thousands of subjects, but the uncontrolled observation of a single individual. Such cases are very compelling to the human psyche – we are more moved by stories than statistics. But they make for very weak scientific evidence. This is not to say they are worthless – even a single case can raise the question of a possible correlation. But they cannot be used to establish even that a correlation is real. (Anecdotes generate questions, not answers.)

Again to use the anti-vaccine movement as an example, it is easy to generate fear based upon individual cases of bad outcomes after receiving a vaccine. We are hard-wired to find such events compelling. But such coincidences should and do occur on a regular basis, even without any causal connection. Further, it is natural, after a new disease or disorder appears, to think back over any recent events that may explain it. Our minds will latch onto anything that sticks out, and over time our memories will even morph to make the apparent correlation more compelling.

Here is an anti-vaccine and conspiracy site that is essentially collecting anecdotes of correlation between the flu vaccine and miscarriages among women. This goes beyond the assumption of cause and effect from correlation, to the assumption of correlation from anecdote. Common things occur together commonly. Given the number of spontaneous miscarriages, and the number of pregnant women receiving the flu vaccine, we would expect there to be thousands of women who miscarry within 24 hours of receiving the flu vaccine, just by chance alone. So first we have to ask – is this a real correlation?

The answer, according to systematic reviews of existing evidence, is no. There is no apparent risk of adverse outcome in pregnancy from the flu vaccine.
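The "by chance alone" reasoning above comes down to simple arithmetic, sketched here. The function name `expected_chance_coincidences` and the input numbers are hypothetical placeholders, not real epidemiological figures, and the calculation assumes the event occurs independently of vaccination at a constant daily rate.

```python
def expected_chance_coincidences(n_vaccinated, daily_event_rate, window_days=1):
    """Expected number of vaccinated people who experience an unrelated
    event within `window_days` of vaccination, purely by coincidence.

    Assumes (for illustration only) that the event is independent of
    vaccination and strikes each person with the same probability each day.
    """
    # Probability that the event happens at least once inside the window.
    p_within_window = 1 - (1 - daily_event_rate) ** window_days
    return n_vaccinated * p_within_window


# Hypothetical inputs: 1,000,000 people vaccinated, and a background
# event that strikes 1 person in 1,000 on any given day.
example = expected_chance_coincidences(1_000_000, 0.001, window_days=1)
print(f"Expected coincidental events within 1 day: {example:.0f}")
```

Under these invented inputs, about a thousand people would experience the event within a day of vaccination even if the vaccine had no effect at all. The question a real study has to answer is whether the observed count exceeds this kind of background expectation.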

The media tends to develop a narrative they think will sell, and then that becomes the story they are telling. In the midst of this severe flu season, the narrative the media is telling is of dramatic adverse events following the flu vaccine. These events are always there, because people are always getting sick, and when you vaccinate millions of people, some of them will get sick afterwards by chance alone. Individual stories are therefore misleading – we need statistics on large numbers of people to arrive at any conclusions. But statistics don’t make headlines – individual stories do.

The CDC and the WHO are tracking the statistics as the seasonal and H1N1 flu vaccines are rolled out. In the end, these statistics will tell the story.

In conclusion, correlation is an extremely valuable type of scientific evidence in medicine. But first correlations must be confirmed as real, and then every possible causal relationship must be systematically explored. In the end correlation can be used as powerful evidence for a cause and effect relationship between a treatment and benefit, or a risk factor and a disease. But it is also one of the most abused types of evidence, because it is easy and even tempting to come to premature conclusions based upon the preliminary appearance of a correlation.
