Sep 18 2007

Are Most Medical Studies Wrong?

John Ioannidis has published a series of studies that demonstrate that most published medical studies turn out to be wrong, and when they are correct the effect size tends to be initially exaggerated. Ironically, while this work should serve to improve the quality of scientific medicine, it is being used by some cranks to attack the scientific basis of medicine.

In his now classic study Ioannidis systematically reviewed highly cited medical studies published in prominent journals, and then looked at the next 20 years of published studies to see if the initial results were later verified or refuted. What he found is that the majority were later refuted.

As an academic physician (and a skeptic), I do not find this shocking. What is disturbing is that some have been misrepresenting the implications of this research. Writing in the Wall Street Journal, Robert Lee Hotz concluded that the core problem was sloppy methods by scientists. Even worse, the HIV denial community has hit upon this research as a way to dismiss the findings of science.

It is certainly true that the effect Ioannidis is describing is due in part to systemic problems in biomedical research. Many studies do indeed have poor design, or are too small to be reliable. Both researchers and publishers have a bias toward publishing positive studies, while negative studies tend to languish unnoticed (the so-called file-drawer effect). There is tremendous pressure to publish, motivating some researchers to massage the data, or to publish preliminary results. There is also the occasional outright fraud.

These factors require constant vigilance and also careful thought as to how to police the system to minimize their effects. But they do not appear to be the primary reason for the effect that Ioannidis is describing.

Alex Tabarrok wrote this superb summary of Ioannidis’ research and put it into a much more meaningful context. He points out that statistics alone would cause many positive research results to be false positives. This results from the fact that most new hypotheses are going to be wrong, combined with the fact that 5% of studies testing a false hypothesis are going to be positive (reject the null hypothesis) by chance alone (assuming the typical cutoff of p < 0.05 for statistical significance). If 80% of new hypotheses are wrong, and studies detect, say, 60% of the effects that are real, then 25% of studies with positive results should be false positives – even if the research itself is perfect.
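The arithmetic behind that 25% figure can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, and the 60% detection rate (statistical power) is an assumed value chosen because it reproduces the 25% figure, not a number taken from Ioannidis' data.

```python
def false_positive_share(prior_true, alpha=0.05, power=0.6):
    """Fraction of positive studies that are false positives.

    prior_true: fraction of tested hypotheses that are actually true
    alpha:      significance cutoff (chance a false hypothesis tests positive)
    power:      chance a true hypothesis tests positive (assumed, illustrative)
    """
    false_positives = (1.0 - prior_true) * alpha
    true_positives = prior_true * power
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # 80% of new hypotheses wrong means prior_true = 0.2
    print(f"{false_positive_share(0.2):.1%}")
    # Even with every real effect detected, false positives remain
    print(f"{false_positive_share(0.2, power=1.0):.1%}")
```

Out of 1,000 hypotheses, 800 are false and 40 of those test positive by chance, while 200 are true and 120 of those are detected. That makes 40 false positives out of 160 positive studies.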

But there are other factors at work as well. Tabarrok points out that the more we can rule out false hypotheses by considering prior probability, the more we can limit false positive studies. In medicine, this is difficult. The human machine is complex and it is very difficult to determine on theoretical grounds alone what the net clinical effect of any intervention is likely to be. This leads to the need to test a very high percentage of false hypotheses.

What struck me about Tabarrok’s analysis (which he did not point out directly himself) is that removing the consideration of prior probability will make the problem of false positive studies much worse. This is exactly what so-called complementary and alternative medicine (CAM) tries to do. Often the prior probability of CAM modalities – like homeopathy or therapeutic touch – is essentially zero.

If we extend Tabarrok’s analysis to CAM it becomes obvious that he is describing exactly what we see in the CAM literature – namely a lot of noise with many false-positive results.
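To see how quickly the noise takes over as plausibility falls, here is a rough Python sketch that sweeps the prior probability downward. The helper name, the list of priors, and the 60% detection rate are all assumptions made for illustration; none of them are measured values for any particular CAM modality.

```python
def false_positive_share(prior_true, alpha=0.05, power=0.6):
    """Fraction of positive studies that are false, under assumed alpha and power."""
    false_positives = (1.0 - prior_true) * alpha
    true_positives = prior_true * power
    return false_positives / (false_positives + true_positives)


# Hypothetical priors, from a coin flip down to a near-zero plausibility
PRIORS = [0.5, 0.2, 0.05, 0.01, 0.001]


def sweep(priors=PRIORS):
    """Pair each prior with the resulting share of false positives."""
    return [(p, false_positive_share(p)) for p in priors]


if __name__ == "__main__":
    for prior, share in sweep():
        print(f"prior {prior:>6.1%}: {share:.1%} of positive studies are false")
```

As the prior approaches zero, nearly every positive study is a false positive, no matter how carefully each individual study is run.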

Tabarrok also pointed out that the more researchers there are studying a particular question, the more likely it is that someone will find positive results – which can then be cherry-picked by supporters. This too is an excellent description of the CAM world.
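The multiple-researchers effect is simple to put numbers on. The sketch below assumes the treatment does nothing at all and that each study is an independent test at the 0.05 cutoff; real studies are rarely fully independent, so treat this as a simplification, and the function name is hypothetical.

```python
def chance_of_spurious_positive(n_studies, alpha=0.05):
    """Probability that at least one of n independent studies of a
    worthless treatment comes out positive by chance alone."""
    if n_studies < 0:
        raise ValueError("n_studies must be non-negative")
    # Chance that every study is negative is (1 - alpha) ** n_studies
    return 1.0 - (1.0 - alpha) ** n_studies


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 50):
        p = chance_of_spurious_positive(n)
        print(f"{n:>2} studies: {p:.0%} chance of at least one positive")
```

With ten groups studying the same useless treatment, the chance that at least one reports a positive result is already about 40%, and with twenty it is well over half.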

The implication of Ioannidis’ research, therefore, is not to undermine or abandon scientific medicine, but rather to demonstrate the importance of re-introducing prior probability in our evaluation of the medical literature and in deciding what to research. As much as I am in favor of the Evidence-Based Medicine (EBM) movement, it does not consider prior probability. I have said before that this is a grave mistake, and the work of Ioannidis provides statistical support for this. One of the best ways to minimize false positives is to carefully consider the plausibility of the intervention being studied. CAM proponents are deathly afraid of such consideration, for they live in the world of infinitesimal probability.

Considering scientific plausibility would also kill, in a single stroke, the National Center for Complementary and Alternative Medicine (NCCAM) – which is ostensibly dedicated to researching medical treatments that have little or no scientific plausibility.

Tabarrok wrote a list of the rules we should follow in considering the medical literature in light of Ioannidis’ studies. I cannot improve upon them, except in what I have already written above, so I will just copy them here:

1) In evaluating any study try to take into account the amount of background noise. That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.

2) Bigger samples are better. (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).

3) Small effects are to be distrusted.

4) Multiple sources and types of evidence are desirable.

5) Evaluate literatures not individual papers.

6) Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.

7) As an editor or referee, don’t reject papers that fail to reject the null.

All very well said. I will simply extend his recommendations by saying that we must also abandon the double standard that is being feverishly pursued by CAM true believers (often in the bogus guise of “health freedom”). If we apply this same standard to all medical research, the fantasy world of CAM melts away to mere “noise” in the background.

With regard to scientific medicine, I would emphasize what Tabarrok said about evaluating the literature, not a single study. Ioannidis compared single studies to the later literature – using the literature as its own gold standard. This does not mean that medical research is wrong. It just means that a research question has to mature, that multiple studies by independent researchers are required before we arrive at a reliable conclusion.

The implications for the practicing physician are clear – don’t overreact to every study, do not practice “knee-jerk” medicine. Cast a cool and skeptical eye on published research, and follow the rules of thumb above. When this is done it is possible to practice science-based medicine that is very reliable.

The implications for society are also clear – a rational health care system must be based upon sound scientific reasoning as well as the best evidence available (what I call science-based medicine, to distinguish it from the laudable but inadequate evidence-based medicine). Also – and this applies to individuals as well as the science media – any single study must be put into the context of the broader literature. Great mischief has been inflicted upon the public by the media touting the results of a single study (almost always described as a “breakthrough”) without providing any scientific context.


Orac also writes about this issue.
