Jan 23 2017

Acupuncture for Infantile Colic

Recently scientists published initial results from an ambitious project to reproduce the results of 50 influential cancer studies. The first five studies resulted in one clear failure to replicate, two partial replications, and two with uninterpretable results.

This is how science works. No one study is definitive, because there are simply too many ways to generate spurious results (even without fraud and with the best intentions). Replication is the final arbiter – any result that is real should consistently reproduce. Results that are spurious will be inconsistent.

These are the core lessons that I have been repeating here and on SBM – most studies are flawed and their results are unreliable. Most false results are false positives. Even experienced and well-meaning researchers can fall victim to p-hacking and other subtle errors. You can only arrive at a reliable conclusion by looking at a mature and robust research program involving numerous studies and replications. The various replication projects that are under way are confirming this overall impression.

Let’s turn to one of my favorite examples: acupuncture. Acupuncture involves sticking thin needles into specific areas of the body in order to provoke a specific clinical benefit, such as pain reduction. There have been several thousand studies of acupuncture. When you review all the research you find, put simply, that acupuncture does not work, for anything. There is no specific effect here, one that is reliably found when appropriately controlled for. The entirety of the research is highly consistent with the conclusion that acupuncture is nothing more than a theatrical placebo.

Why, then, are there frequent headlines saying that a study shows that acupuncture works for this or that? That’s a good question. We are far past the point with acupuncture that doing preliminary studies is of any value. I would argue there is no point to even rigorous studies – three thousand studies is enough already. But if you are going to do a study of acupuncture, if it is less than rigorous it is completely worthless. Even worse, it is likely to be misleading. The only point would be to confirm what you want to believe, and to generate pro-acupuncture headlines.

A recent study of acupuncture for infantile colic (excessive crying) is a perfect example of this unfortunate pattern. This is a low quality study with marginal results. If you look closely, it is essentially negative, but the researchers manage to massage some barely significant results out of the data. The journal, Acupuncture in Medicine, thought these borderline preliminary results were sufficient to justify a press release declaring that acupuncture works for colic, which was dutifully reproduced by the media without skepticism or anything resembling good science journalism.

Fortunately, physicians who know how to interpret research were on the job. Both David Colquhoun and Edzard Ernst reviewed the paper and found the study flawed and the results dubious.

The study looked at three treatment groups: two types of acupuncture (varied in needle location) and usual care, meaning no intervention. There was no sham or placebo acupuncture, which in my opinion is a fatal flaw. Again – at this point in acupuncture research, a study without at least sham acupuncture is worthless. The study found no difference between the two acupuncture groups, so the researchers pooled both of those groups, compared them to the no treatment group, and found some advantages for acupuncture.

Here is the other critical flaw in the study, however. David spells it out nicely, so I quote:

Table 1 of the paper lists 24 different tests of statistical significance and focuses attention on three that happen to give a P value (just) less than 0.05, and so were declared to be “statistically significant”. If you do enough tests, some are bound to come out “statistically significant” by chance. They are false positives, and the conclusions are as meaningless as “green jelly beans cause acne” in the cartoon. This is called P-hacking and it’s a well known cause of problems. It was evidently beyond the wit of the referees to notice this naive mistake. It’s very doubtful whether there is anything happening but random variability.

Yes, this paper screams p-hacking. It should also be noted that the primary outcome did not achieve statistical significance, which means that technically the study was negative. The secondary measures that barely made it over the 0.05 line were cherry picked. These results are consistent with random noise, or what is often referred to as, “Interrogating the data until it confesses.”
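To put rough numbers on David's point, here is a minimal sketch of the arithmetic. It assumes all 24 tests are independent and that there is no real effect in any of them; both are simplifications of mine, not something taken from the paper (the outcomes in a real trial are correlated, so treat the figures as illustrative only). The function name and variables are my own.

```python
from math import comb


def prob_at_least_k(n_tests: int, alpha: float, k: int) -> float:
    """Chance that at least k of n_tests come out 'significant' at level
    alpha when every null hypothesis is true.

    Assumes the tests are independent (a simplification), so the number
    of false positives follows a binomial distribution.
    """
    p_fewer_than_k = sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - p_fewer_than_k


N_TESTS = 24  # significance tests listed in Table 1, per the quote above
ALPHA = 0.05  # the conventional threshold used in the paper

# Average number of "significant" results produced by pure noise.
expected_false_positives = N_TESTS * ALPHA

# Chance of at least one, and of at least three, false positives.
p_any = prob_at_least_k(N_TESTS, ALPHA, 1)
p_three_or_more = prob_at_least_k(N_TESTS, ALPHA, 3)

# Bonferroni correction: the per-test threshold that would hold the
# overall false positive rate at ALPHA across all 24 tests.
bonferroni_threshold = ALPHA / N_TESTS

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one):          {p_any:.3f}")
print(f"P(at least three):        {p_three_or_more:.3f}")
print(f"Bonferroni threshold:     {bonferroni_threshold:.4f}")
```

Under those assumptions, noise alone would produce at least one "significant" result about 71% of the time, and more than one on average. Results that sit just under 0.05 would also be nowhere near a Bonferroni-corrected threshold of roughly 0.002.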

I would also point out that the study showed no difference between the two types of acupuncture. Again – it does not matter where you stick the needles, because acupuncture points are pure pseudoscience. They do not exist. Also, the researchers decided, because they were treating infants, not to stick the needles to the “proper” depth to elicit the de qi. Rather they used superficial needling. So this study also demonstrates that it does not matter if or how you insert needles.

That is a big clue that acupuncture is nothing but an elaborate placebo. The details of how and where you stick the needles never seem to matter. It’s like having a drug, in which it does not matter what dose you use, or what treatment interval or duration you use. None of the details of administration seem to matter, just as if the drug were a placebo.

The study reports few side effects, but they included crying during the procedure. So, the placebo treatment in this case causes the very symptom you are trying to treat.

I would also add to prior criticisms that one of the major weaknesses of acupuncture studies is poor blinding. Rigorous methods need to be used to ensure proper blinding and to assess blinding to make sure it worked. This study was only single-blind, and there are reasons to suspect that even this blinding was inadequate.


The pattern with acupuncture studies is frustratingly common – a preliminary study with significant flaws shows a highly dubious outcome, which is then falsely trumpeted to the media as showing acupuncture works. This generates another round of positive media reporting for acupuncture, and the criticisms are rarely heard. When truly rigorous studies of acupuncture are done they are almost always negative, but the results are often twisted to make it seem like acupuncture works, mostly by focusing on the unblinded comparison to no treatment.

Meanwhile systematic reviews tell a consistent story – acupuncture does not work. The promotion of acupuncture is as disconnected from reality as any science denial. They have their narrative, every element of which is false. Acupuncture as practiced today is not ancient. There is no evidence that acupuncture points exist, and clinical studies do not show any real effects beyond placebo.

The narrative, however, is winning over reality.
