Sep 25 2012

Dead Fish Wins Ig Nobel

The Ig Nobel awards are a humorous take on the real thing, highlighting scientific studies over the last year that make you laugh, then make you think. This year’s winner in the neuroscience category brings back a news story from earlier in the year: Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction.

Essentially the researchers used a functional MRI (fMRI) scanner to examine the brain activity of a dead salmon – and they found some. The point of the study was to generate an absurd false positive in order to demonstrate how fMRI studies might be plagued by false positives. It was a clever idea, and it drew the kind of attention to their point that I suspect the researchers were after.

This strategy of generating an absurd false positive to make a point reminded me of the study showing that listening to music about old age actually made subjects younger. The point of that study was to demonstrate how exploiting researcher degrees of freedom can generate false positive data, even when the hypothesis is impossible.
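
To illustrate the kind of trick that study relied upon, here is a minimal Python sketch of a single researcher degree of freedom – peeking at the data and stopping as soon as the result looks significant. All the numbers (group sizes, number of peeks) are illustrative, not taken from any actual study:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_sims, alpha = 2000, 0.05
    false_positives = 0

    for _ in range(n_sims):
        # Null world: the "treatment" does nothing at all.
        a = list(rng.normal(size=10))
        b = list(rng.normal(size=10))
        # Researcher degree of freedom: test after every 10 subjects
        # per group and stop as soon as p < 0.05, up to 50 per group.
        for _ in range(5):
            if stats.ttest_ind(a, b)[1] < alpha:
                false_positives += 1
                break
            a.extend(rng.normal(size=10))
            b.extend(rng.normal(size=10))

    print(f"False positive rate: {false_positives / n_sims:.1%}")
    # Typically well above the nominal 5%, from this one choice alone.

Combine a few such choices (multiple outcome measures, optional covariates, flexible exclusions) and a “significant” result for an impossible hypothesis becomes easy to produce.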

Many people have asked me about the dead salmon fMRI study, wondering if the bottom line of this research is that fMRI studies are inherently unreliable and should be looked at with a high degree of suspicion. Well – yes and no.

The precise point of the study was this:

Can we conclude from this data that the salmon is engaging in the perspective-taking task? Certainly not. What we can determine is that random noise in the EPI timeseries may yield spurious results if multiple comparisons are not controlled for. Adaptive methods for controlling the FDR and FWER are excellent options and are widely available in all major fMRI analysis packages. We argue that relying on standard statistical thresholds (p < 0.001) and low minimum cluster sizes (k > 8) is an ineffective control for multiple comparisons. We further argue that the vast majority of fMRI studies should be utilizing multiple comparisons correction as standard practice in the computation of their statistics.

In other words – when doing an fMRI study researchers should make a statistical correction for multiple comparisons, something which can be done right in the fMRI analysis package, in order to avoid a false positive due to the failure to make such a correction. Let’s say a study compares the incidence of a symptom with 100 possible causes and uses a P value of 0.05 as the cutoff for statistical significance. This essentially means that, on average, assuming none of the possible causes are actually linked to the symptom, 5 of the possible causes will correlate with the symptom with statistical significance by chance alone. Researchers can correct for the fact that they made 100 comparisons to more properly reflect the probability that a correlation is real.
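
To make the arithmetic concrete, here is a minimal Python sketch of that hypothetical 100-comparison study. The numbers are illustrative, and the Bonferroni correction at the end is just the simplest of the available corrections:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_causes, n_subjects, alpha = 100, 50, 0.05

    # Null world: the symptom is unrelated to every candidate cause.
    symptom = rng.normal(size=n_subjects)
    causes = rng.normal(size=(n_causes, n_subjects))

    # One correlation test per candidate cause.
    p_values = np.array([stats.pearsonr(c, symptom)[1] for c in causes])

    print("Uncorrected 'significant' correlations:", np.sum(p_values < alpha))
    # On average about 5, by chance alone.

    # Bonferroni correction: divide the threshold by the number of tests.
    print("Corrected 'significant' correlations:", np.sum(p_values < alpha / n_causes))
    # Almost always 0 when the null is true everywhere.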

Failure to correct for multiple comparisons (and sometimes even disclose multiple comparisons) is common in published research, and is something to look out for. The problem is not unique to fMRI studies, but it is especially common in such studies.

fMRI is a technique that uses MRI scanning to look at changes in blood flow in the brain, and to infer brain activity from those changes. This is potentially a very powerful tool – researchers can give subjects a task and then, in real time, see which parts of the brain light up. We can use this technique, therefore, to map the parts and connections of the brain and correlate them with specific functions.

The problem is that the brain is very complex and noisy. In a waking person there is likely to be all sorts of activity going on all the time. There is generally a low signal-to-noise ratio, and researchers have to pick out the signal they are looking for from this background noise. This is done through statistical analysis of the data. Because there is so much data to sift through, multiple comparisons are inherent to this process – so much so that the process can pick out brain activity in a dead salmon from statistical noise alone.
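
You can watch the dead salmon emerge from a simulation of nothing but noise. This is not the actual analysis from the paper – just a sketch with 60,000 pure-noise “voxels” tested at an uncorrected threshold, then with the kind of false discovery rate (FDR) correction the authors recommend:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_voxels, n_timepoints = 60_000, 120

    # A "dead salmon": every voxel is pure noise, with no task signal.
    data = rng.normal(size=(n_voxels, n_timepoints))
    task = np.tile([0, 1], n_timepoints // 2)  # alternating off/on volumes

    # Compare "on" volumes to "off" volumes, one t-test per voxel.
    t, p = stats.ttest_ind(data[:, task == 1], data[:, task == 0], axis=1)

    print("Voxels 'active' at uncorrected p < 0.001:", np.sum(p < 0.001))
    # Expect roughly 60 spurious voxels: activity in a brain with none.

    # Benjamini-Hochberg FDR step-up procedure at q = 0.05: find the
    # largest rank k with p_(k) <= (k/m) * q, then reject ranks 1..k.
    q = 0.05
    p_sorted = np.sort(p)
    ranks = np.arange(1, n_voxels + 1)
    passing = np.nonzero(p_sorted <= q * ranks / n_voxels)[0]
    print("Voxels surviving FDR correction:", passing[-1] + 1 if passing.size else 0)
    # Under pure noise, essentially always 0.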

This does not mean that all fMRI research is worthless and should be ignored. What it means is that fMRI research is tricky, and while some of it is reliable, a lot of it is just noise that should be looked at with skepticism. No single fMRI study should be seen as definitive or reliable. Only the most rigorous studies are likely to be useful, and even then replication is necessary to see that the claimed signal is genuine.

For example, some acupuncture proponents have realized that fMRI studies are a way to make it seem as if acupuncture points are real and have a genuine physiological specificity. This position is contradicted by the rest of medical and biological research, which essentially shows that acupuncture points do not exist. There are now many small fMRI studies looking at brain “activation” with acupuncture and finding that stuff happens in the brain when you stick people with needles. A recent study in Parkinson’s disease found activity in the basal ganglia, for example (the study is from the Department of Meridian and Acupoint in the College of Korean Medicine). These results are about as reliable as the brain activity in the dead salmon.

A systematic review and meta-analysis of fMRI studies in acupuncture found that the results were very heterogeneous – meaning they were all over the place, which is what we would expect if the results were due to false positives from sloppy design or statistical analysis. The reviewers also criticized the research for lack of transparency in methodology, something which is essential in general but particularly for a tricky technique like fMRI.

Conclusion

Studies using fMRI scanning may be highly useful and informative, but they definitely need to be looked at with special care and skepticism. fMRI studies should generally be considered preliminary. The results may be interesting, but until they are replicated with rigorous design and a consistent result, the findings are dubious.

Unfortunately, fMRI studies give the false impression of high-tech precision, because of the pretty pictures of alleged brain activity that are generated and the sophisticated nature of the studies. They are often, however, little more than “statistical fishing expeditions,” to borrow a phrase from another criticism.

I would not throw the baby out with the bathwater, however. Careful researchers are making good use of fMRI and rigorous studies with legitimate statistical analysis (including correcting for multiple comparisons) are out there. When evaluating fMRI studies – just remember the dead salmon.

10 Responses to “Dead Fish Wins Ig Nobel”

  1. wfr on 25 Sep 2012 at 10:20 am

    I think that Steve meant to say “0.05” and not “0.5” above.

    We’ve seen this before. (See Comment 3: http://theness.com/neurologicablog/index.php/publishing-false-positives/comment-page-1/ )

    I wonder if this is statistically significant?

  2. Jim Shaver on 25 Sep 2012 at 10:21 am

    Steve, in your example, you used a P value of 0.5. Didn’t you mean 0.05?

    Also, I guess I’m a bit confused about the purpose of the Ig Nobel awards. Often they are given to researchers who do sloppy science, or to school boards that would prefer to teach kids superstition rather than science, for example. But in this case, the salmon fMRI study appears to be an example of good science, a straightforward way to demonstrate how easily statistical analyses can be misapplied or abused. These types of studies are, in my opinion, at opposite ends of the science quality spectrum.

  3. Steven Novella on 25 Sep 2012 at 10:54 am

    Thanks – yes, 0.05, corrected.

    The Igs are not just to make fun of bad science, but to recognize interesting science that is humorous in some way. So you end up with both ends of the spectrum.

  4. etatro on 25 Sep 2012 at 11:25 am

    I’d like to see an fMRI study on people listening to various degrees of offensive language.

  5. tmac57 on 25 Sep 2012 at 1:53 pm

    Hold on now! Isn’t this a small salmo…er, sample size? One fish… come on! Also, how do we know that they weren’t picking up something real? Maybe the fish was having an out-of-carcass experience. Science doesn’t know everything!

  6. SARA on 25 Sep 2012 at 5:20 pm

    How does one determine the p value? There is no study to show what the noise is for a particular fMRI study, is there? Is it a universal value, or would it vary by study or technique? Is it just a value that is made up?

  7. nybgrus on 25 Sep 2012 at 11:29 pm

    Funny you should blog about this, Dr. Novella. I have been actively commenting over at the NCCAM blog. One of the recent posts by Dr. Killen had a thread in which someone attempted to use fMRI data of acupuncture points to demonstrate “objective” data that they actually exist. Below is my comment, and you can read the full thread here.

    I have read through the articles you referenced. First off, the largest trial is 37 people; the other two are 15 and 12. In any sort of trial, that is significantly underpowered to generate any sort of reliable results. In brain scan studies specifically this is even worse. John Ioannidis demonstrated that the corpus of brain scan studies is vastly underpowered and – even when generous concessions were made – that positive results from brain scan studies were double what should be statistically possible (Arch Gen Psychiatry. 2011;68(8):773-780. doi:10.1001/archgenpsychiatry.2011.28). Granted, this may make for a solid pilot study to then do more rigorous analysis, but that has been completely lacking.

    In fact, looking over a PubMed search on the topic of fMRI scans and acupoints, it is notable that there are only 50 studies, they are all small, and they all either come out of China, have authors affiliated with a Chinese institution, and/or are published in journals with a low impact factor and an obvious likelihood of bias such as “J Acupunct Meridian Stud” (including the ones that you referenced). It is also well known that studies coming out of China on TCM and acupuncture are extremely unreliable, with a massive publication bias. It has been documented that 99% of acupuncture trials out of China are positive (Control Clin Trials. 1998 Apr;19(2):159-66.), which is simply an impossible statistic outside of obvious publication bias and/or poor study methodology. In fact, at least two large reviews of reviews have demonstrated that while “Utilization of Chinese-language databases greatly increased the number of potentially relevant references for each search. Unfortunately, due to methodological flaws, this additional information did not generate any usable information.” (Journal of Evidence-Based Medicine, Volume 5, Issue 2, pages 89–97, May 2012) and “The quality of trials of traditional Chinese medicine must be improved urgently. Large and well designed randomised controlled trials on long term major outcomes should be funded” (Review of randomised controlled trials of traditional Chinese medicine, BMJ 1999;319, doi: 10.1136/bmj.319.7203.160).

    As for the studies you cite specifically: as with pretty much all other studies that come out of China, there are significant methodological flaws. In all 3, proper blinding is not observed. One in particular (the PNAS paper) is also striking in that they found differences in the fMRI signal which they then attributed to the participant being either of “yin” or “yang” disposition. They try to claim that the confirmation of “yin” and “yang” characteristics was independently and blindly verified, but they do not at all elucidate what those characteristics are, how they are determined, and whether or not the designation was given before or after the fMRI results were obtained. Additionally, they admit that 1 blind determination was incorrect, which is rather significant given that it is essentially a 50/50 chance of getting it right by pure luck and the sample size was so small.

    Additionally, they discuss the methodology used for selecting whether the pixel (which should be called a voxel in MRI studies) was active by comparing a correlation coefficient to a threshold value (TH) which they say was set at 0.4 for “most of the study.” There is no discussion as to why that particular value was chosen, why only “most of the study” used that value, which parts used that value, how it was determined that a different value would be used, nor what different value was actually used. The remainder of the studies follow this similarly poor methodology which, as I described above, is known to be par for the course in Chinese studies.

    Additionally, large studies with adequate power have shown that in fact there is no difference in outcomes between sham and verum acupuncture. A study of 1162 patients with chronic low back pain (which we would a priori expect to have the largest placebo responses and thus the largest effect sizes with placebo intervention) demonstrated “at 6 months, response rate was 47.6% in the verum acupuncture group, 44.2% in the sham acupuncture group…verum vs sham, 3.4% (95% confidence interval, −3.7% to 10.3%; P = .39)” (Arch Intern Med. 2007;167(17):1892-1898). In other words, it really didn’t matter where you put the needles; it had the same effect, which clearly demonstrates that there is no intrinsic utility to acupuncture but a distinct placebo response amongst the cohort of subjects. A study in the journal “Pain” did a large review of 57 systematic reviews. Of note, only 4 were considered “high quality.” The review of reviews concluded “numerous systematic reviews have generated little truly convincing evidence that acupuncture is effective in reducing pain. Serious adverse effects continue to be reported.” (PAIN, Volume 152, Issue 4, Pages 755-764, April 2011).

    For all of these reasons I am quite unimpressed with the studies you have provided and would most certainly NOT consider them to be more objective. To stress the point, any brain scan data – and especially fMRI data – is subject to a large amount of observer bias in interpretation. A lack of blinding of the researchers can very easily lead to biased results (whether intentional or not). As John Ioannidis has pointed out on this topic, the only way to prevent this would be to both blind the researchers AND require that the trial declare beforehand exactly what type of measurements will be used and what will be measured. The resolution of brain scans and our understanding of the complex interconnectedness of neurophysiology is simply too poor to consider them immune to bias, and indeed they can be very easily biased unintentionally.

    So this is what I mean when I refer to prior plausibility of a so-called CAM modality. A rigorous look at the corpus of data surrounding the topic is required and must be implemented, which is much more than simply reading the abstract, which will *always* paint the conclusions in the most favorable possible light. I still see no plausibility for a putative mechanism of acupuncture beyond placebo effect, and large analyses clearly demonstrate that there is no specific effect. Even the most recent meta-analysis demonstrates a statistically significant but clinically insignificant effect of acupuncture on pain. An intellectually honest researcher would recognize that in subjective assessments there is invariably a large amount of noise in the signal, and that the threshold for accepting something as significant must accordingly be higher. Even in that case, though, clinical significance is also of importance, and no study of acupuncture without serious methodological flaws has demonstrated clinical significance for any use of acupuncture.

  8. petrossa on 26 Sep 2012 at 6:28 am

    fMRI is an interpretative model of what goes on. As such it is several layers away from the reality, and each layer is open to the introduction of errors which get amplified in the next one, making the end result hardly more than entertaining for the higher-order processes.

    For sure, if you think of moving a limb you’ll see which part of the brain gets more active. But the rather far-fetched studies now being churned out by the dozens – going as far as to ‘conclude’ based on it that “While religious and nonreligious thinking differentially engage broad regions of the frontal, parietal, and medial temporal lobes, the difference between belief and disbelief appears to be content-independent” (http://www.ncbi.nlm.nih.gov/pubmed/19794914), for example – are beyond belief (pun intended).

    With such a crude and error-prone device, this kind of ‘research’ seriously undermines the status of the science. IMHO.

  9. Kawarthajon on 26 Sep 2012 at 9:57 am

    Ok, am I the only one who realizes what is going on here??? This was clearly a ZOMBIE salmon. We have been blindsided by naively thinking the zombie apocalypse will begin with humans! It has already begun in salmon! I just hope that the researchers weren’t bitten by the zombie salmon, or else the researchers will have succeeded in spreading zombieism to humans!

  10. ccbowers on 26 Sep 2012 at 10:23 am

    “Failure to correct for multiple comparisons (and sometimes even disclose multiple comparisons) is common in published research, and is something to look out for. The problem is not unique to fMRI studies, but it is especially common in such studies.”

    I wonder what you mean by “common,” because I find it concerning that such an error could get past peer review. I could see this being very common in a covert sense – that the researchers cherry pick among many variables but then do not disclose this information. But to do this in a way that could be detected, and for that error to be missed, is disturbing, especially for something that is routinely taught in introductory statistics classes.
