Sep 25 2012
Dead Fish Wins Ig Nobel
The Ig Nobel awards are a humorous take on the real thing, highlighting scientific studies over the last year that make you laugh, then make you think. This year’s winner in the neuroscience category revives a news story from earlier in the year: Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction.
Essentially the researchers used a functional MRI (fMRI) scanner to examine the brain activity of a dead salmon – and they found some. The point of this study was to generate an absurd false positive in order to demonstrate how fMRI studies might be plagued by false positives. It was a clever idea, and I suspect it garnered exactly the attention to their point that the researchers were after.
This strategy, of generating an absurd false positive to make a point, reminded me of the study showing that listening to music about old age actually made subjects younger. The point of that study was to demonstrate how exploiting researcher degrees of freedom can generate false positive data, even when the hypothesis is impossible.
Many people have asked me about the dead salmon fMRI study, wondering if the bottom line of this research is that fMRI studies are inherently unreliable and should be looked at with a high degree of suspicion. Well – yes and no.
The precise point of the study was this:
Can we conclude from this data that the salmon is engaging in the perspective-taking task? Certainly not. What we can determine is that random noise in the EPI timeseries may yield spurious results if multiple comparisons are not controlled for. Adaptive methods for controlling the FDR and FWER are excellent options and are widely available in all major fMRI analysis packages. We argue that relying on standard statistical thresholds (p < 0.001) and low minimum cluster sizes (k > 8) is an ineffective control for multiple comparisons. We further argue that the vast majority of fMRI studies should be utilizing multiple comparisons correction as standard practice in the computation of their statistics.
In other words – when doing an fMRI study researchers should make a statistical correction for multiple comparisons, something which can be done right in the fMRI analysis package, in order to avoid a false positive due to the failure to make such a correction. Let’s say a study compares the incidence of a symptom with 100 possible causes and uses a P value of 0.05 as the cutoff for statistical significance. This essentially means that on average, assuming none of the possible causes are actually linked to the symptom, 5 of the possible causes will correlate with the symptom with statistical significance, just by chance alone. Researchers can correct for the fact that they made 100 comparisons, typically by demanding a stricter threshold for each individual comparison, to more properly reflect the probability that a correlation is real.
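The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration: the function names are mine, the family-wise error rate formula assumes the 100 comparisons are independent (which real data rarely are), and Bonferroni is just one simple correction, not necessarily the one used in the salmon paper.

```python
def expected_false_positives(n_tests, alpha):
    """Average number of 'significant' results when no effect is real."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha):
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_tests, alpha = 100, 0.05  # the 100-causes example from the text
print("expected false positives:", expected_false_positives(n_tests, alpha))
print("chance of at least one:", round(familywise_error_rate(n_tests, alpha), 3))
print("Bonferroni per-test cutoff:", bonferroni_threshold(n_tests, alpha))
```

With 100 uncorrected comparisons, at least one false positive is close to certain. Under the Bonferroni correction each individual comparison would have to reach roughly P < 0.0005 to count as significant.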
Failure to correct for multiple comparisons (and sometimes even disclose multiple comparisons) is common in published research, and is something to look out for. The problem is not unique to fMRI studies, but it is especially common in such studies.
fMRI is the technique of using MRI scanning to look at changes in blood flow in the brain and infer from that brain activity. This is potentially a very powerful tool – researchers can give subjects a task and then, in real time, see which parts of the brain light up. We can use this technique, therefore, to map the parts and connections of the brain and correlate them with specific functions.
The problem is that the brain is very complex and noisy. In a waking person there is likely to be all sorts of activity going on all the time. There is generally a low signal to noise ratio, and researchers have to pick out the signal they are looking for from this background noise. This is done through statistical analysis of the data. Inherent to this process, because there is so much data to sift through, are multiple comparisons, so much so that the process can pick out brain activity in a dead salmon just from statistical noise.
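To get a feel for how pure noise produces "activity," here is a toy simulation using only the Python standard library. It is a sketch under loud assumptions, not a reconstruction of the salmon analysis: the voxel count, the number of scans, and the function names are all hypothetical, each voxel is treated as independent Gaussian noise with known variance, and a simple z-test stands in for a real fMRI model. It counts how many noise-only voxels pass an uncorrected p < 0.001 threshold, a Bonferroni threshold (a FWER control), and the Benjamini-Hochberg procedure (an FDR control).

```python
import math
import random


def noise_p_values(n_voxels, n_scans, seed=0):
    """Two-sided p-values for voxels that contain nothing but unit-variance noise."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_voxels):
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]
        # z-statistic for "mean signal differs from zero" (variance assumed known)
        z = (sum(samples) / n_scans) * math.sqrt(n_scans)
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values


def benjamini_hochberg_count(p_values, q):
    """Number of tests rejected by the Benjamini-Hochberg FDR procedure."""
    m = len(p_values)
    n_rejected = 0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= q * k / m:
            n_rejected = k  # reject everything up to the largest passing rank
    return n_rejected


n_voxels, n_scans = 20000, 20  # hypothetical sizes, chosen only for illustration
p_vals = noise_p_values(n_voxels, n_scans)

uncorrected = sum(1 for p in p_vals if p < 0.001)
bonferroni = sum(1 for p in p_vals if p < 0.05 / n_voxels)
fdr = benjamini_hochberg_count(p_vals, 0.05)

print("uncorrected p < 0.001:", uncorrected)
print("Bonferroni (FWER 0.05):", bonferroni)
print("Benjamini-Hochberg (FDR 0.05):", fdr)
```

With 20,000 noise-only voxels, the uncorrected threshold should flag on the order of 20 of them by chance alone (20,000 × 0.001), while either correction will typically flag none or close to none. That is the dead salmon in miniature.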
This does not mean that all fMRI research is worthless and should be ignored. What it means is that fMRI research is tricky, and while some of it is reliable, a lot of it is just noise that should be looked at with skepticism. No single fMRI study should be seen as definitive or reliable. Only the most rigorous studies are likely to be useful, and even then replication is necessary to see that the claimed signal is genuine.
For example, some acupuncture proponents have realized that fMRI studies are a way to make it seem as if acupuncture points are real and have a genuine physiological specificity. This position is contradicted by the rest of medical and biological research, which essentially shows that acupuncture points do not exist. There are now many small fMRI studies looking at brain “activation” with acupuncture and finding that stuff happens in the brain when you stick people with needles. A recent study in Parkinson’s disease found activity in the basal ganglia, for example (the study is from the Department of Meridian and Acupoint in the College of Korean Medicine). These results are about as reliable as the brain activity in the dead salmon.
A systematic review and meta-analysis of fMRI studies in acupuncture found that the results were very heterogeneous – meaning they were all over the place, which is what we would expect if the results were due to false positives from sloppy design or statistical analysis. The reviewers also criticized the research for lack of transparency in methodology, something which is essential in general but particularly for a tricky technique like fMRI.
Conclusion
Studies using fMRI scanning may be highly useful and informative, but definitely need to be looked at with special care and skepticism. fMRI studies should be generally considered as if they were preliminary studies. The results may be interesting, but until they are replicated with rigorous design and a consistent result, the findings are dubious.
Unfortunately, fMRI studies give the false impression of high-tech precision, because of the pretty pictures of alleged brain activity that are generated and the sophisticated nature of the studies. They are often, however, little more than “statistical fishing expeditions,” to borrow a phrase from another criticism.
I would not throw the baby out with the bathwater, however. Careful researchers are making good use of fMRI and rigorous studies with legitimate statistical analysis (including correcting for multiple comparisons) are out there. When evaluating fMRI studies – just remember the dead salmon.
10 Responses to “Dead Fish Wins Ig Nobel”
I think that Steve meant to say “0.05” and not “0.5” above.
We’ve seen this before. (See Comment 3: http://theness.com/neurologicablog/index.php/publishing-false-positives/comment-page-1/ )
I wonder if this is statistically significant?
Steve, in your example, you used a P value of 0.5. Didn’t you mean 0.05?
Also, I guess I’m a bit confused about the purpose of the Ig Nobel awards. Often they are given to researchers who do sloppy science or to school boards that would prefer to teach kids superstition than science, for example. But in this case, the salmon fMRI study appears to be an example of good science, a straightforward way to demonstrate how easily statistical analyses can be misapplied or abused. These types of studies are, in my opinion, at opposite ends of the science quality spectrum.
Thanks – yes, 0.05, corrected.
The Igs are not just to make fun of bad science, but to recognize interesting science that is humorous in some way. So you end up with both ends of the spectrum.
I’d like to see an fMRI study on people listening to various degrees of offensive language.
Hold on now! Isn’t this a small salmo…er sample size? One fish… come on! Also, how do we know that they weren’t picking up something real? Maybe the fish was having an out of carcass experience. Science doesn’t know everything!
How does one determine the p value? Since there is no study to show what the noise is for a particular fMRI study is there? Is it a universal value, or would it vary by study or technique? Is it just a value that is made up?
Funny you should blog about this Dr. Novella. I have been actively commenting over at the NCCAM blog. One of the recent ones by Dr. Killen had a thread in which someone attempted to use fMRI data of acupuncture points to demonstrate “objective” data that they actually exist. Below is my comment and you can read the full thread here.
fMRI is an interpretative model of what goes on. As such it is several layers away from the reality, and each layer is open to the introduction of errors which get amplified in the next one, making the end-result hardly more than entertaining for the high order processes.
For sure, if you think of moving a limb you’ll see which part of the brain gets more active, but the rather far fetched studies now being churned out by the dozens going as far as to ‘conclude’ that “While religious and nonreligious thinking differentially engage broad regions of the frontal, parietal, and medial temporal lobes, the difference between belief and disbelief appears to be content-independent” http://www.ncbi.nlm.nih.gov/pubmed/19794914 for example based on it is beyond belief. (pun intended)
With such a crude and error-prone device this kind of ‘research’ seriously undermines the status of the science. Imho.
Ok, am I the only one who realizes what is going on here??? This was clearly a ZOMBIE salmon. We have been blindsided by naively thinking the zombie apocalypse will begin with humans! It has already begun in salmon! I just hope that the researchers weren’t bitten by the zombie salmon, or else the researchers will have succeeded in spreading zombieism to humans!
“Failure to correct for multiple comparisons (and sometimes even disclose multiple comparisons) is common in published research, and is something to look out for. The problem is not unique to fMRI studies, but it is especially common in such studies.”
I wonder what you mean by common, because I find it concerning that such an error could get past peer review. I could see this being very common in a covert sense – that the researchers cherry pick among many variables, but then do not disclose this information. But to do this in a way that could be detected, and for that error to be missed, is disturbing, especially for something that is routinely taught in introductory statistics classes.