Jun 01 2015

Citation Bias – Confirmation Bias for Scientists

I’m a big fan of science for many reasons. Not only is the subject matter of science often incredibly interesting, but the process of science seems to work better than any other method humans have developed for knowing about the universe in which we live. Any fair-minded and knowledgeable view of human history cannot avoid this conclusion.

It’s therefore worthwhile thinking about and exploring the science of science itself, what we might call metascience. It is, in fact, a common narrative among skeptics and science communicators that, while science is awesome, it is practiced by biased and flawed humans. The history of science is one of error, bias, and ego that manages to slowly grind toward the truth.

Metascience is as important as metacognition, or thinking about thinking, and I write about both topics often. These are core knowledge-bases for any critical thinking skeptic. Here is a list I compiled of the most important issues with the quality of science. The goal here is not to criticize science, but to improve its practice, make it more efficient, minimize wasted resources, and help the public sift the reliable from the nonsense.

I’m now going to add another important concept to the list – citation bias.

To quickly summarize some of the main pitfalls in doing good science, there is researcher bias in which subtle (or not-so-subtle) errors in methodology can lead to false results. There is also publication bias in which journals have perverse incentives that distort the scientific record by preferring, for example, positive studies over negative ones. There are statistical biases where statistical techniques that favor false positives are preferred, and sometimes techniques are chosen because they produce the desired results. There is a bias against replications, even though replicating experiments is perhaps the best way to sort out the real from the fake. And we can throw in the occasional outright fraud.

All of these problems plague even the best of science. Sometimes, however, the problems become so overwhelming that the result is pseudoscience, where the methods of science are distorted beyond the point where we can even call it legitimate science anymore.

Citation bias is yet another source of bias in the scientific literature, one that can be very subtle. A citation is when one published paper references another. The more citations a paper receives, the greater its perceived impact. The idea is that scientists are crowdsourcing the quality of each other’s research, and are voting with citations (similar to “likes” on Facebook or other social media).

Does this system work, however? Are there potential systematic biases that might distort the pattern of citations so that they reflect something other than the genuine scientific quality and usefulness of a published article?

In 2009 Steven Greenberg published “How citation distortions create unfounded authority: analysis of a citation network” in the British Medical Journal. He did a thorough analysis of all the citations involving a specific claim, “that β amyloid, a protein accumulated in the brain in Alzheimer’s disease, is produced by and injures skeletal muscle of patients with inclusion body myositis.” Don’t worry about the claim itself; that is incidental to the point of the paper. He identified several phenomena that potentially distort citations and can even create a false consensus of scientific opinion.

The first phenomenon he found was simple citation bias, a tendency to cite positive studies more than negative studies. Let’s say there are 20 studies addressing a specific scientific question, 15 showing negative results and 5 showing positive results. The evidence would tend to favor rejecting the claim. However, the 5 positive studies may be cited 200 times in the literature, while the 15 negative studies are collectively cited only 30 times. The number of citations distorts the underlying reality of the scientific evidence and creates a false impression that the research is positive.
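To make the arithmetic in that hypothetical concrete, here is a minimal Python sketch. The numbers are only the illustrative ones from the paragraph above (they are not data from Greenberg's paper), and the variable names are made up for this example.

```python
# Illustrative numbers from the hypothetical above, not real data.
studies = {
    "positive": {"count": 5, "citations": 200},
    "negative": {"count": 15, "citations": 30},
}

total_studies = sum(group["count"] for group in studies.values())
total_citations = sum(group["citations"] for group in studies.values())

# What fraction of the evidence is positive, versus what fraction
# of the citations point at positive studies?
positive_study_share = studies["positive"]["count"] / total_studies
positive_citation_share = studies["positive"]["citations"] / total_citations

# Average citations per study in each group.
citations_per_study = {
    name: group["citations"] / group["count"]
    for name, group in studies.items()
}

print(f"Positive share of studies:   {positive_study_share:.0%}")
print(f"Positive share of citations: {positive_citation_share:.0%}")
print(f"Citations per study: {citations_per_study}")
```

In this made-up case the positive studies are a quarter of the evidence but receive roughly 87% of the citations, and each positive study is cited about 20 times as often as each negative one. Anyone judging the question by citation counts alone would get the evidence backwards.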

This can happen innocently, just as a manifestation of confirmation bias. If you look for papers that support rather than refute a claim, you will likely find them and can cite them to support the claim. Often researchers need to base their own research on what has already been established. Therefore, in writing their grant request and in writing their subsequent papers, they may specifically search for papers that support the claims that underlie their own research question. This is just confirmation bias for scientists.

Greenberg also found instances of what he calls citation diversion – citing a paper but distorting its findings. This could mean citing a paper whose data is actually negative but presenting it as if it is positive. This is easier than it may seem, as papers often have complex data sets, with lots of variables and controls. A citation could pull out some aspect of the paper that makes the claim seem positive, when the entire results are clearly negative.

Another citation-related phenomenon is amplification – the publication of papers that repeat the claim but contain no new data. These are usually in the form of review articles. Review articles are highly useful, as they summarize a lot of research and can be a huge time-saver. They therefore get cited quite a bit. Each of those citations, however, spreads the claim further without adding any new evidence for it.

Finally, Greenberg identifies a phenomenon known as invention, in which claims are manufactured entirely through citation, without any underlying research. For example, one paper may speculate about a mechanism underlying an apparent claim. Another paper then cites the first as a source for that speculation, but may promote it to a hypothesis. The next paper then cites the hypothesis but states it as a fact. This latest paper then becomes highly cited as the source of this new fact.

As an aside, I have witnessed this happen in medicine as well – a phenomenon I call “chart lore.” Well-meaning physicians rely upon previous medical records to establish that a patient, for example, has a specific diagnosis. However, if you track the references back you find that it leads to nothing. One physician perhaps told the patient they may have a diagnosis, but then it was later ruled out. The patient, however, did not fully understand, and they report to the next physician that they have the diagnosis. That report then becomes the basis for later physicians reporting that the patient has the diagnosis, even though the diagnosis was never made.

If we step back a bit – the big picture here is that citations, like any form of social communication, can follow a pattern of narrative creation and maintenance. One research group can have a strong belief in their pet theory. They produce some low quality data and allow researcher bias to create a false positive and support their hypothesis. They then amplify their footprint in the published literature by publishing reviews of the topic, their research, and even systematic reviews and meta-analyses of the question at hand. They may also engage in a common practice of squeezing multiple papers out of one experiment, again increasing citations. They engage in a little citation bias, citation diversion, and even citation invention, all to support the narrative that their pet theory is supported by the evidence.

Taken together these practices can create a powerful, but entirely false, narrative that seems to be supported not only by the scientific evidence but by a consensus of scientific opinion.

The good news is that there is a fix to this problem – skepticism. (You probably saw that coming.) Scientists need to be the harshest skeptics of their own work. They should diligently try to prove their own pet theories wrong. Otherwise they may fall prey to the many subtle forms of bias that can create a powerful but false impression that what they want to believe is true.

Part of being skeptical is that you do not trust any secondary source, and that includes a citation (or, for my medical example, the medical records of another physician). You should always trace any claim back to its original source, and see for yourself if the data supports the claims being made. As I tell my medical students – always ask the question, how was this diagnosis actually established?
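For readers who like to see the idea spelled out, tracing a claim back to its source can be sketched as a walk through a citation network. This is only a toy illustration: the paper names, the `cites` lists, and the `has_primary_data` flag are all hypothetical, and in practice deciding whether a paper really contains supporting data means reading it.

```python
# Hypothetical citation network. Each entry records which papers it
# cites for the claim and whether it reports its own data on the claim.
papers = {
    "review":     {"cites": ["fact_paper"], "has_primary_data": False},
    "fact_paper": {"cites": ["hypothesis"], "has_primary_data": False},
    "hypothesis": {"cites": ["speculation"], "has_primary_data": False},
    "speculation": {"cites": [], "has_primary_data": False},
    "summary":    {"cites": ["lab_study"], "has_primary_data": False},
    "lab_study":  {"cites": [], "has_primary_data": True},
}


def primary_sources(paper_id, papers, seen=None):
    """Follow citations from paper_id and return the set of papers
    that actually report data, skipping anything already visited."""
    if seen is None:
        seen = set()
    if paper_id in seen:
        return set()
    seen.add(paper_id)

    paper = papers[paper_id]
    found = {paper_id} if paper["has_primary_data"] else set()
    for cited_id in paper["cites"]:
        found |= primary_sources(cited_id, papers, seen)
    return found


print(primary_sources("review", papers))   # chain ends in speculation
print(primary_sources("summary", papers))  # chain ends in a real study
```

In this toy network the chain starting at `review` bottoms out in speculation and returns an empty set, which is the "invention" pattern: a well-cited fact with no data underneath it. The chain starting at `summary` does reach a paper with data, and that is the one you would then go and read for yourself.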

A colleague of mine, Matthew Schrag, did just that before conducting his own research on the presence of metals in brains of patients with Alzheimer’s disease. The pattern of citations supported the notion that metals were established, but he did the skeptical thing. He tracked everything back to the original research and looked for himself. He found that there were few studies supporting the hypothesis, all heavily cited, but many more studies refuting the claim that were neglected and rarely cited. He essentially replicated Greenberg’s findings of the power of citation bias. (Thanks also to Matthew for sending me the link to Greenberg’s paper.)


The various forms of citation bias (including amplification, diversion, and invention) are specific manifestations of confirmation bias as it applies to the scientific literature. Even well-trained scientists can and do fall prey to these basic cognitive biases. This is exactly why scientists need to be skeptical critical thinkers.

I don’t know how common citation bias is – is it endemic in the scientific literature or the occasional aberration? We need more study to know. I doubt it is rare, but objective estimates would be very useful.

In any case, this is a phenomenon of which scientists need to be acutely aware. I do think that serious and skeptical researchers eventually dig back to the original data and determine whether they actually support the hypothesis at hand. This is why I never trust a claim when its support is highly dependent upon a single researcher or research group. I only have confidence when I see multiple groups independently come to the same conclusion, especially when one or more of those groups were initially skeptical of the claim.

