Mar 29 2012

Perception and Publication Bias

The psychological literature is full of studies that demonstrate that our biases affect our perception of the world. In fact psychologists have defined many specific biases that affect not only how we see but how we think about the world. Confirmation bias, for example, is the tendency to notice, accept, and remember data that confirms what we already believe, and to ignore, forget, or explain away data that is contradictory to our beliefs.

Balcetis and Dunning have published a series of five studies that add to this literature by showing what they call “wishful seeing.” In their studies they found that people perceive desirable items as being physically closer to them than less desirable items. This finding is plausible and easy to believe for a skeptic steeped in knowledge of cognitive flaws and biases. But is this finding itself reliable? Psychologists familiar with the history of this question might note that similar ideas were researched in the 1950s and ultimately rejected. But that aside, can we analyze the data from Balcetis and Dunning and draw conclusions about how reliable it is?

Recently Gregory Francis did just that, revealing an interesting aspect of the “wishful seeing” data that calls it into question. Ironically, the fact that Balcetis and Dunning published the results of five studies may have weakened their data rather than strengthened it. The reason is publication bias.

Carrying out high quality scientific research is extremely difficult. There are many pitfalls, some of which are very subtle, that can distort the outcome of research. I have written previously about exploiting researcher degrees of freedom – making decisions about the details of a study that can each seem reasonable in isolation but which can also systematically bias the outcome toward the positive. Even studies that look good on paper may be the result of this type of researcher bias. The literature taken as a whole is also a complicated beast. Most notably there is publication bias – the tendency for researchers and journal editors to favor publishing positive research while confining negative studies to the file drawer (the so-called “file-drawer effect”).

All this is why we don’t get very excited over single studies or studies of small size or limited methodological quality. The possibility of false positives due to bias is huge. It is more reasonable to wait for a consensus of large high quality studies and independent replication, especially if the claim seems inherently implausible.

Researchers and statisticians have become fairly sophisticated in analyzing not only individual studies but the pattern of studies in the published literature to root out bias. For example there is the well-established notion of funnel plots. You can make a graph of all published studies with the quality of the study on the vertical axis and the result (positive or negative) along the horizontal axis. If all studies on a scientific question are being published then you would expect to see a random distribution of outcomes around the real effect size, and the better quality the study the smaller the degree of variation. The result should look like a funnel zeroing in on the real effect size.

If there is publication bias, however, then the negative half of the funnel plot may be diminished or even missing. The scatter of studies will be skewed toward the positive. This is critical because systematic reviews and meta-analyses will reflect the distribution of published studies, and this publication bias may make a null effect seem real.
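
To make the idea concrete, here is a minimal simulation sketch (my own illustration, not anything from Francis or from Balcetis and Dunning, with all numbers invented). When a truly null effect is studied many times and only the positive, significant results are published, the published estimates no longer scatter symmetrically around the true value – which is exactly the asymmetry a funnel plot is designed to expose.

```python
import numpy as np
from scipy import stats

# Invented numbers for illustration: simulate many small studies of a truly
# null effect, "publish" only the positive significant ones, and compare.
rng = np.random.default_rng(0)
true_effect = 0.0
n_studies = 2000

all_effects, published_effects = [], []
for _ in range(n_studies):
    n = int(rng.integers(15, 120))           # per-group sample size (arbitrary range)
    treated = rng.normal(true_effect, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    d = treated.mean() - control.mean()      # observed effect (sd = 1, roughly Cohen's d)
    _, p = stats.ttest_ind(treated, control)
    all_effects.append(d)
    if p < 0.05 and d > 0:                   # the file drawer: only positive, significant results escape
        published_effects.append(d)

print(f"mean effect, all studies:      {np.mean(all_effects):+.3f}")
print(f"mean effect, published subset: {np.mean(published_effects):+.3f}")
# The full set scatters around zero; the published subset is shifted positive.
```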

Francis has analyzed the five published studies of Balcetis and Dunning in a similar way to show that their data displays publication bias. Essentially, he calculated the probability, given the size of each of the five studies and the apparent effect size found in the data, that all five would have come out positive. He found that this probability fell below the previously established cutoff, meaning the run of uniformly positive results is itself statistically suspect.

In other words, a series of five small studies is likely to show a random scatter of results, and we can calculate the odds of getting any particular result. Even if the effect that Balcetis and Dunning claim to have found were real, their data should be more scattered. Instead we have five small studies that are all positive.
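
For those who want to see the arithmetic, here is a rough sketch of the kind of calculation Francis performs. The pooled effect size and sample sizes below are placeholders I made up for illustration (they are not the actual Balcetis and Dunning numbers); the logic is simply to estimate each study’s power to detect the pooled effect and then multiply those powers to get the probability that all five experiments would reach significance.

```python
import numpy as np
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample t-test to detect effect size d."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2.0)             # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

# Placeholder values for illustration only (not the real study parameters).
pooled_d = 0.55
per_group_ns = [20, 18, 22, 24, 20]

powers = [power_two_sample_t(pooled_d, n) for n in per_group_ns]
p_all_five_significant = float(np.prod(powers))

print("estimated power of each study:", [round(p, 2) for p in powers])
print("probability that all five reach significance:", round(p_all_five_significant, 3))
# If this probability falls below the conventional 0.1 criterion used in this
# kind of "excess significance" analysis, the uniformly positive run of results
# is itself suspicious, even if the underlying effect were real.
```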

All of this does not mean that the wishful seeing effect is not real. It just means that there is evidence for internal publication bias in the Balcetis and Dunning data. Therefore we need a fresh set of data to investigate the question.

This also demonstrates the primary weakness of meta-analysis – a method for combining the results of multiple studies into one large study. If we did a meta-analysis on the five studies published by Balcetis and Dunning we might find what appears to be one large data set with high power showing a positive effect. But five small studies are not the same thing as one large study, precisely because things like publication bias can affect the results.
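
As a quick sketch of why pooling does not rescue biased inputs, here is a bare-bones fixed-effect (inverse-variance) meta-analysis. The five effect sizes and standard errors are invented; think of them as five small, uniformly positive studies that survived the file drawer. The pooled estimate looks precise, but it can only be as honest as the studies that were allowed to reach it.

```python
import numpy as np

# Invented inputs: five small, uniformly "positive" studies.
effects = np.array([0.62, 0.55, 0.71, 0.48, 0.66])    # observed effect sizes
std_errs = np.array([0.28, 0.30, 0.27, 0.29, 0.28])   # their standard errors

weights = 1.0 / std_errs**2                  # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"pooled effect: {pooled:.2f}, 95% CI: "
      f"[{pooled - 1.96 * pooled_se:.2f}, {pooled + 1.96 * pooled_se:.2f}]")
# The pooled result inherits whatever selection shaped its inputs:
# false positives in, false positives out.
```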

Francis points out in his paper some of the things that might have biased the results in the wishful seeing series of studies. A researcher may, for example, gather data and look at the results, and if they are not positive gather some more data, look again, and continue this process until they get a positive result. This might feel like just gathering more data to get a more reliable outcome, but if the stopping point is chosen because the data happen to look positive at that moment, it is a way of cherry picking the data and biasing the outcome. This is one of the problems pointed out by Simmons et al. in their paper (which Francis does cite) on manufacturing false positives by exploiting researcher degrees of freedom.
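
Here is a small simulation sketch of that optional-stopping strategy (again, my own toy example with made-up parameters, in the spirit of the Simmons et al. paper). The true effect is exactly zero, yet checking the p-value after every new batch of subjects and stopping at the first p < .05 produces “positive” findings far more often than the nominal 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def run_with_optional_stopping(start_n=20, step=10, max_n=100, alpha=0.05):
    """Collect data in batches and stop as soon as the test looks significant."""
    treated = list(rng.normal(0.0, 1.0, start_n))    # true effect is zero
    control = list(rng.normal(0.0, 1.0, start_n))
    while True:
        p = stats.ttest_ind(treated, control).pvalue
        if p < alpha:
            return True                               # declare a "positive" result
        if len(treated) >= max_n:
            return False                              # give up (into the file drawer)
        treated.extend(rng.normal(0.0, 1.0, step))    # peek, then collect more
        control.extend(rng.normal(0.0, 1.0, step))

n_sims = 2000
false_positives = sum(run_with_optional_stopping() for _ in range(n_sims))
print(f"false positive rate with optional stopping: {false_positives / n_sims:.1%}")
# With a fixed sample size the rate would be about 5%; peeking after every
# batch pushes it well above that, with no real effect anywhere in the data.
```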

Conclusion

To me the question of researcher and publication bias is more interesting than the question of whether or not wishful seeing is a real phenomenon. Don’t get me wrong: perception bias is very interesting and an important realization for any critical thinker. The implications of researcher and publication bias, however, are far more profound and far reaching.

The take-home lesson is that when interpreting the research on any question, all of the above concerns, and more, need to be taken into account. Do not base conclusions on any one study. Low-power or methodologically weak studies do not add up to reliable research. The potential for many kinds of bias is just too great.

What is reliable is a body of studies of sufficient power that show a robust effect with a reasonable signal-to-noise ratio, and a result that has been independently replicated. Further, these results need to stand up not only to pre-publication peer review, but to post-publication review by skeptical experts who try their best to probe for flaws and weaknesses in the data. Only when data holds up to this kind of assault do we tentatively conclude that it may be true.

Further, any research question needs to be put into the context of other scientific knowledge. Do we have multiple independent lines of evidence all converging on one conclusion, or does the claim in question exist in isolation, or worse does it seem to be at odds with other lines of investigation?

These are all perfectly reasonable criteria that scientists apply every day. Over time we are learning more and more about how important these methods are, and how even serious and accomplished researchers can fall prey to subtle biases in their research.

When we hold up controversial claims to these accepted scientific criteria, it becomes readily apparent why they are controversial. Psi research, for example, has failed to produce high quality replicated studies. Homeopathy has only poor quality studies rife with bias to offer as evidence, while the larger, better quality studies are all negative. Further, homeopaths cannot present a coherent explanation for the effects they allege. Far from multiple lines of evidence converging on homeopathy being real, everything we know from physics, chemistry, and biology is screaming that homeopathy is beyond implausible and should be considered, as much as can be determined by scientific methods, impossible.

As for wishful seeing, I will have to reserve judgment. The effect is plausible and in line with other research showing similar perceptual biases, but this specific bias has not been established. An independent large study will be needed to address the question, and replication to confirm it. Meanwhile the Balcetis and Dunning data set appears to be compromised and should be looked upon with skepticism. They may just be the victims of random chance (like a gambler who is just a bit too lucky for the casino’s comfort); we will have to wait for new data to see.

34 Responses to “Perception and Publication Bias”

  1. BillyJoe7 on 29 Mar 2012 at 1:35 pm

    What gets me is the extraordinary waste of time and effort that goes into doing biased and methodologically flawed trials, meta-analyses, and systematic reviews.
    Isn’t it about time that the scientists involved be required to have a degree in how to perform methodologically sound and unbiased studies before being given grants to do them.
    Perhaps also each trial protocol needs to be scrutinised by a panel of experts for flaws and biases before being given the nod to proceed.
    It seems we are always closing the barn door after the horse has bolted.

  2. cwfong on 29 Mar 2012 at 3:02 pm

    Just have BillyJoe7 review the studies and you’ll get negative all the way down. And although he recommends the authors all should have a degree to establish some modicum of knowledge, he knows from experience that negativity does best without either a degree or knowledge.

  3. Steven Novella on 29 Mar 2012 at 3:19 pm

    I would prefer that we stick to discussing the current post in the comments and not drag previous tiffs forward. Thanks.

  4. cwfong on 29 Mar 2012 at 3:30 pm

    OK, then I’ll simply point out that having each trial protocol scrutinized before the trial would do more harm than good. Closing the barn door and killing the horse to boot.
    Also your reasoning in this post is flawed, since the premise that there should be an equal balance between positive and negative reviews has no credible support that I can find.

  5. Steven Novella on 29 Mar 2012 at 3:43 pm

    The balance is not between positive and negative (which I never said) but a distribution around the real effect size, whatever that is. The premise is not mine, but a very basic part of statistics. If there is no systematic bias in results, then they should reflect the effect being measured plus random variation. That random variation should produce a pattern that is statistically predictable. If it doesn’t, then you have a problem. That is the whole idea behind funnel plots. If you think it has no support, then explain why, and I suggest you do some reading about statistics and funnel plots.

  6. cwfong on 29 Mar 2012 at 3:59 pm

    I have done that reading and statistics cannot provide the degree of reasonableness behind what are subjectively selected as negative comments. Statistics can measure probability distributions but they can’t think.

  7. cwfong on 29 Mar 2012 at 4:04 pm

    http://www.measuringusability.com/blog/stats-usability-errors.php

  8. Steven Novella on 29 Mar 2012 at 4:14 pm

    I think there is a misconception. I never said anything about positive and negative reviews. Only original data. So I’m not sure what you are talking about.

  9. cwfong on 29 Mar 2012 at 4:19 pm

    Positive and not positive. Not positive is therefore not negative?

  10. Steven Novella on 29 Mar 2012 at 4:26 pm

    Seriously – I have no idea what you are talking about. Help me out with some complete sentences.

  11. cwfong on 29 Mar 2012 at 4:44 pm

    In turn I have no idea of what your point was if not about finding a balance between positive and less positive reviews. If less positive does not equate with more negative then what was the inference intended there?

  12. Steven Novella on 29 Mar 2012 at 5:01 pm

    This has nothing to do with reviews. I was discussing a paper that examined the data from five studies, showing that the data reflects a distribution indicative of publication bias. I described funnel plots as another example of a similar type of analysis. These all have to do with looking at data, not reviews.

    I mentioned meta-analysis because it combines the data into one big study, but if the data is biased toward the positive the meta-analysis cannot account for that and will reflect the false-positive bias. Garbage in – garbage out, or in this case, false positive in – false positive out.

    This is ultimately about the various ways in which researchers bias their data toward the positive (either in their methods or publication bias), and how statistics can reveal publication bias.

    I have no idea what your “less positive equals more negative” comment is referring to. The only thing I can figure is the fact that the real effect size is not always zero. Data may be biased toward the positive, but there is still a real effect, so the data should correct not to zero but to some effect. The ultimate effect size, once better studies are done, is usually smaller than the effect size in preliminary studies – a phenomenon referred to as the decline effect, but just another manifestation of researcher and publication bias.

  13. cwfong on 29 Mar 2012 at 5:28 pm

    “Researchers and statisticians have become fairly sophisticated in analyzing not only individual studies but the pattern of studies in the published literature to root out bias.” I don’t know how they would do this without reviewing the studies, but you’re correct that they’re not professional reviewers.

    In any case in discussing bias, you stated that: “Most notably there is publication bias – the tendency for researchers and journal editors to favor publishing positive research while confining negative studies to the file drawer (the so-called “file-drawer effect”).”

    What was the point of that comment, if not about the lack of balance between positive and negative that statistical analysis would ferret out?

  14. jt512 on 29 Mar 2012 at 5:52 pm

    cwfong wrote:

    “Researchers and statisticians have become fairly sophisticated in analyzing not only individual studies but the pattern of studies in the published literature to root out bias.” I don’t know how they would do this without reviewing the studies . . .

    Steven presented two examples of how publication bias can be detected on purely statistical grounds. The first example was the analysis by Francis of the Balcetis and Dunning data. Francis calculated the pooled effect size of the five studies reported by Balcetis and Dunning, and showed that the probability that all five studies would have rejected the null hypothesis was .076 if the pooled effect size is true. I’m not sure that I agree with their logic, but that’s another story.

    The second method of detecting publication bias that Steve discussed was by using funnel plots. Funnel plots show whether there is a relationship between study size and reported effect size in the literature. If there is no publication bias, then there should be no relationship: both large and small studies should estimate the same effect size on average. However, since smaller studies have limited statistical power, they are less likely to reject the null hypothesis. If there is publication bias, then small null studies will tend not to be published, but small positive studies will be. This shows up in a funnel plot as asymmetry: typically one of the lower corners of the “funnel” will be missing.

    Jay

  15. daedalus2u on 29 Mar 2012 at 5:52 pm

    A good example with real data is one that Feynman used, the value of the charge on the electron.

    Millikan made the first measurement, and Millikan was an excellent experimentalist, among the best of his cohort of physicists. However he used the wrong value for the viscosity of air, so his answer was wrong and wrong in a characteristic direction and that error had nothing to do with his experimental technique.

    https://en.wikipedia.org/wiki/Cargo_cult_science

    When new researchers repeated his measurements, they often kept looking for “mistakes” until they found “the answer” that was close to what Millikan found. The published results were closer to Millikan’s wrong answer than they were to the correct answer that we know today.

    That would show up on the kind of plot that Dr Novella is talking about.

  16. cwfong on 29 Mar 2012 at 6:04 pm

    Jay, that’s all well and good, but null studies are not necessarily negative and can confirm a lack of bias as well as otherwise. In which cases there’d be no point in hiding them in order to accentuate the positive.

  17. jt512 on 29 Mar 2012 at 6:20 pm

    cwfong, Steven was using “negative” to mean “null.” I’m not entirely clear on what the rest of your post means. There is no point in arguing whether null findings should be published, just as non-null findings should be. Everyone who isn’t an experimental psychologist already knows that. But what the Bem psi debacle revealed is that publication bias is institutionalized in experimental psychology.

  18. cwfong on 29 Mar 2012 at 7:02 pm

    jt512,
    My post was about the overall process that supposedly could determine there was bias based on the imbalance of positive and less positive (i.e., negative) research assertions, determined by statistical analysis. Like most of these types of mathematical models, in the end, it’s the assumptions that the models are supposed to confirm or not confirm, and that even if confirmed will prove nothing when the assumptions that they give evidence for have other flaws.
    Some evolutionists for example go to great extremes to demonstrate how the stochastic method of selection works, yet ignore an alternate to these assumptions that adaptive mutations may be shown by the same statistical analysis to work better. Which without the ability of either model to explain and demonstrate where and how the intents and purposes or lack thereof behind these theories effectively differ, the bias toward what is presently believed to work remains.

    And remember the famous admonition of Russell’s that all you need is a false premise to prove anything you want by logic.

  19. nybgrus on 29 Mar 2012 at 9:26 pm

    Stop embarrassing yourself, cwfong. You really have no idea what on earth you are talking about.

  20. cwfong on 29 Mar 2012 at 9:57 pm

    Another believer of the something from nothing paradigm makes a declaration. Master of the unexplained. Right on cue.
    Dawkinsian wisdom from a dead universe.

  21. cwfong on 29 Mar 2012 at 10:05 pm

    Oops, not supposed to drag previous tiffs forward. Got anything substantive to say on the present subject, nybgrus other than the usual negative assessment?

  22. HHC on 29 Mar 2012 at 10:59 pm

    I think Dunning’s work on wishful thinking leading to wishful seeing is interesting. One study which asks the subject to see a B or 13 leads to a reward of a glass of orange juice. Extend this thinking a bit and you can interpret why people are motivated to receive pay for thinking in a certain way. The media does this by taking a stand on interpreting events along a specific political spectrum and is financially supported to do it.

  23. PhysiPhile on 30 Mar 2012 at 1:12 am

    Good post. I see this among scientists today when they get their hands on a new experimental method that allows them to see events at a better resolution or from a slightly different perspective. E.g.: using fMRI to infer from smeared out neuronal energy densities that a specific cognitive property resides there, and then applying way too much significance to the data. This is the positive feedback loop: grants are given to people using the latest technology (sometimes revolutionary, sometimes a money trap)…scientific bias…interesting results produced because of selection bias (e.g., “regions of interest” as a euphemism for introducing selection bias, because you read a study in which a neuroscientist showed a region light up when a thought, word, or experience occurred, so you place your arbitrary threshold bounds around that region in the fMRI software).

  24. nybgrus on 30 Mar 2012 at 5:47 am

    @HHC: I think the entire existence of Faux News is handily explained by wishful seeing.

    @physiphile: Don’t forget fun, fad, and alternative. Studies are often funded just because they are different – though those that fund them and ask for the funding see it as exploring new territory just ripe with possibilities.

    I actually got into an argument with a friend’s friend (whom I’d never met) on Facebook. My friend is quite science based and posted up a disparaging article on CAM, with specific mention of acupuncture. This other fellow came on to defend acupuncture as viable, and started spouting off sciencey sounding spew. I thought I was dealing with a lay person parroting what he’d read, so I came in and gently commented about the totality of data and the a priori likelihoods. Turns out he was a post-grad researcher working on an “amazing” electroacupuncture study he was hoping to be 2nd author on. You can easily imagine how the rest of the conversation went. Talk about “wishful seeing.”

  25. nybgrus on 30 Mar 2012 at 5:48 am

    @cwfong: Nope. I am giving you exactly as substantive a response as you deserve based on your comments.

    Educate yourself first, at least in the basic premise of intelligent discourse, and then we can continue.

    Though your use of the term “evolutionist” belies the fact that that will almost certainly never happen.

    Toodles!

  26. BillyJoe7 on 31 Mar 2012 at 3:37 am

    Let me put it another way:

    We spend a lot of time finding errors in research, and the same sorts of errors come up time and again. This effectively makes the research useless or, at least, less than useful. Shouldn’t we be coming up with ways to prevent this tragic waste of time and effort?

    This could be done by educating researchers and limiting research funds to those who pass muster in knowledge about how to conduct a methodologically sound and unbiased clinical trial. The file drawer effect can be prevented by registering trials and disallowing publication of unregistered trials.

    Now that the factors behind faulty trial design are well known, isn’t it time to take preventative action?

  27. daedalus2u on 31 Mar 2012 at 10:49 am

    Billy-Joe, if the data is correct, that is pretty much all that can be expected. What we need to ensure is that there is enough data to show how good the data is, and whether the conclusions follow from it and whether the conclusions are generalizable.

    For example in the thread on ECT, I just realized that their entire conclusion is wrong, but because they put their data there it can be understood that it is wrong.

  28. cwfong on 31 Mar 2012 at 1:26 pm

    Even BillyJoe7 seems to be more positive here than nybgrus. You know, the one who always discusses these things with an additionally anonymous “friend.” The friend always loses the argument, of course.
    Something then has come from nothing, which even BillyJoe7 no longer believes.
    Yet when asked to make some substantive response to the same question publicly, nybgrus can only state that he’s already done that with a friend and his authority to now declare his adversaries wrong has been established, and he doesn’t have to show no stinking badge.
    Also wasn’t he the expert on biological evolution who believed that the professionals such as Margulis and Shapiro didn’t know near as much about those stinking bacteria as he did. A priori data was on his side. He proved that when he argued with a friend. A posteriori probability be damned.

  29. cwfong on 31 Mar 2012 at 1:34 pm

    Hey nybgrus, discuss this with your friend:
    http://www.apperceptual.com/baldwin-editorial.html

  30. cwfong on 31 Mar 2012 at 8:32 pm

    Here are some people putting my felt objections much better than I was able to:

    http://www.cochrane-net.org/openlearning/html/mod15-3.htm
    Publication Bias Interpreting funnel plots
    *From these examples, we can see that a funnel plot is not a very reliable method of investigating publication bias, although it does give us some idea of whether our study results are scattered symmetrically around a central, more precise effect. Funnel plot asymmetry may be due to publication bias, but it may also result from clinical heterogeneity between studies (for example different control event rates) or methodological heterogeneity between studies (for example failure to conceal allocation).*

    http://en.wikipedia.org/wiki/Likelihood_function
    *Attempting to interpret the likelihood of a hypothesis given observed evidence as the probability of the hypothesis is a common error, with potentially disastrous real-world consequences in medicine, engineering or jurisprudence. See prosecutor’s fallacy for an example of this.*

    http://lesswrong.com/lw/1ib/parapsychology_the_control_group_for_science/
    *Parapsychologists are constantly protesting that they are playing by all the standard scientific rules, and yet their results are being ignored – that they are unfairly being held to higher standards than everyone else. I’m willing to believe that. It just means that the standard statistical methods of science are so weak and flawed as to permit a field of study to sustain itself in the complete absence of any subject matter.
    — Eliezer Yudkowsky, Frequentist Statistics are Frequently Subjective*

    Further, from my dictionary: “Bias is a predisposition either for or against something; one can have a bias against police officers or a bias for French food and wines”
    So in research and publication of results, bias should be determined relative to the researcher’s purposes. If you have a new proposition in your field and want to demonstrate that your ideas have value, are you biased? And if so, is such bias necessarily inappropriate? Hardly.

  31. etatro on 02 Apr 2012 at 5:27 pm

    As Steve is undoubtedly aware, the publication bias (toward publishing positive results) comes from journal policies coupled with funding and career advancement priorities on the part of funding agencies (i.e. NIH) and universities. In order to get a grant, a researcher has to publish. In order to get a position or a promotion, or keep one’s job, a researcher has to publish AND have grants. They refer to this metric as “productivity.” Journals only tend to publish positive results. The only cases where they will publish negative results are if the study/project refutes a previously established claim of a positive result. In this case, typically, the bar is set higher as far as validity, significance, methodological soundness because the results are going against a previously established set of facts. The study needs to also provide some reason the other study would get positive results. If you look in Steve’s posts — you’ll find the perfect example in the XMRV virus and Chronic Fatigue Syndrome.

    I’m not sure how to address this problem. Indeed — we do need to have some metric of productivity for researchers, and publishing results is one of them. However — if a researcher spends 100K and 12 months of work testing a hypothesis that turns out to be wrong — should he or she not get future funding or a promotion? The temptation is to keep beating the data to a pulp until some publication can be eked out of questionable statistical-wrangling while playing down the 12 months of what would be perceived as failure.

    I have actually witnessed older, established, scientists (with boatloads of funding) beating a dead horse of disproven hypotheses, flawed animal models, ambiguous or negative results, — publishing questionable data; because their reputations depend on being right all the time and “productive.” Younger scientists need to attach their names to the older in order to seem like they are part of a “productive” team, in order to promote their careers and to secure funding.

    In my opinion, the problem of publication bias has 3 parts: 1. The “publish or perish” culture in academic research, 2. The capitalistic means of funding science in the US, and 3. The hierarchical culture of academia (new investigators, new ideas don’t get fair play). These problems are interconnected and each affects the other.

  32. davidsmith on 06 Apr 2012 at 6:11 am

    Steven N said,

    “You can make a graph of all published studies with the quality of the study on the vertical axis and the result (positive or negative) along the horizontal axis.”

    Why would this form a funnel shape? I can’t see any reason why there should be a relationship between variance in effect size and study quality. Don’t you mean ‘study size’ on the vertical axis?

  33. Malletman on 06 Apr 2012 at 5:28 pm

    Regarding cwfong’s comments:

    It’s true that funnel plots only go so far, but they’re fairly unproblematic for the most part in practice, especially since you can generally take a plot that ‘fits’ as indicative of non-biasing. Since study population heterogeneity and design heterogeneity are the primary competing reasons for off-kilter plots, it would also make sense in this case that those could be ruled out, since the studies were carried out by the same investigator.

    It’s hard to blame design heterogeneity and population heterogeneity if the same person at the same institution is taking subjects from the same source population.

    david: I suspect that was Dr. N’s intent. I wouldn’t know how you would get a numeric value for “study quality” after all.

  34. BillyJoe7 on 07 Apr 2012 at 6:14 pm

    I don’t understand the bias against negative studies.
    Surely it is as important to know that homeopathy doesn’t work as it is to know that antibiotics (used correctly) do.
