Feb 14 2017

Science – We Have a Reproducibility Problem

John Ioannidis has published an interesting commentary in JAMA about the current reproducibility crisis in basic and clinical scientific research. Ioannidis has built his career on examining the medical literature for overall patterns of quality. He is perhaps most famous for his essay arguing that most published research findings are false.

The goal here is to improve the science of science itself (or “metascience,” like “metacognition”). As science has progressed a few things have happened. The questions are getting deeper, more complex, and more subtle. Research methods have to be more rigorous in order to deal with these more subtle questions.

The institutions of science have also grown. Science is big business, which means that there are “market forces” which push institutions, scientists, and publishers into pathways of least resistance and maximal return. These pathways may not be optimal for quality research, however.

The stakes are also getting higher. We now have professions and regulatory schemes that are supposed to be science-based. With medical products, for example, the public is best served if products are safe and effective, and if the claims made for them are truthful. We need scientific research to tell us this, and we need to know where to set the bar. How much scientific evidence is enough? We can only answer this critical question if we know how reliable different kinds of scientific evidence are.

Ioannidis summarizes many of the challenges being faced by modern scientific institutions, all of which have been discussed here many times before. First he lays out the issue in the context of reproducibility. Whether or not the findings of a study can be replicated by independent researchers is the ultimate test of the quality of that research. There is probably something wrong with findings that cannot be reproduced. He writes:

Empirical efforts of reproducibility checks performed by industry investigators on a number of top-cited publications from leading academic institutions have shown reproducibility rates of 11% to 25%. Critics pointed out that these empirical assessments did not fully adhere to advocated principles of open, reproducible research (eg, full data sharing), so the lack of reproducibility may have occurred because of the inability to exactly reproduce the experiment. However, in the newest reproducibility efforts, all raw data become available, full protocols become public before any experiments start, and article reports are preregistered. Moreover, extensive efforts try to ensure the quality of the materials used and the rigor of the experimental designs (eg, randomization). In addition, there is extensive communication with the original authors to capture even minute details of experimentation. Results published recently on 5 cancer biology topics are now available for scrutiny, and they show problems. In brief, in 3 of them, the reproducibility efforts could not find any signal of the effect shown in the original study, and differences go beyond chance; eg, a hazard ratio for tumor-free survival of 9.37 (95% CI, 1.96-44.71) in the original study vs 0.71 (95% CI, 0.25-2.05) in the replication effort. However, in 2 of these 3 topics, the experiments could not be executed in full as planned because of unanticipated findings; eg, tumors growing too rapidly or regressing spontaneously in the controls. In the other 2 reproducibility efforts, the detected signal for some outcomes was in the same direction but apparently smaller in effect size than originally reported.

Basically, formal attempts at assessing the reproducibility of major scientific findings show disappointingly low rates. He goes on to caution that these results are themselves preliminary and limited. Further, we never know whether a failure to replicate reflects a problem with the original study, the replication, or both.

Acknowledging the uncertainty, there is enough evidence to demonstrate that there is a problem worthy of attention. Essentially the scientific community is cranking out too many poor quality studies. They are also producing high quality research, make no mistake. There is still clear progress in science, but it is floating on a sea of mediocre research.

This is a problem because it wastes limited resources. It also becomes a challenge to keep up with all the research, especially while most of it is unreliable. It may also not be obvious from the paper itself when research is unreliable. The flaws may not be evident. Meanwhile, real world decisions may hinge on the outcomes.

What specific factors are causing the problem?

Sample sizes are generally smaller, statistical literacy is often limited, there is limited external oversight or regulation, and investigator conflicts to publish significant results (“publish or perish”) are probably as potent as investigator and sponsor conflicts in clinical research.

Doing rigorous research is hard, expensive, and time-consuming. I frequently hear researchers describe the study they would like to run, and then the study they are actually going to run because of limited resources. The choice often comes down to publishing one quality paper, or 3-4 mediocre ones. The pressure to publish pushes them toward the 3-4 mediocre studies. The incentive structure rewards that approach.

There is also pressure to publish positive and interesting research, including original research with surprising findings. Boring replications are less rewarded. Keeping a lab and a career going for 1-2 years before having enough data to publish a single paper can be difficult.

If we want to fix the replication problem we need to change the incentives. Researchers should be incentivized (even required) to keep all raw data, meticulously document procedures, make their protocol open-access, and even to get pre-approval for study designs before collecting data.

There is also a problem with statistical literacy. Researchers routinely engage in p-hacking without even realizing it, or perhaps they know they are “cutting corners” but do not fully realize how much it invalidates their research. Part of the problem is the culture of research. There is an overreliance on a simplistic frequentist approach, where getting to a p-value of 0.05 by any means is the holy grail. This problem can be fixed by better educating researchers, better oversight, and higher standards for publication at journals.
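To make the p-hacking problem concrete, here is a minimal simulation (a sketch, not any particular study) of one common and often unwitting form of it: peeking at the data as they accumulate and stopping as soon as p < 0.05. Even when there is no real effect at all, this inflates the false-positive rate well above the nominal 5%:

```python
import math
import random

def p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = abs(ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def experiment(peek, batch=10, max_n=100):
    """Both groups are pure noise, so every 'significant' result is false.
    With peek=True, the researcher tests after every batch of subjects and
    stops at the first p < 0.05 -- optional stopping, a form of p-hacking."""
    a, b = [], []
    while len(a) < max_n:
        a += [random.gauss(0, 1) for _ in range(batch)]
        b += [random.gauss(0, 1) for _ in range(batch)]
        if peek and p_value(a, b) < 0.05:
            return True
    return p_value(a, b) < 0.05

random.seed(1)
trials = 1000
honest = sum(experiment(peek=False) for _ in range(trials)) / trials
hacked = sum(experiment(peek=True) for _ in range(trials)) / trials
print(f"false-positive rate, single planned test: {honest:.3f}")   # near 0.05
print(f"false-positive rate, testing as data accrue: {hacked:.3f}")  # far higher
```

The single planned test comes in near the nominal 5% false-positive rate, while the peeking strategy is typically several times higher, which is part of why preregistered analysis plans matter.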

Conclusion

Science still works, and advances are made by the most rigorous research that is independently replicated. It takes years, often even decades, for a research program to mature to this level, however. The conclusion, therefore, is not that science is wrong, but that science takes longer than you think. There is a long build-up of low-quality preliminary research that is highly unreliable. Eventually, in some cases, we get to the kind of rigor and replication that is reliable.

Along the way, however, confusion reigns. The public is informed, often in breathless terms, about preliminary findings that are likely wrong. Regulations and standards of care may even be based upon unreliable preliminary findings. Industries have emerged on the wave of unreproducible evidence (such as the supplement industry and the alternative medicine industry).

For now, we definitely need to raise the bar for how much scientific evidence is considered convincing.

Meanwhile we need to find ways to reduce the number of unreproducible studies and increase the percentage of reliable studies. We are simply wasting too many resources on worthless research.

I agree with all of Ioannidis’s recommendations. Scientists should be publishing fewer, more rigorous studies. Standards for research need to be higher, and statistical literacy needs to be much higher. We should have the goal of eliminating p-hacking entirely from published research. Academic institutions need to change their reward structure, and journal editors need to change their priorities.

Ioannidis also points out that we need to do this without imposing crushing regulations on scientists themselves. That can be counterproductive. We need to be smart, not heavy-handed. This will be challenging, but scientists generally are pretty smart people.


32 Responses to “Science – We Have a Reproducibility Problem”

  1. chikoppi on 14 Feb 2017 at 9:25 am

    Many journals list citations as a signal for how much influence a published research paper has.

    Perhaps a similar trail could be maintained for reproducibility. This would create an incentive to 1) encourage and fund reproduction efforts, 2) register reproduction experiments in advance (make it a requirement for inclusion), 3) create equity for the journal of record, and 4) reward more rigorous experimental and statistical methodology in original research.

    Journals could simply include an index page in each issue with updates to scores of past published titles. Assigning and maintaining a visible reproducibility score to published research would likely have a significant impact.

    “Citations: 74, Reproductions: -7/+2”

  2. daedalus2u on 14 Feb 2017 at 10:02 am

    The primary problem of reproducibility in science is that all of the users of science; the stakeholders who want science to be accurate, reproducible and reliable, are unwilling, or unable, to pay for the science they use. The secondary problem is that there are gatekeepers who ration the flow of “science” and impose charges disproportionate to the value they add for that service.

    The correct way to model the trophic flows or trophic levels of science: science generation, science dissemination, and science utilization is how trophic flows are modeled in ecosystem modeling. Primary production generates biomass via photosynthesis, that biomass is then consumed by herbivores and the herbivores are consumed by carnivores. Each trophic level consumes biomass and uses that biomass to sustain itself and to generate descendants. If biomass is extracted such that the generator cannot sustain itself, then that biomass generator goes extinct, as do all consumers that depend on that biomass.

    A good example is bees. Bees provide pollination services, which plants “pay for” with nectar and pollen. Plants need to generate sufficient surplus pollen and nectar to sustain a sufficient bee population to maintain the pollination services the plants require. If plants tried to get pollination services without providing nectar and pollen, bees would go extinct and then so would the plants. Similarly, if bees collected all pollen with 100% efficiency and left none to pollinate plants, the plants would go extinct and then so would the bees.

    If there was a gatekeeper between the bees and the plants, that extracted a portion of the pollen and nectar the plants generated or the bees gathered, that gatekeeper could drive bees or plants to extinction by extracting so much the remaining trophic flow was insufficient to sustain the population of bees or plants.

    Health care consumers have a very strong desire to have “the best” health care, which we know requires “the best” science, which we know costs money; money to train the scientists; money for laboratories, libraries, reagents; money for workers; money to gather together and publish scientific results; money to test those results to see if they are reliable and correct. Saving pennies on research, or on any aspect of the process, while squandering billions on insufficiently tested/understood harmful treatments makes no “sense”, but that is what is going on.

    Science research is suffering from a classic “tragedy of the commons”. The collective body of knowledge and effort we call “science” isn’t owned by anyone, it is held in common by all humanity. Unfortunately, only research that can be monetized is perceived to be of value, and then only by those who can monetize it.

    The current practice of rewarding individual scientists is like keeping track of which individual bees carried which individual pollen grains to which particular plants which then grew into specific seeds when then grew into plants of the next generation. Rewarding individual bees would be spending too much effort keeping track of stuff that isn’t really important. Plants don’t need individual bees, plants need hives of bees. Plants need ecosystems with multiple pollinators to provide redundancy and resiliency.

    It is the same with science. We don’t need individual hot-shot scientists, we need a robust ecosystem of science. An ecosystem that nurtures new scientists along, develops their careers, funds their research and takes care of them after they are no longer productive.

  3. TheGorilla on 14 Feb 2017 at 11:11 am

    It’s almost like scientific truths are mediated by institutions, which in turn are the product of social structure and attitudes.

  4. Steve Cross on 14 Feb 2017 at 12:08 pm

    It is the same with science. We don’t need individual hot-shot scientists, we need a robust ecosystem of science. An ecosystem that nurtures new scientists along, develops their careers, funds their research and takes care of them after they are no longer productive.

    Deja Vu all over again.

    It sounds like a successful democracy (or society in general) requires an educated public.

    We really, really need to educate the public from an early age. Everyone must understand the importance of science and the scientific method — and the necessity of a robust infrastructure to support it. Everyone should be able to understand the difference between good and bad science. And perhaps most relevant to the current discussion, people need to realize that accepting and funding less glamorous and/or ‘pure’ research is a necessary part of long term success.

  5. jt512 on 14 Feb 2017 at 2:31 pm

    Note on terminology: Reproducibility and replicability are not the same thing, and given the growing focus on research quality, it is important to understand the difference. Findings are reproduced when an independent investigator takes the data set from the original study and produces findings that are identical to those originally published. A study is replicated when an independent investigator repeats the study from scratch (including collecting new, independent data) and produces findings in agreement with the original. Replicability is a higher standard than reproducibility.

  6. hardnose on 14 Feb 2017 at 7:50 pm

    “… we never know if the failure to replicate is a problem with the original study, the replication, or both.”

    Unless it shows acupuncture or homeopathy are effective — then we know the problem is with the original study.

    Or if it shows vaccines or GMOs are safe — then we know the original study is correct.

  7. SteveA on 15 Feb 2017 at 4:25 am

    jt512: “Findings are reproduced when an independent investigator takes the data set from the original study and produces findings that are identical to those originally published”

    Just to be clear, is this taking the data and applying another (different/better) statistical analysis to it?

  8. Bill Openthalt on 15 Feb 2017 at 7:15 am

    daedalus2u —

    We don’t need individual hot-shot scientists, we need a robust ecosystem of science. An ecosystem that nurtures new scientists along, develops their careers, funds their research and takes care of them after they are no longer productive.

    One thousand well-funded, cared-for morons in a treadmill conceived by bureaucrats do not one genius make.

  9. mlegower on 15 Feb 2017 at 8:42 am

    “Journals could simply include an index page in each issue with updates to scores of past published titles. Assigning and maintaining a visible reproducibility score to published research would likely have a significant impact.

    Citations: 74, Reproductions: -7/+2”

    The problem with this is that it has a knock on effect of discouraging replication studies. If I see a result and I think “That’s probably just noise”, two things discourage me from attempting a replication (relative to the status quo). First, I know that the primary researcher anticipated being graded on reproducibility, making it marginally less likely that the result will not replicate. And second, I know that my own replication will be graded on the basis of whether it also replicates, so if I’m right then my own study will not replicate either. So this incentivizes more careful primary research, but probably disincentivizes replication efforts.

    It would need to be combined with an independent incentive for producing replications.

  10. mlegower on 15 Feb 2017 at 8:47 am

    In light of jt’s comment above, I was fairly loose with terminology in my comment, primarily because I don’t think I ever even considered that distinction. However, I don’t think anything is lost by replacing all instances of “replication”/”reproduction” with one or the other.

  11. daedalus2u on 15 Feb 2017 at 9:28 am

    Bill Openthalt, the problem is you don’t know and can’t know who is a hot-shot genius before they do the work, and because everyone evaluating them is not at their level, the peer reviewers can’t know either; except in hindsight, and then it is too late, because if they didn’t get funding they never did the hot-shot genius work.

    There is a quote that I like:

    “Talent hits a target no one else can hit. Genius hits a target no one else can see.”

    Arthur Schopenhauer

    You have to play the odds, and fund a whole lot of investigators to get a few genius-like hits. You can’t pick them out ahead of time. Yes, some of that research money will be “wasted” on things that maybe didn’t work. That isn’t a “waste”, it is the cost of finding the things that do work.

    If you compel everyone to chase after known things, the most flashy, hyped stories in the glamour journals, you will find those who are most talented at finding those flashy things. Is that what our society and our science enterprise actually need?

  12. Bill Openthalt on 15 Feb 2017 at 10:51 am

    daedalus2u —

    Of course, genius is more often than not an epitaph.

    The problem is that people (even geniuses) are motivated by human passions, the most powerful ones being wealth and status. Humans are monkeys before they are bees. Scientists are motivated by the sheer joy of discovery, and there are those for whom this is sufficient reward (this might occasionally be an indicator of true genius). But the pursuit of excellence in science is no different from the pursuit of excellence in art, or sports — without the tangible approval of society (money, status and honours) the sacrifices required for excellence aren’t worthwhile.

  13. daedalus2u on 15 Feb 2017 at 11:25 am

    “pursuit of excellence in science is no different from the pursuit of excellence in art”

    yes, and many genius level artists were not highly compensated. They were compensated enough to survive. Many scientists trying to pursue excellence in science are not compensated enough to survive.

    Scientists are being ground down by the requirement to submit endless and fruitless proposals; proposals which cannot be fairly evaluated and funded due to insufficient resources.

    Hammering on people who are trying to do too much with insufficient resources isn’t a way to get them to accomplish more.

  14. daedalus2u on 15 Feb 2017 at 11:27 am

    It doesn’t take “sacrifice” to do good work. It takes resources. The resources are needed before the good work can be done. Holding them out as a reward after the fact is useless.

  15. Bill Openthalt on 15 Feb 2017 at 12:13 pm

    daedalus2u —
    The fact they weren’t highly compensated does not mean they did not aspire to wealth, or social status.

    No, doing merely good work does not need sacrifices. However, becoming very good in your field requires a lot of dedication, and yes, sacrifices.

    I do agree the current funding model looks insane to the average scientist — but there are ever more “scientists” and there will never be enough money. Public funds must be attributed to the most deserving (the nepotism of yonder is no longer socially acceptable), and that requires preparing good (and attractive) proposals.

  16. jt512 on 15 Feb 2017 at 2:26 pm

    SteveA wrote:

    jt512: “Findings are reproduced when an independent investigator takes the data set from the original study and produces findings that are identical to those originally published”

    Just to be clear, is this taking the data and applying another (different/better) statistical analysis to it?

    No. To clarify, the idea of reproducibility is to be able to repeat every step of the original analysis and produce the exact same result. The original investigators may have thrown out inconvenient outliers, sliced and diced the data to get statistical significance, and performed an inappropriate analysis, but if they properly documented every step of their work so that an independent researcher can duplicate their results, then the study was reproducible.

    Reproducibility is thus not sufficient for validity. But preregistering the analysis protocol, documenting every step of the analysis, and making the original data freely available allows the original findings to be verified and the research methods criticized and corrected, if need be.
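This distinction can be sketched in a few lines of code. The example is purely illustrative: made-up data and a trivially simple “analysis” standing in for a real, fully documented pipeline:

```python
import random
import statistics

def documented_analysis(data):
    """The original study's fully documented analysis: here, just a mean."""
    return round(statistics.mean(data), 4)

# The original study's raw data, made freely available.
random.seed(42)
original_data = [random.gauss(10, 2) for _ in range(50)]
original_finding = documented_analysis(original_data)

# Reproduction: an independent investigator reruns the same documented
# analysis on the same data set. The result must be identical.
reproduced_finding = documented_analysis(original_data)
assert reproduced_finding == original_finding

# Replication: the study is repeated from scratch with newly collected
# data. The result should agree in direction and rough size, but it
# will not be numerically identical.
new_data = [random.gauss(10, 2) for _ in range(50)]
replicated_finding = documented_analysis(new_data)
print(original_finding, reproduced_finding, replicated_finding)
```

Note that a study can be perfectly reproducible in this sense and still be invalid; reproducibility only guarantees that the analysis can be audited.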

  17. SteveA on 16 Feb 2017 at 4:19 am

    jt512

    I get it now. Thanks for the info.

  18. daedalus2u on 16 Feb 2017 at 11:36 am

    Bill Openthalt, the current funding model appears crazy to all scientists, except the ones who have “captured” the gatekeepers of science publishing and science funding and so can get all the funding they want/need.

    The current metrics of determining the “value” of scientific research or a particular science publication (citation index of journals published in) are essentially worthless, and are being gamed.

    Becoming very good in your field only requires “sacrifices” because science publication, science funding and advancement is set up that way. These things are controlled from the top-down by “gatekeepers” who demand “compensation” for advancement.

  19. jt512 on 16 Feb 2017 at 1:29 pm

    daedalus2u wrote:

    Becoming very good in your field only requires “sacrifices” because science publication, science funding and advancement is set up that way. These things are controlled from the top-down by “gatekeepers” who demand “compensation” for advancement.

    Sounds like someone’s been drinking up the kool-aid of left-wing social “science.”

  20. daedalus2u on 16 Feb 2017 at 3:11 pm

    When post docs need to work under another PI until they are in their 40s, there is insufficient opportunity to do original research and get credit for it.

    In 2012 (the latest year this paper had data on), only 7.1% of NIH funding went to PIs under 41, 18.1% went to PIs under 46, 31.1% went to PIs under 51, and only 47.1% went to PIs under 56.

    If you have to “sacrifice” the bulk of your career simply to stay in science, then science is not set up to foster career development and good research; it is set up to support the already successful.

    http://www.pnas.org/content/112/2/313.short

  21. jt512 on 16 Feb 2017 at 3:57 pm

    daedalus2u wrote:

    When post docs need to work under another PI until they are in their 40s, there is insufficient opportunity to do original research and get credit for it.

    Postdocs do not normally stay postdocs into their 40s. Not even close.

    In 2012 (the latest year this paper had data on), only 7.1% of NIH funding went to PIs under 41, 18.1% went to PIs under 46, 31.1% went to PIs under 51, and only 47.1% went to PIs under 56.

    Normalize those statistics and get back to me when you’ve got something valid.

    If you have to “sacrifice” the bulk of your career simply to stay in science, then science is not set up to foster career development and good research; it is set up to support the already successful.

    The system is set up to foster career development. Funding agencies and universities have start-up and early-career grants available only to new researchers. Every young American academic scientist I know has gotten at least one such grant.

  22. daedalus2u on 16 Feb 2017 at 5:22 pm

    What do you mean by “normalize those statistics”?

    When more than half of NIH funding goes to PIs older than 56, that indicates a problem to me.

  23. jt512 on 16 Feb 2017 at 5:50 pm

    I mean your statistics are meaningless unless you relate them to the proportion of scientists in each age group.
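The normalization point can be made concrete. Below, the cumulative shares quoted above are converted into per-bracket funding shares and divided by a purely hypothetical age distribution of active PIs (the real distribution is exactly the missing piece):

```python
# Cumulative NIH funding shares by PI age cutoff, as quoted in the thread.
cumulative_funding = {41: 0.071, 46: 0.181, 51: 0.311, 56: 0.471}

# Convert the cumulative figures into per-bracket funding shares.
brackets = ["under 41", "41-45", "46-50", "51-55"]
cutoffs = [41, 46, 51, 56]
funding = []
previous = 0.0
for cutoff in cutoffs:
    funding.append(cumulative_funding[cutoff] - previous)
    previous = cumulative_funding[cutoff]

# HYPOTHETICAL share of active PIs in each bracket (illustration only).
workforce = [0.10, 0.14, 0.17, 0.18]

# A ratio near 1 means a bracket gets funding in proportion to its numbers;
# well below 1 means it is underfunded relative to its size.
for name, f, w in zip(brackets, funding, workforce):
    print(f"{name}: funding share {f:.3f}, workforce share {w:.2f}, ratio {f/w:.2f}")
```

Without the workforce denominator, the raw funding shares by themselves support no conclusion either way.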

    Furthermore, of course a seasoned scientist gets more funding than a new scientist. My girlfriend, for instance, is a mid-career Distinguished Professor at a major research university. Typically, she is concurrently a PI or co-PI on a half-dozen or so grants, and she manages a mature research lab comprising around a dozen grad students and postdocs. But she certainly didn’t start out that way. You don’t start out at the top. How could you?

    A new Assistant Professor starts out with a little seed funding from her university. She will then get, typically, a single early-career grant during her first two years. She must then prove her ability as an independent scientist by using that money productively. She will be unlikely to get another grant until she does. But assuming she produces good research with the first grant, she will start to get more funding. In time, she begins to establish a successful track record, and she will be able to successfully compete for more and larger grants, allowing her to supervise more students and hire more post-docs, allowing her to further expand her research program. In the meantime, she gradually gains experience managing multiple research projects and multiple underlings.

    But all this takes time. Thus it is completely natural and necessary that more funding goes to mature scientists than to young newcomers.

  24. Bill Openthalt on 16 Feb 2017 at 7:49 pm

    daedalus2u —
    Tenure is not usually a scientist’s most productive stage.

  25. jt512 on 16 Feb 2017 at 8:09 pm

    “Tenure is not usually a scientist’s most productive stage.”

    I find that hard to believe, as well.

  26. Bill Openthalt on 17 Feb 2017 at 7:09 am

    jt512 —
    All generalisations, including this one, are dangerous. 🙂

    A system that provides tenure from an early stage might be good for the nerd-like types who on Sundays would rather be running tests in the lab than going on a date. It would also attract lots of people who like cushy jobs (cf. the public service). Allowing people to use their discretion when attributing funding works great with those who’re so “honest” they don’t even want to arrange a summer job for their kids, but unfortunately, most folks are more or less into nepotism.

    In my (limited) experience, scientists with tenure spend far more time in meetings than in the lab, but they still get their name on the papers. YMMV.

  27. daedalus2u on 17 Feb 2017 at 9:27 am

    If we are trying to create a stable and long term “science ecosystem”, we need investigators at each and every stage of a scientific career; we need scientists in utero, as infants, as toddlers, as preschoolers, as middle schoolers, as high schoolers, as undergrads, as grads, as post docs, as PIs, as PIs with tenure.

    As people age, if they cannot make a living as a scientist, they are compelled to make a living doing other things. Keeping current in science is difficult. Drop out for a few years, and your career takes a big hit.

    It is not possible to “turn on” and “turn off” a science career. A scientist can do other things, but it takes years, decades of training and education to become a scientist. A non-scientist can’t become a scientist in a few months. There is no way to make scientists “just in time”. If you want scientists in 5 years, 10 years, 30 years, you need to start growing, educating, training them now.

    To a first approximation, the number of scientists or scientist trainees in each age cohort will decline with age. The number of 36 year old scientists plus trainees can only be less than the number of 35 year old scientists plus trainees last year which can only be less than the number of 34 year old scientists and trainees from 2 years ago.

    I see the “reproducibility crisis” as the expected result of trying to “sprinkle” rigor and reliability onto the scientific process at the end. Reliability has to be built-in from the beginning. The reason people are trying to “sprinkle” reliability on at the end is because they don’t want to pay what it actually costs to generate reliability up front.

    That is being penny wise and pound foolish. It may save pennies in reduced “cost”, but because the science being generated is less reliable, the science isn’t as useful and bad decisions are made because the science isn’t as good as it could be. The cost of those bad decisions greatly exceeds the cost to do the science right upfront, but it is a different person who pays for the bad decisions, than who pays for the good science.

    The idea that early career scientists need to “suffer” and “sacrifice” in order to become a good or great scientist is nonsense. That false meme is put out by those who profit by making early career scientists “suffer” and “sacrifice”.

    Making an early career scientist choose between having a career and having a family is abusive.

  28. jt512 on 17 Feb 2017 at 2:33 pm

    Bill Openthalt wrote:

    In my (limited) experience, scientists with tenure spend far more time in meetings than in the lab, but they still get their name on the papers. YMMV.

    In my not-so-limited experience, you could not be more wrong.

  29. jt512 on 17 Feb 2017 at 2:49 pm

    Let me rephrase that. Yes, as science professors advance in their careers, they take on more service responsibilities, which are time consuming. But, despite these additional responsibilities, they still run their research labs. The research that gets done to get those papers written with their names on them doesn’t get done without the guidance of the professor. Whom do you think the inexperienced grad students and postdocs who get to be first authors on those papers get their knowledge from? Who do you think has the scientific insight to envision those research projects in the first place? Who do you think has the knowledge, experience, and reputation to obtain the funding?

    My girlfriend, who, as I mentioned, is a tenured science professor, works 80 hours a week. Those service responsibilities don’t replace scientific responsibilities; they add to them.

  30. jt512 on 17 Feb 2017 at 2:52 pm

    daedalus2u:

    For someone who knows nothing about what they’re talking about, you sure spend a lot of words doing it.

  31. Bill Openthalt on 17 Feb 2017 at 7:38 pm

    jt512 —
    I still think that giving scientists tenure from the word go (actually, making them employees) is not going to lead to better science. Nor would removing the need for grant writing (assuming unlimited funds could ever be anything but pie-in-the-sky hopes).
    I never meant to suggest that all tenured professors were lazy so-and-sos leaching off their hapless grad students.

  32. Bill Openthalt on 17 Feb 2017 at 7:40 pm

    and that should be leeching, obviously. 😉
