I’ve long written about problems in how science is communicated and published. One of the best-known concerns in this area is publication bias — the tendency for results that confirm a hypothesis to be published more easily than those that don’t.
Publication bias has many contributing factors, but the peer review process is widely seen as a crucial driver: reviewers, it is believed, tend to look more favorably on “positive” (i.e., statistically significant) results.
But do reviewers really prefer positive results? A recently published study suggests that the effect exists, but that it’s small.
Researchers Malte Elson, Markus Huff and Sonja Utz carried out a clever experiment to determine the impact of statistical significance on peer review evaluations. The authors were the organizers of a 2015 conference to which researchers submitted abstracts that were subject to peer review.
The keynote speaker at this conference, by the way, was none other than “Neuroskeptic (a pseudonymous science blogger).”
Elson et al. created a dummy abstract and had the conference peer reviewers review this artificial “submission” alongside the real ones. Each reviewer was randomly assigned to receive a version of the abstract with either a significant result or a nonsignificant result; the details of the fictional study were otherwise identical. The final sample size was n=127 reviewers.
The authors do discuss the ethics of this slightly unusual experiment!
It turned out that the statistically significant version of the abstract received a higher “overall recommendation” score than the nonsignificant one. The difference, roughly 1 point on a 10-point scale, was itself statistically significant, though only marginally (p=0.039).
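To get a feel for why a 1-point difference with n=127 reviewers lands in this marginal range, here is a back-of-the-envelope Welch t-statistic. The group sizes and standard deviations below are illustrative assumptions, not the paper’s actual numbers:

```python
import math

# Hypothetical summary statistics: the paper reports a ~1-point difference
# on a 10-point scale (p = .039) with n = 127 reviewers in total. The
# group split and standard deviations are assumptions for illustration.
n1, n2 = 63, 64          # assumed split of the 127 reviewers
mean_diff = 1.0          # reported difference in "overall recommendation"
sd1, sd2 = 2.5, 2.5      # assumed standard deviations of the ratings

# Welch's t-statistic for two independent groups
se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
t = mean_diff / se
print(f"t = {t:.2f}")    # a t a little above 2, i.e. p in the .02-.05 range
```

With these assumed spreads, the t-statistic comes out just above 2, which is exactly the territory where a result is significant but only barely — consistent with the reported p=0.039.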
The authors conclude that:
We observed some evidence for a small bias in favor of significant results. At least for this particular conference, though, it is unlikely that the effect was large enough to notably affect acceptance rates.
The experiment also tested whether reviewers preferred original studies over replication studies (so there were four versions of the dummy abstract in total). Here, no difference was found.
So this study suggests that reviewers, at least at this conference, do indeed prefer positive results. But as the authors acknowledge, it’s hard to know whether this would generalize to other contexts.
For example, the abstracts reviewed for this conference were limited to just 300 words. In other contexts, notably journal article review, reviewers have far more information on which to base an opinion. With only 300 words to go on, reviewers in this study might have focused on the results simply because there wasn’t much else to judge.
On the other hand, the authors note that the participants in the 2015 conference might have been unusually aware of the problem of publication bias, and thus more likely to give null results a fair hearing.
For the context of this study, it is relevant to note that the division (and its leadership at the time) can be characterized as rather progressive with regard to open-science ideals and practices.
This is certainly true; after all, they invited me, an anonymous guy with a blog, to speak to them, just on the strength of my writings about open science.
There have only been a handful of previous studies using similar designs to probe peer review biases, and they generally found larger effects. One 1982 paper found a large bias in favor of significant results at a psychology journal, as did a 2010 study at a medical journal.
The authors conclude that their dummy submission method could be useful in the study of peer review:
We hope that this study encourages psychologists, as individuals and on institutional levels (associations, journals, conferences), to conduct experimental research on peer review, and that the preregistered field experiment we have reported may serve as a blueprint of the type of research we argue is necessary to cumulatively build a rigorous knowledge base on the peer review process.