A new paper from Nick J. Broers of Maastricht University argues that the size of the effects measured in psychology experiments is essentially meaningless.
An 'effect size' is simply the magnitude of an effect. For instance, if I show that giving students an apple before an exam increases scores by 5% on average, I could say that the effect size of the apples was 5%.
Psychologists have become more interested in effect sizes in recent years. There have been successful efforts to encourage reporting of standardized effect sizes in psychology papers, and this is now mandatory at some leading journals.
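(For the unfamiliar: a standardized effect size such as Cohen's d expresses a group difference in units of the pooled standard deviation, so it can be compared across studies that measure outcomes on different raw scales. Here's a minimal sketch in Python; the "exam score" numbers are invented purely for illustration.)

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical exam scores (out of 100) for students given an apple vs. not.
apple    = [72, 68, 75, 80, 71, 77, 69, 74]
no_apple = [66, 70, 65, 73, 68, 64, 71, 67]
print(f"Cohen's d = {cohens_d(apple, no_apple):.2f}")
```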
The argument for reporting effect sizes is compelling. At the most basic level, an effect size lets us know if a phenomenon is large enough to be interesting. A tiny effect could be statistically significant if measured with a large enough sample size, but most people would say that a tiny effect is not very important.
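To see this concretely, here's a quick simulation (the effect size, sample size, and random seed are arbitrary choices of mine, not anything from the paper): a true effect of d = 0.02 sails past the p < .05 threshold once each group contains 200,000 observations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000                           # observations per group
d = 0.02                              # a tiny true standardized effect

control   = rng.normal(0.0, 1.0, n)
treatment = rng.normal(d,   1.0, n)   # true difference of 0.02 SDs

t, p = stats.ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.2g}")    # p typically comes out well below 0.05
```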
However, according to Broers, there is a fundamental problem with interpreting effect sizes in psychology. The problem is that very few psychological theories predict any specific effect size, and if theories don't predict effect sizes, effect sizes can't be used to judge the validity of theories.
Broers calls psychological theories 'verbal theories' because they are purely qualitative, in contrast to the theories of, say, physics, which make specific quantitative predictions.
Broers uses the example of cognitive dissonance theory. One of the predictions of this theory is that people will appreciate something more if they have had to work hard to get it. Two early experimental tests of cognitive dissonance found evidence for the predicted effect, with very similar standardized effect sizes of about d=0.7.
The close match of these effect sizes might seem like strong and exact evidence for the theory, but Broers disagrees:
It is tempting to believe that the close correspondence of the results provides us with real quantitative information on the effect of dissonance reduction. But it is important to realize here that the verbal theory that inspired the experiments [had] nothing quantitatively to say about the workings of cognitive dissonance.
Broers points out that the two experiments weren't even designed to be quantitatively precise. For instance, one used a 5-point scale and the other a 20-point scale to measure the outcome (appreciation), with seemingly no reason for these choices.
The arbitrariness of the outcome scales reflects the indifference of the researchers toward whatever quantitative outcome the study might yield. The only purpose of quantification was to enable [significance testing] to underwrite the ordinal theoretical prediction. The conclusion must then be that observed effect sizes have no meaning outside the research design in which they were established.
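To illustrate the point about scales (with invented summary statistics, not data from the actual studies): because d divides the raw difference by the pooled standard deviation, a 0.7-point difference on a 5-point scale and a 2.8-point difference on a 20-point scale can yield exactly the same standardized effect size, even though the raw outcomes have nothing quantitative in common.

```python
import numpy as np

def cohens_d(mean_diff, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference computed from summary statistics."""
    pooled_sd = np.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
    return mean_diff / pooled_sd

# Invented numbers for two hypothetical studies:
# Study A: 5-point appreciation scale, group SDs of 1.0, raw difference of 0.7 points.
# Study B: 20-point appreciation scale, group SDs of 4.0, raw difference of 2.8 points.
print(cohens_d(0.7, 1.0, 1.0, n_a=30, n_b=30))   # d = 0.7
print(cohens_d(2.8, 4.0, 4.0, n_a=30, n_b=30))   # d = 0.7
```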
So what does this mean for psychological research? Broers advocates a kind of effect-size-neutral psychology, in which the important thing is to demonstrate the statistical significance of effects under increasingly diverse experimental conditions. In other words, 'conceptual replications' are more important than effect sizes:
The true accumulation of theoretical knowledge lies in a gradual increase of the practical relevance of the theoretical psychological principles. As the number of successful conceptual replications of a study multiplies, the breadth of applicability of these psychological principles will gradually increase, making the underlying theory both more potent and convincing.
Conceptual replications have received some bad press in recent years, especially among advocates of effect size reporting, so Broers' argument might ruffle some feathers (although there has also been praise for conceptual replications).
In my view, this is a thought-provoking article, but I don't think Broers really grapples with the possibility that even a verbal theory makes an implicit claim about the effect size.
I think every verbal theory contains the unwritten claim "...and this effect is not trivial". Suppose my theory predicted an effect, and this effect were consistently observed but extremely tiny (d=0.01). Many people would conclude that my theory is not true in any meaningful way, and not worth further investigation.
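For a rough sense of what that would mean in practice, here's a back-of-the-envelope calculation (a sketch using the normal approximation for a two-sided, two-sample comparison at α = .05 with 80% power): reliably detecting d = 0.01 would require over 150,000 participants per group, versus roughly 30 per group for d = 0.7.

```python
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample test
    (normal approximation: n = 2 * (z_alpha/2 + z_beta)^2 / d^2)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2

for d in (0.7, 0.1, 0.01):
    print(f"d = {d:<5} -> ~{n_per_group(d):,.0f} participants per group")
```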