By Brian D. Earp
@briandavidearp
*Note that this article was originally published at the Huffington Post.
Introduction
In the New York Times yesterday, psychologist Lisa Feldman Barrett argues that “Psychology Is Not in Crisis.” She is responding to the results of a large-scale initiative called the Reproducibility Project, published in Science magazine, which appeared to show that the findings from over 60 percent of a sample of 100 psychology studies did not hold up when independent labs attempted to replicate them.
She argues that “the failure to replicate is not a cause for alarm; in fact, it is a normal part of how science works.” To illustrate this point, she gives us the following scenario:
Suppose you have two well-designed, carefully run studies, A and B, that investigate the same phenomenon. They perform what appear to be identical experiments, and yet they reach opposite conclusions. Study A produces the predicted phenomenon, whereas Study B does not. We have a failure to replicate.
Does this mean that the phenomenon in question is necessarily illusory? Absolutely not. If the studies were well designed and executed, it is more likely that the phenomenon from Study A is true only under certain conditions. The scientist’s job now is to figure out what those conditions are, in order to form new and better hypotheses to test.
She’s making a pretty big assumption here, which is that the studies we’re interested in are “well-designed” and “carefully run.” But a major reason for the so-called “crisis” in psychology — and I’ll come back to the question of just what kind of crisis we’re really talking about (see my title) — is the fact that a very large number of not-well-designed, and not-carefully-run studies have been making it through peer review for decades.
Small sample sizes, sketchy statistical procedures, incomplete reporting of experiments, and so on, have been pretty convincingly shown to be widespread in the field of psychology (and in other fields as well), leading to the publication of a resource-wastingly large percentage of “false positives” (read: statistical noise that happens to look like a real result) in the literature.
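To make the “statistical noise that happens to look like a real result” point concrete, here is a minimal, hypothetical simulation of my own (not from the original article): it runs many small two-group experiments in which there is no true effect at all, and counts how often a standard t-test nevertheless comes out “significant” at p < .05.

```python
# Hypothetical illustration (not from the article): even when there is NO real
# effect, roughly 5% of small studies will still come out "significant" at p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies = 10_000    # many independent small "studies" of a non-existent effect
n_per_group = 20      # small sample size per group
false_positives = 0

for _ in range(n_studies):
    # Both groups are drawn from the SAME distribution: the null hypothesis is true.
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    treatment = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    result = stats.ttest_ind(treatment, control)
    if result.pvalue < 0.05:
        false_positives += 1

print(f"'Significant' results with no real effect: {false_positives / n_studies:.1%}")
```

And that is before any flexible analysis choices are layered on top (optional stopping, testing several outcome measures and reporting only the one that “worked,” and so on), which can push the effective rate well above the nominal 5 percent.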
A further problem is that — prior to the Reproducibility Project — large-scale, systematic replication attempts of previous findings from the literature just weren’t being done. In large part, this is because “copying” someone else’s study has traditionally been seen as terribly un-sexy, not to mention a poor use of one’s time and resources (from a career perspective); and in any event most journals until quite recently wouldn’t consider publishing such “un-original” work in the first place.
This led to the so-called “file drawer” problem whereby “failed replications” (or other negative results) would very rarely be written up and submitted, because there was little hope of ever getting them published.
Instead, they would be tucked away in the researcher’s “file drawer,” never to be seen or heard from again. The consequence is that there has been a systematic bias against negative results ever appearing in the published record, affecting most of the psychology literature (and thus badly skewing its results) since the modern beginnings of the discipline.
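As a rough, hypothetical sketch of why this skews the published record (again my illustration, not the author’s): if only the studies that happen to reach p < .05 ever leave the file drawer, then the published estimates of even a real-but-small effect will be systematically inflated.

```python
# Hypothetical sketch of the "file drawer" problem: simulate a small true effect,
# keep only the studies that reach p < .05, and compare the average effect size
# in the "published" subset to the truth.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.2        # a small but real group difference (in SD units)
n_per_group = 20
n_studies = 10_000

all_estimates, published_estimates = [], []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    estimate = treatment.mean() - control.mean()
    result = stats.ttest_ind(treatment, control)
    all_estimates.append(estimate)
    if result.pvalue < 0.05:      # only "positive" results leave the file drawer
        published_estimates.append(estimate)

print(f"True effect:                     {true_effect:.2f}")
print(f"Mean estimate over all studies:  {np.mean(all_estimates):.2f}")
print(f"Mean estimate, 'published' only: {np.mean(published_estimates):.2f}")
```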
There’s work to be done
So it would be one thing if psychology were trucking along with a pretty good handle on the sorts of widespread biases that lead to “false positives” being regularly published, and with a reliable mechanism in place for weeding them out even after the fact. As it stands now, that is not the case, although psychology as a discipline has been making some pretty impressive strides in the right direction in the last few years.
Thus, while Feldman Barrett argues that “failed replications” might just mean that phenomenon X is valid and real, but only holds up “under certain conditions,” they might also very well mean that the original results — due to the systematic biases and questionable research practices I mentioned above — are “illusory” after all.
Now, we have to be careful here. A single (apparent) failure to replicate X tells us almost nothing at all about whether X is real. For one thing, Feldman Barrett is certainly right that a change in context in the second experiment could turn out to be the deciding factor. For another, the second research team might have made a mistake–they might have run the experiment (even in the same context) improperly.
But a lot of direct replications (meaning: replications that attempt to be as close to the original as possible, including keeping all of the relevant context the same), carried out by many independent labs over time, that reliably fail to show the reported results, do give us good reason to doubt the very existence of the original effect.
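A back-of-the-envelope simulation (my own, with assumed numbers) shows why a long run of failed direct replications is so informative: if an effect of, say, medium size were real, then reasonably powered direct replications should each succeed most of the time, so five failures in a row would be very unlikely under the “real effect” hypothesis.

```python
# Back-of-the-envelope sketch (assumed numbers): if a medium-sized effect (d = 0.5)
# were real, how often would a direct replication with n = 50 per group "fail"
# (p >= .05), and how likely is a run of five such failures?
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
assumed_effect = 0.5     # assume a medium-sized true effect (in SD units)
n_per_group = 50         # a reasonably powered direct replication
n_sims = 20_000

failures = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(assumed_effect, 1.0, n_per_group)
    if stats.ttest_ind(treatment, control).pvalue >= 0.05:
        failures += 1

miss_rate = failures / n_sims            # roughly 1 minus statistical power
print(f"Chance one replication fails:    {miss_rate:.2f}")
print(f"Chance five in a row all fail:   {miss_rate ** 5:.4f}")
```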
What do we mean when we say “crisis”?
Here is where I’ll tease apart the meaning(s) of “crisis.” The first way you could gloss this term is “crisis of confidence,” as in the following quotation:
Is there currently a crisis of confidence in psychological science reflecting an unprecedented level of doubt among practitioners about the reliability of research findings in the field? It would certainly appear that there is.
That is what Hal Pashler and E. J. Wagenmakers had to say in a recent issue of Perspectives on Psychological Science.
And there is a crisis in that sense–a crisis of confidence in the reliability of published findings. In other words, due to a recent glut of attention to iffy (but widespread, even commonly accepted) research and reporting practices, psychologists are beginning to realize they had better get some serious mechanisms in place to combat the number of low-quality papers being published in their field. And there are a lot of low-quality papers being published–in psychology and in a lot of other fields as well.
But there’s a second way you could think about “crisis,” and this is the sense I expect Feldman Barrett is actually objecting to in her piece in the New York Times. And that’s the idea that if you show (or apparently show) that a bunch of studies in field X don’t seem to replicate, it means that the field isn’t truly scientific. This would be based on the–very wrong–assumption that the published record in a “truly scientific field” is supposed to be (mostly) “correct.” That a paper in a scientific journal, in other words, is the “final word” on whether some phenomenon really exists.
On this way of thinking about things, since a lot of published findings in psychology don’t appear to replicate (whether because the findings only hold up under certain conditions, and the round-two scientists didn’t honor those conditions, or because the original findings are really just noise), psychology is in a state of “crisis” with respect to whether it’s really a science at all.
Here I agree with Feldman Barrett. That would be to misunderstand “what science is,” as she writes near the end of her piece. Science is not a record of incontrovertible facts. Science is a work in progress. Failure (or apparent failure) to replicate previous findings–whether in psychology or in any other discipline–is in fact a healthy and normal part of the longer-term scientific process.
Some (apparent) failures to replicate end up triggering new lines of research in pursuit of the boundary conditions for phenomenon X. Other (apparent) failures to replicate — if enough of them build up over time — end up showing us that phenomenon X isn’t really real after all: instead, it’s just a statistical fluke, or an artifact of bad design, or whatever. All of that is science, and all of that is to be expected, especially in a young field like psychology.
So if that’s the crisis, then the answer is public education. The public needs to know that science is an ever-evolving enterprise, and that we should expect a lot of old findings to be revised (or thrown out, or built upon) as we move forward in the scientific process. On that count, I agree with Feldman Barrett completely.
But I think that she understates the problem facing psychology (and other “messy” disciplines, like medicine), with respect to the real crisis of confidence within those fields. Publication bias is a serious problem. The “file drawer” problem is a serious problem. Questionable research practices, sloppy (or just plain wrong) statistics, and ineffective peer review are all serious problems. Finally, the rarity with which “direct” replications are conducted–much less written up and submitted for publication–is an ongoing, deep-seated problem in many areas of science. For its role in trying to address this problem systematically, the Reproducibility Project deserves a huge round of applause.
Further reading
Earp, B. D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology, 6, Article 621, 1–11.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). DOI: 10.1126/science.aac4716.