Weekly Thoughts: The Reproducibility Project
Here is an issue that caught our eye this week:
The Reproducibility Project
When we first heard of “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect,” a 2011 article in the Journal of Personality and Social Psychology, we hardly anticipated the effect it would have on our perception of academic research. The article, written by a respected academic and extensively peer-reviewed, claimed across nine separate experiments to show that ESP (yes, we’re talking about the ability to see into the future) was real, a finding that caused significant turmoil in academia. The research appeared to be flawlessly executed, yet it remained implausible that ESP was real. On the flip side, if ESP didn’t exist, how could such a paper pass review by leading academics and be published in a prestigious journal? What was going on?
Brian Nosek, a psychologist at the University of Virginia, was so perturbed by the paper’s results that he began to wonder whether the problem lay in structural flaws in the way research itself is conducted. To test his theory, Nosek launched the “Reproducibility Project” in collaboration with the nonprofit Center for Open Science. The seemingly straightforward project set out to repeat 100 studies published in three top psychology journals, simply to see whether the results held up. By the end of the project, Nosek’s team was able to confirm only 39 of the studies, meaning they could not confirm the original findings in 61% of the research they repeated. Furthermore, even among the studies they could confirm, the effects tended to be smaller or weaker than those reported in the original publications.
What happened? It is important to note that the discrepancies are unlikely to be the result of widespread fraud among academics. Rather, the answer seems to lie in the fact that academic publications, the measuring stick for academic success, have a strong bias toward new, positive work. Nobody gets published (and, by extension, nobody gets tenure) for conducting quality research that doesn’t yield some sort of connection, correlation, or finding. Even Nosek had to do his replication work on top of his “active” research, because otherwise “the grad students in my lab would never get jobs.”
This publication bias creates two critical issues. First, most negative results are never published, creating what is called the “file drawer” effect. Second, much of the published work may reflect statistical flukes rather than repeatable truths. Planet Money explains:
“KESTENBAUM: Here, let me give you an example. This is from a little experiment I did this morning flipping a coin 10 times. I flipped it 10 times, Jacob. Nine times I got heads, only one tails. If you do this statistical analysis you do for, like, a drug trial or any scientific paper, this looks like a remarkable result. There’s only, like, a 1 percent chance that this is a fluke. So I send this off to the Journal Of Coin Flipping or whatever.
GOLDSTEIN: You’re going to get this published. What are you going to call it? “Heads Up…”
GOLDSTEIN: …”Coin-Flipping Bias In American Quarter Dollars Minted In 1977.”
KESTENBAUM: [Usually it would be] half heads, half tails – there are a lot of experiments like this, so why don’t you read about these in the journals?
GOLDSTEIN: Because it’s such a boring (laughter) finding, right? Like, you get this finding. Nobody’s going to publish it. And if you’re the researcher, you think – I’m not even going to send it off. I’m just going to stick my results here in this file cabinet with all the other failed experiments.”
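Kestenbaum’s “1 percent” figure can be checked directly. With a fair coin, the chance of 9 or more heads in 10 flips is (10 + 1)/2^10 = 11/1024, or about 1.1%. The Python sketch below computes that tail probability and then runs a toy version of the file drawer: many hypothetical labs each flip a fair coin 10 times, and only the “surprising” outcomes get written up. The helper names, the two-sided test, and the 0.05 cutoff for “publishable” are our own illustrative assumptions, not anything from the Planet Money segment.

```python
import random
from math import comb


def upper_tail_p(heads: int, flips: int) -> float:
    """Exact one-sided p-value: chance a fair coin gives at least `heads` heads in `flips` flips."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips


def two_sided_p(heads: int, flips: int) -> float:
    """Two-sided p-value for a fair coin: double the smaller tail, capped at 1."""
    tail = min(heads, flips - heads)
    smaller_tail = sum(comb(flips, k) for k in range(tail + 1)) / 2 ** flips
    return min(1.0, 2 * smaller_tail)


def file_drawer(n_labs: int, flips: int = 10, alpha: float = 0.05, seed: int = 0) -> list[int]:
    """Hypothetical labs each flip a fair coin; only 'significant' head counts get published."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_labs):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if two_sided_p(heads, flips) < alpha:
            published.append(heads)  # surprising result: written up and sent to a journal
        # otherwise the result goes into the file drawer and is never seen
    return published


if __name__ == "__main__":
    print(f"P(at least 9 heads in 10 fair flips) = {upper_tail_p(9, 10):.4f}")  # prints 0.0107
    pubs = file_drawer(20_000)
    print(f"{len(pubs)} of 20,000 fair-coin experiments were 'published'")
    print("head counts that reach the journal:", sorted(set(pubs)))
```

Every coin in the simulation is fair, yet every result that reaches the “journal” shows heads or tails coming up at least 9 times out of 10, while the roughly 98% of unremarkable experiments never leave the drawer.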
The bias toward positive results can also cause mission drift. If a researcher doesn’t clearly define her objective and approach before starting work, she may drift away from her original intention in search of positive results. Nosek explains how this can manifest in subtle ways:
“When I do research in the laboratory, I have choices I make about how to analyze the data and about what of the data that I get to report. And so I might be more likely to find a way of analyzing the data that looks good for me – right? It confirms my hypothesis. It provides a result that’s exciting, that’s very publishable. I might decide that must be the right way to analyze the data, and I might do that while thinking and trying to be genuine and accurate. But – and the fact that I have a conflict of interest in this, where the results have implications for me and my career advancement, means that I might construct stories to myself that lead me to finding results and reporting results in literature that just are exaggerations of reality that just aren’t true.”
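Nosek’s point about choosing among analyses can be made concrete with a little arithmetic. If there is no real effect and a researcher can try k different analyses, each with a 5% false-positive rate, and reports whichever one “works,” the chance of at least one false positive is 1 − (1 − 0.05)^k: about 23% for five analyses and about 40% for ten. The Python sketch below checks that formula by simulation. It assumes the k analyses are independent, which real analysis choices usually are not (they reuse the same data), so treat it as an illustration of the direction of the effect rather than an estimate of its size; the function names are hypothetical.

```python
import random


def analytic_false_positive_rate(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent tests of a true null has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulated_false_positive_rate(k: int, alpha: float = 0.05,
                                  n_studies: int = 50_000, seed: int = 1) -> float:
    """Simulate studies with no real effect where the researcher reports the best of k analyses.

    Assumption: under a true null, a valid p-value is uniform on [0, 1], and the
    k analysis paths are treated as independent draws (real ones are correlated).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        best_p = min(rng.random() for _ in range(k))  # keep the most "publishable" analysis
        if best_p < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 3, 5, 10):
        print(f"k={k:2d}  formula={analytic_false_positive_rate(k):.3f}  "
              f"simulated={simulated_false_positive_rate(k):.3f}")
```

Fixing the objective and analysis in advance, as the paragraph on mission drift suggests, collapses k back to 1 and keeps the false-positive rate at the nominal 5%.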
Furthermore, because producing credible academic work depends in part on building upon previously published work, one piece of biased research can create a house of cards. Nosek’s findings mean it can be difficult to know which research to trust, since a single piece of seemingly groundbreaking research often supports entire fields of work, and entire careers.
Take, for instance, research conducted almost 20 years ago on the nature of “ego depletion.” As Daniel Engber of Slate explained, this study, hailed as “revolutionary” and cited more than 3,000 times, was said to have “revealed a fundamental fact about the human mind: We all have a limited supply of willpower, and it decreases with overuse.” The original finding led to multiple follow-on studies, whose results were confirmed by a 2010 meta-analysis of 83 studies and 198 separate experiments. However, after some researchers began to question the original work, the Association for Psychological Science sought to replicate the ego depletion effect in a study that included 2,000 subjects tested in 24 labs across the world. The results, published in Perspectives on Psychological Science, found “a zero-effect for ego depletion: No sign that the human will works as it’s been described, or that these hundreds of studies amount to very much at all.”
Although some have disputed the findings of reproducibility work, there is now significant debate within the academic community about the credibility of research in general. And while psychology publications have received much of the scrutiny, reproducibility concerns apply to a wide range of fields, because the problem lies in the framework within which research is conducted rather than in the research itself.
There are, however, ways to reduce these structural biases. To avoid the “file drawer” effect, some advocate an open online registry in which all research is recorded regardless of outcome. This approach has been applied to drug research with striking effects: “before the registry was created, more than half of the published studies of heart disease showed positive results. After the registry was created, only 8 percent had positive results – from more than 50 to 8.” In addition, some leading research journals now refuse to publish unregistered work.
Although we aren’t performing formal scientific studies on a day-to-day basis, we do spend much of our time on project-based work intended to prove or disprove a working hypothesis about how a certain business may be operating or how it might operate in the future. By researching the research process itself, we can better understand how our own biases may affect our results or obscure the signal in the data. While we like getting useful results as much as anyone else, we believe this week’s investigation will help keep the “file drawer” effect from becoming a prominent factor at Chenmark.
Have a great week,
Your Chenmark Capital Team