I was at Koli Calling in 2016 and a paper was presented (“Replication in Computing Education Research: Researcher Attitudes and Experiences”) regarding the issue of replicating previous studies. Why replicate previous work? Because we have a larger number of known issues that have emerged in psychology and the medical sciences, where important work has not been able to be replicated. Perhaps the initial analysis was underpowered, perhaps the researchers had terrible bad luck in their sample, and perhaps there were… other things going on. Whatever the reason, we depend upon replication as a validation tool and being unable to replicate work puts up a red flag.
After the paper, I had follow-up discussions with Andrew Petersen, from U Toronto, and we talked about the many problems. If we do choose to replicate studies, which ones do we choose? How do we get the replication result disseminated, given that it’s fundamentally not novel work? When do we stop replicating? What the heck do we do if we invalidate an entire area of knowledge? Andrew suggested a “year of replication” as a starting point but it’s a really big job: how do we start a year of replication studies or commit to doing this as a community?
This issue was raised again at Learning@Scale 2017 by Justin Reich, from MIT, among others. One of the ideas that we discussed as part of that session was that we could start allocating space at the key conferences in the field for replication studies. The final talk as part of L@S was “Learning about Learning at Scale: Methodological Challenges and Recommendations”, which discussed general problems that span many studies and then made recommendations as to how we could make our studies better and reduce the risk of failing future replication. Justin followed up with comments (which he described as a rant but he’s being harsh) about leaving room to make it easier to replicate and being open to this kind of examination of our work: we’re now thinking about making our current studies easier to replicate and better from the outset, but how can we go back and verify all of the older work effectively?
I love the idea of setting aside a few slots in every conference for replication studies. The next challenge is picking the studies but, given each conference has an organising committee, a central theme, and reviewers, perhaps each conference could suggest a set and then the community identify which ones they’re going to have a look at. We want to minimise unnecessary duplication, after all, so some tracking is probably a good idea.
There are several problems to deal with: some political, some scheduling, some scientific, some are just related to how hard it is to read old data formats. None of them are necessarily insurmountable but we have to be professional, transparent and fair in how we manage them. If we’re doing replication studies to improve confidence in the underlying knowledge of the field, we don’t want to damage the community in doing it.
Let me put out a gentle call to action, perhaps for next year, perhaps for the year after. If you’re involved with a conference, why not consider allocating a few slots to replication studies for the key studies in your area, if they haven’t already been replicated? Even the opportunity to have a community discussion about which studies have been effectively replicated will help identify what we can accept as well as showing us what we could fix.
Does your conference have room for a single track, keynote-level session, to devote some time to replication? I’ll propose a Twitter hashtag of #replicationtrack to discuss this and, hey, if we get a single session in one conference out of this, it’s more than we had.