The Year(s) of Replication #las17ed L@S 2017
Posted: April 22, 2017 | Filed under: Education, Opinion | Tags: community, education, l@s, las17ed, learning, learning@scale, mit, replication, science, teaching, testing
I was at Koli Calling in 2016 when a paper was presented (“Replication in Computing Education Research: Researcher Attitudes and Experiences”) on the issue of replicating previous studies. Why replicate previous work? Because a growing number of known issues have emerged in psychology and the medical sciences, where important work has not been able to be replicated. Perhaps the initial analysis was underpowered, perhaps the researchers had bad luck in their sample, and perhaps there were… other things going on. Whatever the reason, we depend upon replication as a validation tool, and being unable to replicate work raises a red flag.
After the paper, I had follow-up discussions with Andrew Petersen, from U Toronto, and we talked about the many problems. If we do choose to replicate studies, which ones do we choose? How do we get the replication result disseminated, given that it’s fundamentally not novel work? When do we stop replicating? What the heck do we do if we invalidate an entire area of knowledge? Andrew suggested a “year of replication” as a starting point but it’s a really big job: how do we start a year of replication studies or commit to doing this as a community?
This issue was raised again at Learning@Scale 2017 by Justin Reich, from MIT, among others. One of the ideas that we discussed as part of that session was that we could start allocating space at the key conferences in the field for replication studies. The final talk at L@S was “Learning about Learning at Scale: Methodological Challenges and Recommendations”, which discussed general problems that span many studies and then made recommendations as to how we could make our studies better and reduce the risk of failing future replication. Justin followed up with comments (which he described as a rant, but he’s being harsh on himself) about leaving room to make replication easier and being open to this kind of examination of our work. We’re now thinking about making our current studies easier to replicate and better from the outset, but how can we go back and verify all of the older work effectively?
I love the idea of setting aside a few slots in every conference for replication studies. The next challenge is picking the studies but, given that each conference has an organising committee, a central theme, and reviewers, perhaps each conference could suggest a set and then the community could identify which ones they’re going to have a look at. We want to minimise unnecessary duplication, after all, so some tracking is probably a good idea.
There are several problems to deal with: some political, some scheduling, some scientific, some are just related to how hard it is to read old data formats. None of them are necessarily insurmountable but we have to be professional, transparent and fair in how we manage them. If we’re doing replication studies to improve confidence in the underlying knowledge of the field, we don’t want to damage the community in doing it.
Let me put out a gentle call to action, perhaps for next year, perhaps for the year after. If you’re involved with a conference, why not consider allocating a few slots to replication studies for the key studies in your area, if they haven’t already been replicated? Even the opportunity to have a community discussion about which studies have been effectively replicated will help identify what we can accept as well as showing us what we could fix.
Does your conference have room for a single track, keynote-level session, to devote some time to replication? I’ll propose a Twitter hashtag of #replicationtrack to discuss this and, hey, if we get a single session in one conference out of this, it’s more than we had.
We talked about doing this at ICER. I’m opposed. If a replication study is good, it should be able to get into the conference on its own merits. If we have a replication study track, do we allow in weaker papers just because they’re replication studies? What if we get no replication studies in that year?
Maybe that could work at L@S or other larger conferences, where they can be assured of some number of replication studies every year, some of which may be good. In smaller communities like ICER, I’m opposed to set-asides.
Thanks for the comment!
I’ll address the issue of the weaker replication study first. I think we have to set a suitable level of quality for replication. There is literally no point in conducting a weak replication study and I believe we have to be upfront about this. I’m not suggesting a back door of “replicate something to get into conference X”, although I can see how that can be a risk.
If anything, the replication studies should have to meet a higher standard because they have to be good enough to challenge previously accepted work (which has been cited sufficiently often to be one that we want to replicate). The methodological and analytical rigour has to be pretty close to flawless.
Replication studies often suffer in review because of a lack of novelty, but if there’s enough direction in review (or an option in whichever paper management system is used to tick ‘replication, low novelty expected’) to consider the merits of the paper in the absence of novelty, then that’s addressed. For example: “This conference welcomes research papers (10 pages), replication studies (6-8 pages) and short papers (4 pages), with the following assessment criteria”. Replication papers could be shorter because we can refer to the original paper for quite a lot.
I do take the point about smaller conferences having limited publication slots, but I still think that a community exercise to identify which papers we really think have to be replicated would be valuable. Maybe they could form an identified part of the poster/demo session in smaller conferences, as part of an ongoing community discussion of replication?
So much to do, so few conference slots, so many over-worked reviewers and PCs. I realise that I am asking for more of limited resources, and we have to talk about the difficulty involved, as well as whether such options are desirable at all.
It’s up to individual conferences and organising committees but I just wanted to get some discussion out there, given the interest at Koli and here at L@S.
I don’t think we necessarily need to set aside slots, but maybe we should explicitly encourage people to submit replication papers in the call, because if people don’t know that such papers are welcome (and what the criteria are, since originality obviously won’t be very relevant), they won’t submit them.
What’s a bit different is pre-registration, because there the entire timeline needs to be modified, so there the conference needs to make a decision to explicitly accept pre-registered studies before the results are known.
Good points, thank you! Conferences will have to plan ahead to allow for pre-registration (which should be relatively easy for replication studies).
VLDB (a top database conference) has a track for work that can include replications if something is learned from them. No fixed number of slots is allocated; there is simply an acknowledgement that this is a legitimate style of research to be judged in its own way. The description (from http://www.vldb.org/2017/cfp_research_track.php ):
“Experiments and Analyses Papers
These papers focus on the evaluation of existing algorithms, data structures, and systems that are of wide interest. The scientific contribution of an E&A track paper lies in providing new insights into the strengths and weaknesses of existing methods rather than providing new methods. Some examples of types of papers suitable for the Experiments and Analyses category are:
experimental surveys that compare existing solutions to a problem and, through extensive experiments, provide a comprehensive perspective on their strengths and weaknesses, or
papers that verify or refute results published in the past and that, through a renewed performance evaluation, help to advance the state of the art, or
papers that focus on relevant problems or phenomena and through analysis and/or experimentation provide insights on the nature or characteristics of these phenomena.
We encourage authors of accepted E&A papers, at the time of the publication, to make available all the experimental data and, whenever possible, the related software. For papers that identify negative or contradictory results for published results by third parties, the Program Committee may ask the third party to comment on the submission and even request a short rebuttal/explanation to be published along with the submission in the event of acceptance.”