I was at Koli Calling in 2016 and a paper was presented (“Replication in Computing Education Research: Researcher Attitudes and Experiences”) regarding the issue of replicating previous studies. Why replicate previous work? Because a large number of known issues have emerged in psychology and the medical sciences, where important work has not been able to be replicated. Perhaps the initial analysis was underpowered, perhaps the researchers had terribly bad luck in their sample, and perhaps there were… other things going on. Whatever the reason, we depend upon replication as a validation tool and being unable to replicate work puts up a red flag.
After the paper, I had follow-up discussions with Andrew Petersen, from U Toronto, and we talked about the many problems. If we do choose to replicate studies, which ones do we choose? How do we get the replication result disseminated, given that it’s fundamentally not novel work? When do we stop replicating? What the heck do we do if we invalidate an entire area of knowledge? Andrew suggested a “year of replication” as a starting point but it’s a really big job: how do we start a year of replication studies or commit to doing this as a community?
This issue was raised again at Learning@Scale 2017 by Justin Reich, from MIT, among others. One of the ideas that we discussed as part of that session was that we could start allocating space at the key conferences in the field for replication studies. The final talk as part of L@S was “Learning about Learning at Scale: Methodological Challenges and Recommendations”, which discussed general problems that span many studies and then made recommendations as to how we could make our studies better and reduce the risk of failing future replication. Justin followed up with comments (which he described as a rant but he’s being harsh) about leaving room to make it easier to replicate and being open to this kind of examination of our work: we’re now thinking about making our current studies easier to replicate and better from the outset, but how can we go back and verify all of the older work effectively?
I love the idea of setting aside a few slots in every conference for replication studies. The next challenge is picking the studies but, given each conference has an organising committee, a central theme, and reviewers, perhaps each conference could suggest a set and then the community could identify which ones they’re going to have a look at. We want to minimise unnecessary duplication, after all, so some tracking is probably a good idea.
There are several problems to deal with: some political, some scheduling, some scientific, and some just related to how hard it is to read old data formats. None of them are necessarily insurmountable but we have to be professional, transparent and fair in how we manage them. If we’re doing replication studies to improve confidence in the underlying knowledge of the field, we don’t want to damage the community in doing it.
Let me put out a gentle call to action, perhaps for next year, perhaps for the year after. If you’re involved with a conference, why not consider allocating a few slots to replication studies for the key studies in your area, if they haven’t already been replicated? Even the opportunity to have a community discussion about which studies have been effectively replicated will help identify what we can accept as well as showing us what we could fix.
Does your conference have room for a single track, keynote-level session, to devote some time to replication? I’ll propose a Twitter hashtag of #replicationtrack to discuss this and, hey, if we get a single session in one conference out of this, it’s more than we had.
[Edit: The conference is now being held in Hong Kong. I don’t know the reason behind the change but the original issue has been addressed. I have been accepted to Learning @ Scale so will not be able to attend anyway, as it turns out, as the two conferences overlap by two days and even I can’t be in the US and Hong Kong at the same time.]
There is a large amount of discussion in the CS Ed community right now over the LATICE 2017 conference, which is going to be held in a place where many members of the community will be effectively reduced to second-class citizenship and placed under laws that would allow them to be punished for the way that they live their lives. This affected group includes women and people who identify with QUILTBAG (“Queer/Questioning, Undecided, Intersex, Lesbian, Trans (Transgender/Transsexual), Bisexual, Asexual, Gay”). Conferences should be welcoming. This is not a welcoming place for a large percentage of the CS Ed community.
There are many things I could say here but what I would prefer you to do is to look at who is commenting on this and then understand those responses in the context of the author. For once, it matters who said what, because not everyone will be as affected by the decision to host this conference where it is.
From what I’ve seen, a lot of men think this is a great opportunity to do some outreach. A lot has been written, predominantly by men, about how every place has its problems and so on and so forth.
But let’s look at other voices. The female and QUILTBAG voices do not appear to share this support. Asking for their rights to be temporarily reduced or suspended for this ‘amazing opportunity’ is too much to ask. In response, I’ve seen classic diminishment of genuine issues that are far too familiar. Concerns over the reductions of rights are referred to as ‘comfort zone’ issues. This is pretty familiar to anyone who is actually tracking the maltreatment and reduction of non-male voices over time. You may as well say “Stop being so hysterical” and at least be honest and own your sexism.
Please go and read through all of the comments and see who is saying what. I know what my view of this looks like, as it is quite clear that the men who are not affected by this are very comfortable with such a bold quest and the people who would actually be affected are far less comfortable.
This is not a simple matter of how many people said X or Y; it’s about how much discomfort one group has to suffer before we take their concerns seriously. Once again, it appears that we are asking a group of “not-men”, in a reductive sense, to endure more and I cannot be part of that. I cannot condone it.
I will not be going. I will have to work out if I can cite this conference, given that I can see that it will lead to discrimination and a reduction of participation over gender and sexuality lines, unintentionally or not. I have genuine ethical concerns about using this research that I would usually reserve for historical research. But that is for me to worry about. I have to think about my ongoing commitment to this community.
But you shouldn’t care what I think. Go and read what the people who will be affected by this think. For once, please try to ignore what a bunch of vocal guys want to tell you about how non-male groups should be feeling.
Now there’s a title that I didn’t expect to write. In this case, I’m referring to how we break group tasks down into individual elements. I’ve already noted that groups like team members who are hard-working, able to contribute and dependable, but we also have the (conflicting) elements from the ideal group where the common goal is more important than individual requirements and this may require people to perform tasks that they are either not comfortable with or not ideally suited for.
How do we assess this fairly? We can look at what a group produces and we can look at what a group does but, to see the individual contribution, there has to be some allocation of sub-tasks to individuals. There are several (let’s call them interesting) ways that people divide up tasks that we set. Here are three.
- Decomposition into dependent sub-tasks.
- Decomposition into isolated sub-tasks (if possible).
- Decomposition into different roles that spread across different tasks.
Part of working with a group is knowing whether tasks can be broken down, how that can be done successfully, being able to identify dependencies and then putting the whole thing back together to produce a recognisable task at the end.
What we often do with assignment work is to give students identical assignments and they all solemnly go off and solve the same problem (and we punish them if they don’t do enough of this work by themselves). Obviously, then, a group assignment that can be decomposed into isolated sub-tasks that have no dependencies and no assembly requirement is functionally equivalent to an independent assessment, except with some semantic burden of illusory group work.
If we set assignments that have dependent sub-tasks, we aren’t distributing work pressure fairly as students early on in the process have more time to achieve their goals but potentially at the expense of later students. But if the tasks aren’t dependent then we have the problem that the group doesn’t have to perform as a group, they’re a set of people who happen to have a common deadline. Someone (or some people) may have an assembly role at the end but, for the most part, students could work separately.
The ideal way to keep the group talking and working together is to drive such behaviour through necessity, which would require role separation and involvement in a number of tasks across the lifespan of the activity. Nothing radical about that. It also happens to be the hardest form to assess as we don’t have clear task boundaries to work with. However, we also have provided many opportunities for students to demonstrate their ability and to work together, whether as mentor or mentee, to learn from each other in the process.
For me, the most beautiful construction of a group assessment task is found where groups must work together to solve the problem. Beautiful decomposition is, effectively, not a decomposition process but an identification strategy that can pinpoint key tasks while recognising that they cannot be totally decoupled without subverting the group work approach.
But this introduces grading problems. A fluid approach to task allocation can quickly blur neat allocation lines, especially if someone occupies a role that has less visible outputs than another. Does someone get equal recognition for driving ideas, facilitating, the (often dull) admin work or do you have to be on the production side to be seen as valuable?
I know some of you have just come down heavily on one side or the other reading that last line. That’s why we need to choose assessment carefully here.
If you want effective group work, you need an effective group. They have to trust each other, they have to work to individual strengths, and they must be working towards a common goal which is the goal of the task, not a grading goal.
I’m in deep opinion now but I’ve always wondered how many student groups fall apart because we jam together people who just want a pass with people who would kill a baby deer for a high distinction. How do these people have common ground, common values, or the ability to build a mutual trust relationship?
Why do people who just want to go out and practise have to raise themselves to the standards of a group of students who want to get academic honours? Why should academic honours students have to drop their standards to those of people who are happy to scrape by?
We can evaluate group work but we don’t have to get caught up on grading it. The ability to work in a group is a really useful skill. It’s heavily used in my industry and I support it being used as part of teaching but we are working against most of the things we know about the construction of useful groups by assigning grades for knowledge and skill elements that are strongly linked into the group work competency.
Look at how teams work. Encourage them to work together. Provide escape valves, real tasks, things so complex that it’s a rare person who could do it by themselves. Evaluate people, provide feedback, build those teams.
I keep coming back to the same point. So many students dislike group work that we must be doing something wrong because, later in life, many of them start to enjoy it. Random groups? They’re still there. Tight deadlines? Complex tasks? Insufficient instructions? They’re all still there. What matters to people is being treated fairly, being recognised and respected, and having the freedom to act in a way to make a contribution. Administrative oversight, hierarchical relationships and arbitrary assessment sap the will, undermine morale and impair creativity.
If your group task can be decomposed badly, it most likely will be. If it’s a small enough task that one keen person could do it, one keen person probably will because the others won’t have enough of a task to do and, unless they’re all highly motivated, it won’t be done. If a group of people who don’t know each other also don’t have a reason to talk to each other? They won’t. They might show up in the same place if you can trigger a bribe reaction with marks but they won’t actually work together that well.
The will to work together has to be fostered. It has to be genuine. That’s how good things get done by teams.
Valuable tasks make up for poor motivation. Working with a group helps to practise and develop your time management. Combine this with a feeling of achievement and there’s some powerful intrinsic motivation there.
And that’s the fuel that gets complex tasks done.
What are the characteristics of group work and how can we define these in terms that allow us to form a model of beauty about them? We know what most people want from their group members. They want them to be:
1. Honest. They do what they say and they only claim what they do. They’re fair in their dealings with others.
2. Dependable. They actually do all of what they say they’re going to do.
3. Hard-working. They take a ‘reasonable’ time to get things done.
4. Able to contribute a useful skill.
5. A communicator. They let the group know what’s going on.
6. Positive, possibly even optimistic.
A number of these are already included in the Socratic principles of goodness and truth. Truth, in the sense of being honest and transparent, covers 1, 2 and possibly even 5. Goodness, that what we set out to do is what we do and this leads to beauty, covers 3 and 4, and I think we can stretch it to 6.
But what about the aesthetics of the group itself? What does a beautiful group look like? Let’s ignore the tasks we often use in group environments and talk about a generic group. A group should have at least some of these (from) :
1. Common goals.
2. Participation from every member.
3. A focus on what people do rather than who they are.
4. A focus on what happened rather than how people intended.
5. The ability to discuss and handle difference.
6. A respectful environment with some boundaries.
7. The capability to work beyond authoritarianism.
8. An accommodation of difference while understanding that this may be temporary.
9. The awareness that what group members want is not always what they get.
10. The realisation that hidden conflict can poison a group.
Note how many of these are actually related to the task itself. In fact, of all of the things I’ve listed, none of the group competencies have anything at all to do with a task and we can measure and assess these directly by observation and by peer report.
How many of these are refined by looking at some arbitrary discipline artefact? If anything, by forcing students to work together on a task ‘for their own good’, are we in direct violation of this new number 7, allowing a group to work beyond strict hierarchies?
I’ve worked in hierarchical groups in the Army. The Army’s structure exists for a very specific reason: soldiers die in war. Roles and relationships are strictly codified to drive skill and knowledge training and to ensure smooth interoperation with a minimum of acclimatisation time. I think we can be bold and state that such an approach is not required for third- or fourth-year computer programming, even at the better colleges.
I am not saying that we cannot evaluate group work, nor am I saying that I don’t believe such training to be valuable for students entering the workforce. I just don’t happen to accept that mediating the value of a student’s skills and knowledge through their ability to carry out group competencies is either fair or honest. Item 9, where group members may have to adopt a role that they have identified is not optimal, is grossly unfair when final marks depend upon how the group work channel mediates the perception of your contribution.
There is a vast amount of excellent group work analysis and support being carried out right now, in many places. The problem occurs when we try to turn this into a mark that is re-contextualised into the knowledge frame. Your ability to work in groups is a competency and should be clearly identified as such. It may even be a competency that you need to display in order to receive industry-recognised accreditation. No problems with that.
The hallmarks of traditional student group work are resentment at having to do it, fear that either their own contributions won’t be recognised or someone else’s will dominate, and a deep-seated desire to get the process over with.
Some tasks are better suited to group solution. Why don’t we change our evaluation mechanisms to give students the freedom to explore the advantages of the group without the repercussions that we currently have in place? I can provide detailed evaluation to a student on their group role and tell a lot about the team. A student’s inability to work with a randomly selected team on a fake project with artificial timelines doesn’t say anything that I would be happy to allocate a failing grade to. It is, however, an excellent opportunity for discussion and learning, assuming I can get beyond the tyranny of the grade to say it.
You knew it was coming. The biggest challenge of any assessment model: how do we handle group-based assessment?
There’s a joke that says a lot about how students feel when they’re asked to do group work:
When I die I want my group project members to lower me into my grave so they can let me down one more time.
Everyone has horror stories about group work and they tend to fall into these patterns:
1. Group members X and Y didn’t do enough of the work.
2. I did all of the work.
3. We all got the same mark but we didn’t do the same work.
4. Person X got more than I did and I did more.
5. Person X never even showed up and they still passed!
6. We got it all together but Person X handed it in late.
7. Person W said that he/she would do task T but never did and I ended up having to do it.
Let’s consolidate these. People are concerned about a fair division of work and fair recognition of effort, especially where this falls into an allocation of grades. (Point 6 only matters if there are late penalties or opportunities lost by not submitting in time.)
This is totally reasonable! If someone is getting recognition for doing a task then let’s make sure it’s the right person and that everyone who contributed gets a guernsey. (Australian football reference to being a recognised team member.)
How do we make group work beautiful? First, we have to define the aesthetics of group work: which characteristics define the activity? Then we maximise those as we have done before to find beauty. But in order for the activity to be both good and true, it has to achieve the goals that define it and we have to be open about what we are doing. Let’s start, even before the aesthetics, and ask about group work itself.
What is the point of group work? This varies by discipline but, usually, we take a task that is too large or complex for one person to achieve in the time allowed and that mimics (or is) a task you’d expect graduates to perform. This task is then attacked through some sort of decomposition into smaller pieces, many of which are dependent in a strict order, and these are assigned to group members. By doing this, we usually claim to be providing an authentic workplace or task-focused assignment.
The problem that arises, for me, is when we try to work out how we measure the success of such a group activity. Being able to function in a group has a lot of related theory (psychological, behavioural, and sociological, at least) but we often don’t teach that. We take a discipline task that we believe can be decomposed effectively and we then expect students to carve it up. Now the actual group dynamics will feature in the assessment but we often measure the outputs associated with the task to determine how effective group formation and management was. However, the discipline task has a skill and knowledge dimension, while the group activity elements have a competency focus. What’s more problematic is that unsuccessful group work can overshadow task achievement and lead to a discounting of skill and knowledge success, through mechanisms that are associated but not necessarily correlated.
Going back to competency-based assessment, we assess competency by carrying out direct observation, indirect measures and through professional reports and references. Our group members’ reports on us (and our reports on them) function in the latter area and are useful sources of feedback, identifying group and individual perceptions as well as work progress. But are these inherently markable? We spend a lot of time trying to balance peer feedback, minimise bullying, minimise over-claiming, and get a realistic view of the group through such mechanisms but adding marks to a task does not make it more cognitively beneficial. We know that.
For me, the problem with most group work assessment is that we are looking at the output of the task and competency-based artefacts associated with the group and jamming them together as if they mean something.
Much as I argue against late penalties changing the grade you received, which formed a temporal market for knowledge, I’m going to argue against trying to assess group work through marking a final product and then dividing those grades based on reported contributions.
We are measuring different things. You cannot just add red to melon and divide it by four to get a number and, yet, we are combining different areas, with different intentions, and dragging it into one grade that is more likely to foster resentment and negative association with the task. I know that people are making this work, at least to an extent, and that a lot of great work is being done to address this but I wonder if we can channel all of the energy spent in making it work into getting more amazing things done?
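To make that concrete, here is a minimal sketch of one common way of dividing a product mark by reported contribution. The function name, the scaling rule and all of the numbers are hypothetical, made up for illustration rather than taken from any real course or marking scheme.

```python
def split_group_mark(product_mark, reported_shares):
    """Scale one product mark by each member's reported share of the work.

    Hypothetical rule: a member with an exactly even share receives the
    product mark; everyone else is scaled up or down, capped at 100.
    """
    total = sum(reported_shares.values())
    even_share = total / len(reported_shares)
    return {
        name: min(100.0, round(product_mark * share / even_share, 1))
        for name, share in reported_shares.items()
    }


# Strong product (80), but D is reported as doing half of an even share.
strong = split_group_mark(80, {"A": 7, "B": 7, "C": 7, "D": 3})

# Weak product (40), and everyone is reported as contributing evenly.
weak = split_group_mark(40, {"A": 1, "B": 1, "C": 1, "D": 1})

print(strong["D"], weak["D"])
```

In this sketch, D ends up on 40 in both groups: once for a reported half share of a strong product and once for an even share of a weak one. The single number cannot tell you which of those happened.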
Just about every student I’ve spoken to hates group work. Let’s talk about how we can fix that.
The Federal Education Minister, Senator Simon Birmingham, appears as concerned over the disconnect between the Australian Tertiary Admission Rank (ATAR) and university entry as I am. In his own words, students must have “a clear understanding of what they need to do to get into their course of choice and realising what will be expected of them through their further study.”
This article covers some of the issues already raised over transparency and having a system that people work with rather than around.
“While universities determine their own admission requirements, exploring greater transparency measures will ensure that Australian students are provided real information on what they need to do to be admitted to a course at a particular institution and universities are held to account for their public entry requirements,” Senator Birmingham said.
Ensuring that students are ready for university is important but it’s increasingly obvious that this role is not supported or validated through the current ATAR system, for a large number of students. I look forward to seeing what comes from the standards panel and I hope that it’s to everyone’s benefit, not just a rejigging of a number that was probably never all that representative to start with.
Perhaps I should confess that I would like a system where any student could get into University (or VET or TAFE or whatever tertiary program they want to) but that we build our preparatory and educational systems to support this happening, rather than just bringing people in to watch them fail. Oh, but a boy can dream.
If you’re watching the Australian media on higher education, you’ll have seen a lot of discussion regarding the validity of the Australian Tertiary Admission Rank (ATAR) as a measure of a student’s future performance and as an accurate summary of the previous years of education.
This article, talking about students being admitted below the cut-offs, contains a lot of discussion on the issue. Not all of the discussion is equally valuable, in my opinion, as, when the question is the validity of the measure, being concerned about ‘standards slipping’ when a lower number is used isn’t that relevant. The interesting parts of the discussion are which mechanisms we should be using and making them transparent so that all students are on a level playing field.
The fact is that students are being admitted to, and passing, courses that have barriers in place which should clearly indicate their chances of success. Yet students are being admitted based on other pathways, using additional measures such as portfolios, and this makes a bit of a mockery of the apparent simplicity of the ATAR system.
My own analysis of student ATAR versus GPA is revelatory: the mapping is a very noisy correlation and, apart from the very highest ATARs, we see people who succeed or fail in a way that does not match their representative ATAR. Yes, there are rough ‘buckets’ but at a granularity of fewer than five buckets, rather than the thousand or so we’re pretending to have.
“Reducing six years of education to a single ranking is simplistic, let’s have a constructive debate about what could replace the ATAR alone as a fairer, more comprehensive and contextual measure of academic potential”.
Iain Martin, from this linked opinion piece.
I couldn’t agree more!