It has been a tough year. Australia was on fire at the start, then sabre rattling started, then COVID came, and now the US is in turmoil as black voices rise up to demand justice and fair and equitable treatment as citizens. (Black Lives Matter. If that bothers you, go read someone else, or, better yet, educate yourself as to why you should agree.)
The COVID crisis has had a large impact on the educational sector, affecting enrolments at many Australian institutions as we are (to varying degrees) dependent upon international students for income. But now, many of our international students are not coming, which means that every University in Australia is taking a hit. At the same time, the COVID issues that prevent students from entering the country have forced us into an unexpected and unprecedented level of remote and on-line teaching for every student. We have been in remote mode for months now.
Many far more respected voices than me have correctly identified that we cannot learn a great deal about remote learning from this change, because it was not planned, it has no “before” state that we captured for a control, and it is a scrabbling matter of survival. We have gone from the relatively ample sustenance of Universities in the 1960s and ’70s, to a more constrained budget as funding changed, and now we are on survival rations.
Our pedagogies are also rationed, limited by the physical space we can occupy, the technologies that we have, the staff who are available, and an overwhelming sense of dread that fills the spaces in June as many of us think “What next?”
Rationing reduces both the quantity and range of what we consume. From a food perspective, history tells us that limited sustenance is dangerous in two ways: firstly because slow starvation is still starvation, and secondly because there is a minimum requirement for a balanced diet, without which humans can get very sick or even die with full bellies. The consumption of maize, a staple of Mesoamerica, can easily lead to pellagra and other deficiency diseases unless it is nixtamalized with lye or lime. Where maize went without this knowledge, outbreaks of deficiency disease followed. It’s not just cereals and vegetables that have this problem: rabbit meat is so low in fat that a diet exclusively of this meat can lead to protein poisoning and, if rabbit is your only meat source, common advice is not to make it a substantial part of your diet.
Back to our pedagogies, while we are forced to ration our approaches and our resources, we have to think about whether we are providing enough for a balanced and sufficient education, or are we slowly starving our students or, worse still, introducing educational deficiencies that will hamper their development in the future?
There will be weeks and months of analysis after this challenging year, and many assumptions will be challenged. We will see the impact of these remote terms and semesters on education, on knowledge, on community, and on identity. But that analysis is after things improve. Right now, our focus is on monitoring the health of our communities, looking for slow decline and deficiency as best we can, and improving things where we can.
How long do we stay on these rations? 2020 is a brewing storm of new things, each one pushing a previous event into the background. Every lightning bolt is brighter and closer than before, every thunder clap louder.
If this were a storm at sea, we would be desperately trying to ensure our ship was sound, that it could stay afloat, and we would look for signs of the storm breaking.
When a ship is in distress, often its weight is reduced to improve its chances of staying afloat. The things thrown off are known as “jetsam”, distinguished from those things that float away (either from waves or because the ship has foundered), which are “flotsam”. What have we thrown from our ships, or at least considered?
There are no more face-to-face lectures in many cases. These are replaced with recordings or on-line presentation and discussion. There are fewer tutorials, with fewer people, as the tyranny of the physical prevents us from filling rooms while we are under disease management social distancing restrictions. We do not exchange paper. We do not gather in laboratories. We do not sit in one place to undertake examinations under strict invigilation conditions.
The traditional lecture, with hundreds of students sitting in a room to receive the wisdom from the front, is jetsam. At my institution, there will be no face-to-face lectures until 2021. This is mostly because we will not be legally allowed to put students into many of the lecture theatres at densities and numbers that make it feasible. Existing laws would require us to have 10 times the lecture space – which is an impossible requirement, even if we had the lecturing staff available to multiply their effort by 10. But the presentation of information, interactively with the lecturer, is still going strong in the remote space and we have noticed that more students participate in Q&A than the few we used to see dominating the physical space. We have lost the “vitamin” of community that occurs through regular mingling but that may come from other sources.
Paper assignments, long dwindling, are jetsam but that is perhaps hastening an inevitable demise. In a time of growing part-time student numbers, students who work part-time, and increasing transit times, requiring the physical transfer of cellulose fibres imprinted with marking reagent seems a little excessive unless absolutely necessary. It’s true that on-line and electronic systems are less flexible than paper and, especially for formulae, there is a steep learning curve for formatting tools to represent complex symbols. E-paper, in its various forms, is promising but still not there. The removal of paper is probably making things harder and stifling some students’ creativity.
The tutorials and the laboratories are coming back, under new regulations and new requirements. Their value, the authentic and hands-on nature of a good exercise in these spaces, saved them from the ocean. We know what is good about them but now have to make sure that we have placed that good at the forefront.
The invigilated paper examination is another case altogether. I have just finished working on a fully remote examination, open book, and presented across a network. Instead of having two pens and pencils, my students need a fully-charged battery and a good internet connection. But the move to open book (a first for this course) has meant a Bloomian shift up into application and evaluation as a minimum – a very positive direction that we had already been moving in but that was much more easily justified by this change. We have kept most of the exam but we have thrown out some of its old baggage.
I will be honest. I think that on-line examinations, already a busy area of research, are going to be an area of a great deal of future research, much of it looking back into this year as we desperately try to work out what worked and how it worked. For me, this rationing has been fascinating, as it forced me to think in detail about exactly what I wanted students to do, in their potential identities as graduates, as students, and as discipline specialists.
There is another nautical term, which you might not know, lagan, that refers to heavy goods thrown from a ship to reduce weight but marked with a buoy to be recovered later. When danger has passed, you circle back and get them again. While flotsam and jetsam are often legally passed to their discoverer, unless the former owner makes a claim, lagan is always yours and you will be back for it.
I do wonder how many of the things that we didn’t do, that went overboard, are considered to be so valuable that we circle back for them? As we come out of this, even while we’re circling back, it’s probably worth some moments in reflection to determine whether we really want that heavy thing back on board or we learned something new while we weathered the storm.
Stay safe, stay well.
[Edit: The conference is now being held in Hong Kong. I don’t know the reason behind the change but the original issue has been addressed. I have been accepted to Learning @ Scale so will not be able to attend anyway, as it turns out, as the two conferences overlap by two days and even I can’t be in the US and Hong Kong at the same time.]
There is a large amount of discussion in the CS Ed community right now over the LATICE 2017 conference, which is going to be held in a place where many members of the community will be effectively reduced to second-class citizenship and placed under laws that would allow them to be punished for the way that they live their lives. This affected group includes women and people who identify with QUILTBAG (“Queer/Questioning, Undecided, Intersex, Lesbian, Trans (Transgender/Transsexual), Bisexual, Asexual, Gay”). Conferences should be welcoming. This is not a welcoming place for a large percentage of the CS Ed community.
There are many things I could say here but what I would prefer you to do is to look at who is commenting on this and then understand those responses in the context of the author. For once, it matters who said what, because not everyone will be as affected by the decision to host this conference where it is.
From what I’ve seen, a lot of men think this is a great opportunity to do some outreach. A lot has been written, predominantly by men, about how every place has its problems and so on and so forth.
But let’s look at other voices. The female and QUILTBAG voices do not appear to share this support. Asking for their rights to be temporarily reduced or suspended for this ‘amazing opportunity’ is too much to ask. In response, I’ve seen classic diminishment of genuine issues that are far too familiar. Concerns over the reductions of rights are referred to as ‘comfort zone’ issues. This is pretty familiar to anyone who is actually tracking the maltreatment and reduction of non-male voices over time. You may as well say “Stop being so hysterical” and at least be honest and own your sexism.
Please go and read through all of the comments and see who is saying what. I know what my view of this looks like, as it is quite clear that the men who are not affected by this are very comfortable with such a bold quest and the people who would actually be affected are far less comfortable.
This is not a simple matter of how many people said X or Y; it’s about how much discomfort one group has to suffer before we take their concerns seriously. Once again, it appears that we are asking a group of “not-men”, in a reductive sense, to endure more and I cannot be part of that. I cannot condone it.
I will not be going. I will have to work out if I can cite this conference, given that I can see that it will lead to discrimination and a reduction of participation over gender and sexuality lines, unintentionally or not. I have genuine ethical concerns about using this research that I would usually reserve for historical research. But that is for me to worry about. I have to think about my ongoing commitment to this community.
But you shouldn’t care what I think. Go and read what the people who will be affected by this think. For once, please try to ignore what a bunch of vocal guys want to tell you about how non-male groups should be feeling.
Extract of a letter from John Adams to Abigail Adams, posted 12 May 1780, from Paris.
I could fill Volumes with Descriptions of Temples and Palaces, Paintings, Sculptures, Tapestry, Porcelaine, &c. &c. &c. — if I could have time. But I could not do this without neglecting my duty. The Science of Government it is my Duty to study, more than all other Sciences: the Art of Legislation and Administration and Negotiation, ought to take Place, indeed to exclude in a manner all other Arts. I must study Politicks and War that my sons may have liberty to study Mathematicks and Philosophy. My sons ought to study Mathematicks and Philosophy, Geography, natural History, Naval Architecture, navigation, Commerce and Agriculture, in order to give their Children a right to study Painting, Poetry, Musick, Architecture, Statuary, Tapestry and Porcelaine.
I’ve written a lot in the past month and a half. Now, because I’m committed to evaluation, I have to look back at all of it and think about some difficult matters:
- Is anyone reading this?
- Are the people reading this the ones who can make change?
- Is this the best way to do this?
- Should I be doing something else?
There are roughly 1,000 people who see my posts, between direct subscribers who read in e-mail, Facebook and the elusive following community on Twitter.
Twitter shouldn’t count, as I know from direct experience that the click-through rate from Twitter is tiny. (My posts have been shared by people with 5-10,000 followers and it has turned into maybe 10-20 more people reading.) Now I’m down to maybe 4-500 readers.
Facebook shares a longer fragment of my ideas but the click through is still small. Perhaps this brings me down to the roughly 200 followers I have, who have (over time) contributed about 1,000 ‘Likes’. However, almost all of these positive reinforcements stem from a different phase of the blog, a time when I was blogging conferences and being useful, rather than pontificating on the nature of beauty. My readership used to be 100 people a day, or more. I can’t crack 80 today and the way that I’m blogging is unlikely to reach that larger audience, yet it’s what I want to do.
The answer to 1 is that a few other people a day are reading what I write. I’d put it as high as twenty on a good day but most days it’s under ten.
2’s a tricky question. We can all make change; that’s one of my firmest beliefs. However, there is making change and then there are change makers. I know several people in this area quite well and they read me occasionally but it’s not something that they dedicate time to do. I have people that I always read but I can’t make the changes they need. It’s frustrating. No doubt, my ideas appeal to some people but change takes will and capacity to change, not just a sympathetic ear. I don’t want people to read this and feel trapped because they can’t make change. The answer to 2 is, probably, ‘no’.
3 follows from 1 and 2. If my readership is small and my ideas have little influence then this is not the best way to do things. We face enormous challenges. We need effective mechanisms for sharing information. If I am to make change, I have to invest my time wisely. I am not a large-scale player or a change maker. I need help to do it and if that help isn’t coming from this avenue, I have to choose another.
4 is easier. I can focus on my scholarship, practice, and research, rededicating the time I’ve been spending on this blog. People read papers where they don’t read blogs. Papers drive recognition. Recognition gets you the places to speak where your voice can be heard. There is no point having written all those words in a blog if it’s rarely read. This has been a highly rewarding experience in many ways but you have to wonder why you’re doing it if very few people read it or remember what you’ve written.
I wanted people to think and to talk about the ideas shared here. For those of you who have let me know that this worked, my thanks!
I’m tempted to keep going with the daily blog but the aesthetic argument traps me here. Spending time on something that isn’t working and insisting that it’s valuable is self-deception. Investing energy into an avenue that isn’t achieving your goals isn’t good. I cannot deprive my students of the hour or so a day that I’ve been spending doing this unless I achieve more for them than I would by doing some other aspect of my job.
Students and teachers: the true focus of any aesthetic discussion of education; the most important aspects of any discussion of what we should be doing because they are people and not just machine parts. As for us, so for them.
There are more discussions to be had but they’ll show up in more formal places, most likely. I’m always happy to talk to people about ideas at conferences. I’ve already started a face-to-face discussion about taking some of these ideas further in a more traditional research sense and I’m very excited about that.
But perhaps it’s time to let this blog go, listen to the numbers, reflect on the dissemination of knowledge, and accept that I would not be following my own advice if I were to continue. I love the beauty argument. I think it’s great. I stand by everything I’ve written this year. I just don’t think that this is the way to move people towards that agenda.
Thus, the daily updates stop with this post. I’ll still post things that interest me but there’ll be fewer of them.
I’ll leave you with the message I wanted to get across this year:
- Educational philosophy is full of the aesthetics of education. Dewey and Bloom just scratch the surface of this. The late 19th and early 20th century were an incredible time of upheaval and we still haven’t addressed many of the questions raised then. To the libraries!
- Fair, equitable, well-designed and evidence-based education is at the core of any beautiful system.
- Every day, we should ask ourselves if what we are doing is beautiful, good or true, taking into account all of the difficult questions of how we balance necessities against desirabilities, being honest about which is which. If we aren’t managing this, we need to either seek to change or accept that what we are doing isn’t right.
- We should leave enough time for ourselves in all of this, as there should be no sacrificial element to beautiful education.
- Change is coming. Change is here. Pretending that it won’t happen isn’t beautiful.
I hope that you all have a fantastic learning and teaching year, with many amazing and beautiful moments and outcomes!
This year, I hope to be at several conferences and I look forward to talking to anyone about the ideas in this phase (or any other phase) of the blog.
Have a great year!
You knew it was coming. The biggest challenge of any assessment model: how do we handle group-based assessment?
There’s a joke that says a lot about how students feel when they’re asked to do group work:
When I die I want my group project members to lower me into my grave so they can let me down one more time.
Everyone has horror stories about group work and they tend to fall into these patterns:
- Group members X and Y didn’t do enough of the work.
- I did all of the work.
- We all got the same mark but we didn’t do the same work.
- Person X got more than I did and I did more.
- Person X never even showed up and they still passed!
- We got it all together but Person X handed it in late.
- Person W said that he/she would do task T but never did and I ended up having to do it.
Let’s consolidate these. People are concerned about a fair division of work and fair recognition of effort, especially where this falls into an allocation of grades. (Point 6 only matters if there are late penalties or opportunities lost by not submitting in time.)
This is totally reasonable! If someone is getting recognition for doing a task then let’s make sure it’s the right person and that everyone who contributed gets a guernsey. (Australian football reference to being a recognised team member.)
How do we make group work beautiful? First, we have to define the aesthetics of group work: which characteristics define the activity? Then we maximise those as we have done before to find beauty. But in order for the activity to be both good and true, it has to achieve the goals that define it and we have to be open about what we are doing. Let’s start, even before the aesthetics, and ask about group work itself.
What is the point of group work? This varies by discipline but, usually, we take a task that is too large or complex for one person to achieve in the time allowed and that mimics (or is) a task you’d expect graduates to perform. This task is then attacked through some sort of decomposition into smaller pieces, many of which depend on each other in a strict order, and these are assigned to group members. By doing this, we usually claim to be providing an authentic workplace or task-focused assignment.
The problem that arises, for me, is when we try and work out how we measure the success of such a group activity. Being able to function in a group has a lot of related theory (psychological, behavioural, and sociological, at least) but we often don’t teach that. We take a discipline task that we believe can be decomposed effectively and we then expect students to carve it up. Now the actual group dynamics will feature in the assessment but we often measure the outputs associated with the task to determine how effective group formation and management was. However, the discipline task has a skill and knowledge dimension, while the group activity elements have a competency focus. What’s more problematic is that unsuccessful group work can overshadow task achievement and lead to a discounting of skill and knowledge success, through mechanisms that are associated but not necessarily correlated.
Going back to competency-based assessment, we assess competency by carrying out direct observation, indirect measures and through professional reports and references. Our group members’ reports on us (and our reports on them) function in the latter area and are useful sources of feedback, identifying group and individual perceptions as well as work progress. But are these inherently markable? We spend a lot of time trying to balance peer feedback, minimise bullying, minimise over-claiming, and get a realistic view of the group through such mechanisms but adding marks to a task does not make it more cognitively beneficial. We know that.
For me, the problem with most group work assessment is that we are looking at the output of the task and competency based artefacts associated with the group and jamming them together as if they mean something.
Much as I argue against late penalties changing the grade you received, which formed a temporal market for knowledge, I’m going to argue against trying to assess group work through marking a final product and then dividing those grades based on reported contributions.
We are measuring different things. You cannot just add red to melon and divide it by four to get a number and, yet, we are combining different areas, with different intentions, and dragging it into one grade that is more likely to foster resentment and negative association with the task. I know that people are making this work, at least to an extent, and that a lot of great work is being done to address this but I wonder if we can channel all of the energy spent in making it work into getting more amazing things done?
Just about every student I’ve spoken to hates group work. Let’s talk about how we can fix that.
But [GPA calculation adjustment] doesn’t have to be a method of avoidance; this can be a useful focusing device. If a student did really well in, say, Software Engineering but struggled with an earlier, unrelated, stream, why can’t we construct a GPA for Software Engineering that clearly states the area of relevance and degree of information? Isn’t that actually what employers and people interested in SE want to know?
This hits at the heart of my concerns over any kind of summary calculation that obscures the process. Who does this benefit? What use is it to anyone? What does it mean? Let’s look at one of the most obvious consumers of student GPAs: the employers and industry.
Feedback from the Australian industry tells us that employers are generally happy with the technical skills that we’re providing but it’s the softer skills (interpersonal skills, leadership, management abilities) that they would like to see more of and know more about. A general GPA doesn’t tell you this but a Software Engineering focused GPA (as I mentioned above) would show you how a student performed in courses where we would expect to see these skills introduced and exercised.
Putting everything into one transcript gives people the power to assemble this themselves, yes, but this requires the assembler to know what everything means. Most employers have neither the time nor inclination to do this for all 39 or so institutions in Australia. But if a University were to say “this is a summary of performance in these graduate attributes”, where the GAs are regularly focused on the softer skills, then we start to make something more meaningful out of an arbitrary number.
But let’s go further. If we can see individual assessments, rather than coarse subject grades, we can start to construct a model of an individual across the different challenges that they have faced and overcome. Portfolios are, of course, a great way to do this but they’re more work to read than single measures and, too often, such a portfolio is weighed against simpler, apparently meaningful measures such as high GPAs and found wanting. Portfolios also struggle if placed into a context of previous failure, even if recent activity clearly demonstrates that a student has moved on from that troubled or difficult time.
I have a deep ethical and philosophical objection to curve grading, as you probably know. The reason is simple: the actions of one student should not negatively affect the outcomes of another. This same objection is my biggest problem with GPA, although in this case the action and outcomes belong to the same student at different points in her or his life. Rather than using performance in one course to determine access to the learning upon which it depends, we make these grades a permanent effect and every grade that comes afterwards is implicitly mediated through this action.
Should Past Academic Nick have an inescapable impact on Now and Future Academic Nick’s life? When we look at all of the external influences on success, which make it clear how much totally non-academic things matter, it gets harder and harder to say “Yes, Past Academic Nick is inescapable.” Unfairness is rarely aesthetically pleasing.
An excellent comment on the previous post raised the issue of comparing GPAs in an environment where the higher GPA included some fails but the slightly lower GPA student had always passed. Which was the ‘best’ student from an award perspective? Student A fails three courses at the start of his degree, student B fails three courses at the end. Both pass with the same GPA, time to completion, and number of passes and fails. Is there even a sense of ‘better student’ here? B’s struggles are more immediate and, implicitly, concerns would be raised that these problems could still be active. A has, apparently, moved on in some way. But we’d never know this from simplistic calculations.
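To make the Student A and Student B comparison concrete, here is a minimal sketch in Python. The grade values are invented for illustration, and it assumes a 0 to 7 scale on which 3 is a failing grade; the `recent_mean` helper and its window of 8 courses are also assumptions, not an established measure. It shows that a plain mean cannot distinguish early failures from late ones, while even a crude "most recent courses" view can.

```python
from statistics import mean

# Hypothetical transcripts (assumed 0-7 scale, 3 assumed to be a fail).
student_a = [3, 3, 3] + [6] * 21   # three fails at the start of the degree
student_b = [6] * 21 + [3, 3, 3]   # three fails at the end of the degree

def recent_mean(grades, window=8):
    """Mean of the most recent `window` grades (hypothetical helper)."""
    return mean(grades[-window:])

# The overall GPA is identical because the mean ignores ordering.
print(mean(student_a), mean(student_b))

# A view restricted to recent courses separates the two students.
print(recent_mean(student_a), recent_mean(student_b))
```

This is only one possible way to surface ordering; the point is that the single number discards it entirely.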
If we’re struggling to define ‘best’ and we’re not actually providing something that many people feel is useful, while burdening students with an inescapable past, then the least we can do is to sit down with the people who are affected by this and ask them what they really want.
And then, when they tell us, we do something about changing our systems.
If we are going to try and summarise a complicated, long-term process with a single number, and I don’t see such shortcuts going away anytime soon, then it helps to know:
- Exactly what the number represents.
- How it can be used.
- What the processes are that go into its construction.
We have conventions as to what things mean but, when we want to be precise, we have to be careful about our definition and our usage of the final value. As a simple example, one thing that often surprises people who are new to numerical analysis is that there is more than one way of calculating the average value of a group of numbers.
While average in colloquial language would usually mean that we take the sum of all of the numbers and divide it by their count, this is more formally referred to as the arithmetic mean. What we usually want from the average is some indication of what the typical value for this group would be. If you weigh ten bags of wheat and the average weight is 10 kilograms, then that’s what many people would expect the weight to be for future bags, unless there was clear early evidence of high variation (some 500g, some 20 kilograms, for example.)
But the mean is only one way to measure central tendency in a group of numbers. We can also measure the median, the number that separates the highest half of the data from the lowest, or the mode, the value that is the most frequently occurring value in the group.
(This doesn’t even get into the situation where we decide to aggregate the values in a different way.)
If you’ve got ten bags of wheat and nine have 10 kilograms in there, but one has only 5 kilograms, which of these ways of calculating the average is the one you want? The mode is 10kg but the mean is 9.5kg. If you tried to distribute the bags based on the expectation that everyone gets 9.5, you’re going to make nine people very happy and one person unhappy.
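For anyone who wants to check the wheat example, a short sketch using Python's standard `statistics` module computes all three measures of central tendency for the ten bags described above. The variable name is just for illustration.

```python
from statistics import mean, median, mode

# Ten bags of wheat: nine hold 10 kg and one holds 5 kg.
bag_weights_kg = [10] * 9 + [5]

print("mean:  ", mean(bag_weights_kg))    # sum of weights divided by the count
print("median:", median(bag_weights_kg))  # middle of the sorted weights
print("mode:  ", mode(bag_weights_kg))    # most frequently occurring weight
```

The mean is pulled down by the single light bag, while the median and mode both report the weight that nine of the ten bags actually have.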
Most Grade Point Average calculations are based on a simple arithmetic mean of all available grades, with points allocated from 0 to an upper bound based on the grade performance. As a student adds more courses, these contributions are added to the calculation.
In yesterday’s post, I mused on letting students control which grades go into a GPA calculation and, to explore that, I now have to explain what I mean and why that would change things.
As it stands, because a GPA is an average across all courses, any lower grades will permanently drop the GPA contribution of any higher grades. If a student gets a 7 (A+ or High Distinction) for 71 of her courses and then a single 4 (a Passing grade) for one, her GPA will be about 6.96 (501 grade points across 72 courses). It can never return to 7. The clear performance band of this student is at the highest level, given that just under 99% of her marks are at the highest level, yet the inclusion of all grades means that a single underperformance, for whatever reason, in three years has cost her standing for those people who care about this figure.
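The arithmetic here is easy to reproduce. The sketch below assumes the simple unweighted mean described earlier (real institutions may weight courses by credit units, which this ignores), and the `gpa` function name is hypothetical. It computes the GPA for 71 grades of 7 plus a single 4, and then shows that adding any number of further 7s moves the figure closer to 7 without ever reaching it.

```python
def gpa(grades):
    """Unweighted arithmetic-mean GPA over a list of grade points."""
    return sum(grades) / len(grades)

# 71 courses at grade 7 and a single course at grade 4.
grades = [7] * 71 + [4]
print(round(gpa(grades), 3))

# Even after a thousand more top grades the mean stays strictly below 7.
many_more = grades + [7] * 1000
print(gpa(many_more), gpa(many_more) < 7)
```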
My partner and I discussed some possible approaches to GPA that would be better and, by better, we mean approaches that encourage students to improve, that clearly show what the GPA figure means, and that are much fairer to the student. There are too many external factors contributing to resilience and high performance for me to be 100% comfortable with the questionable representation provided by the GPA.
Before we even think about student control over what is presented, we can easily think of several ways to make a GPA reflect what you have achieved, rather than what you have survived.
- We could only count a percentage of the courses for each student. Even having 90% counted means that students who stumble a little once or twice do not have this permanently etched into a dragging grade.
- We could allow a future attempt at a course with an improved grade to replace the previous grade. Before we get too caught up in the possibility of ‘gaming’, remember that students would have to pay for this (even if delayed) in most systems and it will add years to their degree. If a student can reach achievement level X in a course then it’s up to us to make sure that does correspond to the achievement level!
- We could only count passes. Given that a student has to assemble sufficient passing grades to be awarded a degree, why then would we include the courses that do not count in a calculation of GPA?
- We could use the mode and report the most common mark the student receives.
- We could do away with it totally. (Not going to happen any time soon.)
- We could pair the GPA with a statistical accompaniment that tells the viewer how indicative it is.
Options 1 and 2 are fairly straight-forward. Option 3 is interesting because it compresses the measurement band to a range of (in my system) 4-7 and this then implicitly recognises that GPA measures for students who graduate are more likely to be in this tighter range: we don’t actually have the degree of separation that we’d assume from a range of 0-7. Option 4 is an interesting way to think about the problem: which grade is the student most likely to achieve, across everything? Option 5 is there for completeness but that’s another post.
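Options 1 to 4 can be sketched side by side. Everything in the block below is an illustrative assumption rather than any institution's actual rule: the function names, the choice to keep the best 90% of courses, taking the best attempt as the "improved" grade, and the invented transcript. The pass mark of 4 on a 0 to 7 scale follows the system described above.

```python
import math
from statistics import mean, mode

PASS_MARK = 4  # assumed pass mark on the 0-7 scale

def gpa_top_percent(grades, percent=90):
    """Option 1: count only the best `percent` of courses (rounded up)."""
    keep = math.ceil(len(grades) * percent / 100)
    return mean(sorted(grades, reverse=True)[:keep])

def gpa_best_attempt(attempts_by_course):
    """Option 2: a later, better attempt replaces the earlier grade."""
    return mean([max(attempts) for attempts in attempts_by_course.values()])

def gpa_passes_only(grades):
    """Option 3: count only passing grades (needs at least one pass)."""
    return mean([g for g in grades if g >= PASS_MARK])

def gpa_mode(grades):
    """Option 4: report the most common grade."""
    return mode(grades)

# Hypothetical transcript: ten courses with one fail (3) and one bare pass (5).
transcript = [7, 7, 6, 7, 3, 7, 5, 7, 7, 7]
# Hypothetical repeat: the failed course is retaken and a 6 is achieved.
attempts = {"course_a": [3, 6], "course_b": [7]}

print("all grades:  ", mean(transcript))
print("top 90%:     ", round(gpa_top_percent(transcript), 3))
print("passes only: ", round(gpa_passes_only(transcript), 3))
print("mode:        ", gpa_mode(transcript))
print("best attempt:", gpa_best_attempt(attempts))
```

On this made-up transcript the different rules give visibly different figures for the same student, which is exactly why it matters to state which rule produced a number.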
Option 6 introduces the idea that we stop GPA being a number and we carefully and accurately contextualise it. A student who receives all high distinctions in first semester still has a number of known hurdles to get over. The GPA of 7 that would be present now is not as clear an indicator of facility with the academic system as a GPA of 7 at the end of a degree, whichever other GPA adjustment systems are in play.
More evidence makes it clearer what is happening. If we can accompany a GPA (or similar measure) with evidence, then we are starting to make the process apparent and we make the number mean something. However, this also allows us to let students control which of their grades go into the calculation, because a clear statement of how relevant the resulting measure is can accompany it.
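One way to picture option 6 is a GPA that never travels alone. The sketch below is only an assumption about what the "statistical accompaniment" might contain: the function name, the idea of reporting coverage against a nominal degree length, and the 24-course degree are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev

def gpa_with_context(grades, courses_in_degree):
    """Return the GPA alongside how much evidence sits behind it.
    `courses_in_degree` is an assumed nominal course count for the degree."""
    n = len(grades)
    return {
        "gpa": mean(grades),
        "courses_counted": n,
        # Fraction of the degree that this figure actually reflects.
        "coverage": n / courses_in_degree,
        # Sample standard deviation; undefined for a single course.
        "spread": stdev(grades) if n > 1 else None,
    }

# All high distinctions after one semester of a hypothetical 24-course degree.
first_semester = gpa_with_context([7, 7, 7, 7], courses_in_degree=24)
print(first_semester)
```

The same GPA of 7 reported at the end of the degree would carry a coverage of 1.0 rather than about 0.17, which is the kind of difference a bare number hides.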
But this doesn’t have to be a method of avoidance, this can be a useful focusing device. If a student did really well in, say, Software Engineering but struggled with an earlier, unrelated, stream, why can’t we construct a GPA for Software Engineering that clearly states the area of relevance and degree of information? Isn’t that actually what employers and people interested in SE want to know?
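An area-specific GPA of this kind could be sketched as below. The transcript layout, the area labels and the function name are hypothetical, and the grades are invented; the point is only that the figure states its own area of relevance and how many courses stand behind it.

```python
from statistics import mean

def area_gpa(transcript, area):
    """GPA restricted to one area, reported with its own context.
    `transcript` is a list of (area, grade) pairs."""
    grades = [grade for course_area, grade in transcript if course_area == area]
    if not grades:
        return None  # No courses in that area, so no figure to report.
    return {
        "area": area,
        "gpa": mean(grades),
        "courses_counted": len(grades),
        "courses_on_transcript": len(transcript),
    }

# Invented transcript: strong in one stream, weaker in an unrelated one.
transcript = [
    ("Software Engineering", 6),
    ("Software Engineering", 7),
    ("Mathematics", 3),
    ("Mathematics", 4),
]
print(area_gpa(transcript, "Software Engineering"))
```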
Handing over an academic transcript seems to allow anyone to do this, but human cognitive biases are powerful, subtle and pervasive. It is harder for most humans to recognise positive progress in the areas that they are interested in, if there is evidence of less stellar performance elsewhere. I cite my usual non-academic example: Everyone thought Anthony LaPaglia’s American accent was too fake until he stopped telling people he was Australian.
If we have to use numbers like this, then let us think carefully about what they mean and, if they don’t mean that much, then let’s either get rid of them or make them meaningful. These should, at a fundamental level, be useful to the students first, us second.
Yesterday, I wrote:
We need assessment systems that work for the student first and everyone else second.
Assessment supports evaluation, criticism and ranking (Wolff). That’s what it does and, in many cases, that also constitutes a lot of why we do it. But who are we doing it for?
I’ve reflected on the dual nature of evaluation, showing a student her or his level of progress and mastery while also telling us how well the learning environment is working. In my argument to reduce numerical grades to something meaningful, I’ve asked what the actual requirement is for our students, how we measure mastery and how we can build systems to provide this.
But who are the student’s grades actually for?
In terms of ranking, grades allow people who are not the student to place the students in some order. By doing this, we can award awards to students who are in the awarding an award band (repeated word use deliberate). We can restrict our job interviews to students who are summa cum laude or valedictorian or Dean’s Merit Award Winner. Certain groups of students, not all, like to define their progress through comparison so there is a degree of self-ranking but, for the most part, ranking is something that happens to students.
Criticism, in terms of providing constructive, timely feedback to assist the student, is weakly linked to any grading system. Giving someone a Fail grade isn’t a critique as it contains no clear identification of the problems. The clear identification of problems may not constitute a fail. Often these correlate but it’s weak. A student’s grades are not going to provide useful critique to the student by themselves. These grades are to allow us to work out if the student has met our assessment mechanisms to a point where they can count this course as a pre-requisite or can be awarded a degree. (Award!)
Evaluation is, as noted, useful to us and the student but a grade by itself does not contain enough record of process to be useful in evaluating how mastery goals were met and how the learning environment succeeded or failed. Competency, when applied systematically, does have a well-defined meaning. A passing grade does not although there is an implied competency and there is a loose correlation with achievement.
Grades allow us to look at all of a student’s work as if this one impression is a reflection of the student’s involvement, engagement, study, mistakes, triumphs, hopes and dreams. They are additions to a record from which we attempt to reconstruct a living, whole being.
Grades are the fossils of evaluation.
Grades provide a mechanism for us, in a proxy role as academic archaeologist, to classify students into different groups, in an attempt to project colour into grey stone, to try and understand the ecosystem that such a creature would live in, and to identify how successful this species was.
As someone who has been a student several times in my life, I’m aware that I have a fossil record that is not traditional for an academic. I was lucky to be able to place a new imprint in the record, to obscure my history as a much less successful species, and could then build upon it until I became an ACADEMIC TYRANNOSAURUS.
But I’m lucky. I’m privileged. I had a level of schooling and parental influence that provided me with an excellent vocabulary and high social mobility. I live in a safe city. I have a supportive partner. And, more importantly, at a crucial moment in my life, someone who knew me told me about an opportunity that I was able to pursue despite the grades that I had set in stone. A chance came my way that I never would have thought of because I had internalised my grades as my worth.
Let’s look at the fossil record of Nick.
My original GPA fossil, encompassing everything that went wrong and right in my first degree, was 2.9. On a scale of 7, which is how we measure it, that’s well below a pass average. I’m sharing that because I want you to put that fact together with what happened next. Four years later, I started a Masters program that I finished with a GPA of 6.4. A few years after the masters, I decided to go and study wine making. That degree was 6.43. Then I received a PhD, with commendation, that is equivalent to GPA 7. (We don’t actually use GPA in research degrees. Hmmm.) If my grade record alone lobbed onto your desk you would see the desiccated and dead snapshot of how I (failed to) engage with the University system. A lot of that is on me but, amazingly, it appears that much better things were possible. That original grade record stopped me from getting interviews. Stopped me from getting jobs. When I was finally able to demonstrate the skills that I had, which weren’t bad, I was able to get work. Then I had the opportunity to rewrite my historical record.
Yes, this is personal for me. But it’s not about me because I wasn’t trapped by this. I was lucky as well as privileged. I can’t emphasise that enough. The fact that you are reading this is due to luck. That’s not a good enough mechanism.
Too many students don’t have this opportunity. That impression in the wet mud of their school life will harden into a stone straitjacket from which they may never escape. The way we measure and record grades has far too much potential to work against students and the correlation with actual ability is there but it’s not strong and it’s not always reliable.
The student you are about to send out with a GPA of 2.9 may be competent and they are, most definitely, more than that number.
The recording of grades is a high-loss storage record of the student’s learning and pathway to mastery. It allows us to conceal achievement and failure alike in the accumulation of mathematical aggregates that proxy for competence but correlate weakly.
We need assessment systems that work for the student first and everyone else second.
In the previous post, I asked how many times a student has to perform a certain task, and to which standard, before we become confident that they can reliably perform the task. In the Vocational Education and Training world this is referred to as competence and this is defined (here, from the Western Australian documentation) as:
In VET, individuals are considered competent when they are able to consistently apply their knowledge and skills to the standard of performance required in the workplace.
How do we know if someone has reached that level of competency?
We know whether an individual is competent after they have completed an assessment that verifies that all aspects of the unit of competency are held and can be applied in an industry context.
The programs involved are made up of units that span the essential knowledge and are assessed through direct observation, indirect measurements (such as examination) and in talking to employers or getting references. (And we have to be careful that we are directly measuring what we think we are!)
Hang on. Examinations are an indirect measurement? Yes, of course they are here: we’re looking for the ability to apply this, and that requires doing rather than talking about what you would do. Your ability to perform the task in direct observation is related to how you can present that knowledge in another frame but it’s not going to be 1:1 because we’re looking at issues of different modes and mediation.
But it’s not enough just to do these tasks as you like, the specification is quite clear in this:
It can be demonstrated consistently over time, and covers a sufficient range of experiences (including those in simulated or institutional environments).
I’m sure that some of you are now howling that many of the things that we teach at University are not just something that you do, there’s a deeper mode of thinking or something innately non-Vocational about what is going on.
And, for some of you, that’s true. Any of you who are asking students to do anything in the bottom range of Bloom’s taxonomy… I’m not convinced. Right now, many assessments of concepts that we like to think of as abstract are so heavily grounded in the necessities of assessment that they become equivalent to competency-based training outcomes.
The goal may be to understand Dijkstra’s algorithm but the task is to write a piece of code that implements the algorithm for certain inputs, under certain conditions. This is, implicitly, a programming competency task and one that must be achieved before you can demonstrate any ability to show your understanding of the algorithm. But the evaluator’s perspective of Dijkstra is mediated through your programming ability, which means that this assessment is a direct measure of programming ability in language X but an indirect measure of Dijkstra. Your ability to apply Dijkstra’s algorithm would, in a competency-based frame, be located in a variety of work-related activities that could verify your ability to perform the task reliably.
All of my statistical arguments on certainty from the last post come back to a simple concept: do I have the confidence that the student can reliably perform the task under evaluation? But we add to this the following: Am I carrying out enough direct observation of the task in question to be able to make a reliable claim on this as an evaluator?
There is obvious tension, at modern Universities, between what we see as educational and what we see as vocational. Given that some of what we do falls into “workplace skills” in a real sense, although we may wish to be snooty about the workplace, why are we not using the established approaches that allow us to actually say “This student can function as an X when they leave here?”
If we want to say that we are concerned with a more abstract education, perhaps we should be teaching, assessing and talking about our students very, very differently. Especially to employers.