The Illusion of a Number

Rabbit? Duck? Paging Wittgenstein!

I hope you’ve had a chance to read William Rapaport’s paper, which I referred to yesterday. He proposed a great, simple alternative to traditional grading that reduces confusion about what is signalled by ‘grade-type’ feedback, as well as making things easier for students and teachers. Being me, after saying how much I liked it, I finished by saying “… but I think that there are problems.” His approach was that we could break all grading down into four levels: did nothing, wrong answer, some way to go, and pretty much there. That, I think, is much better than a lot of the nonsense that we pretend we hand out as marks. But, yes, I have some problems.

I note that Rapaport’s exceedingly clear and honest account of what he is doing includes this statement: “Still, there are some subjective calls to make, and you might very well disagree with the way that I have made them.” Therefore, I have licence to accept the value of the overall scholarship and the frame of the approach, without having to accept all of the implementation details given in the paper. Onwards!

I think my biggest concern with the approach given is not in how it works for individual assessment elements. In that area, I think it shines, as it makes clear what has been achieved. A marker can quickly place the work into one of four boxes if there are clear guidelines as to what has to be achieved, without having to worry about one or two percentage points here or there. Because the grade bands are so distinct, as Rapaport notes, it is very hard for the student to make the ‘I only need one more point’ argument that is so clearly indicative of a focus on the grade rather than the learning. (I note that such emphasis is often what we have trained students for; there is no pejorative intention here.) I agree that this is consistent, fair and time-saving (after Walvoord and Anderson), and it avoids curve grading, which I loathe with a passion.

However, my problems start when we combine a number of these triaged grades into a cumulative mark for an assignment, or into a final letter grade showing progress in the course. Sections 4.3 and 4.4 of the paper detail the implementation of assignments that have triage-graded sub-tasks. Now, instead of receiving a “some way to go” for an assignment, we can start getting different scores for sub-tasks. Let’s look at an example from the paper (note 12), which describes programming projects in CS.

  • Problem definition 0,1,2,3
  • Top-down design 0,1,2,3
  • Documented code
    • Code 0,1,2,3
    • Documentation 0,1,2,3
  • Annotated output
    • Output 0,1,2,3
    • Annotations 0,1,2,3

Total possible points = 18
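To make the arithmetic concrete, here is a minimal Python sketch of the note 12 rubric treated as a plain sum. The sub-task names and the two students’ scores are hypothetical, chosen only to show how two very different submissions can land on the same total.

```python
# Hypothetical encoding of the note 12 rubric: six sub-tasks, each graded 0-3.
SUB_TASKS = [
    "problem_definition",
    "top_down_design",
    "code",
    "documentation",
    "output",
    "annotations",
]
MAX_PER_TASK = 3
MAX_TOTAL = MAX_PER_TASK * len(SUB_TASKS)  # 6 sub-tasks x 3 points = 18


def total(scores):
    """Plain, unweighted sum of the sub-task triage grades."""
    return sum(scores[task] for task in SUB_TASKS)


# Two hypothetical students with complementary strengths.
student_a = {"problem_definition": 3, "top_down_design": 3, "code": 3,
             "documentation": 0, "output": 0, "annotations": 0}
student_b = {"problem_definition": 0, "top_down_design": 0, "code": 0,
             "documentation": 3, "output": 3, "annotations": 3}

print(total(student_a), "/", MAX_TOTAL)  # 9 / 18
print(total(student_b), "/", MAX_TOTAL)  # 9 / 18
```

The sum cannot tell these two students apart, even though they have demonstrated entirely different things.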

Remember my hypothetical situation from yesterday? I provided an example of two students who managed to score enough marks to pass by knowing the complement of each other’s course knowledge. Looking at the above example, it appears possible (although not easily) for this situation to occur, with both students receiving 9/18 for different aspects. But I have some more pressing questions:

  1. Should it be possible for a student to receive full marks for output, if there is no definition, design or code presented?
  2. Can a student receive full marks for everything else if they have no design?

The first question indicates what we already know about task dependencies: if we want to build them into numerical grading, we have to be pedantically specific and provide rules on top of the aggregation mathematics. But, more subtly, by aggregating these measures, we no longer have an ‘accurately triaged’ grade to indicate whether the assignment as a whole is acceptable. An assignment with no definition, design or code can hardly be considered a valid submission, yet full marks for output, documentation and annotations (with no code at all) would still earn 9/18, half of the available points. That is not the right result!
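As an illustration of what “rules on top of the aggregation mathematics” might look like, here is a minimal Python sketch. The dependency graph is my own assumption for the example, not something taken from Rapaport’s paper: a sub-task only counts if everything it depends on was itself counted as non-zero.

```python
# Sub-tasks from the note 12 rubric, listed so that prerequisites come first.
SUB_TASKS = ["problem_definition", "top_down_design", "code",
             "documentation", "output", "annotations"]

# Hypothetical dependency rules (an assumption, not the paper's): a sub-task
# only counts if every sub-task it depends on was counted as non-zero.
DEPENDS_ON = {
    "top_down_design": ["problem_definition"],
    "code": ["top_down_design"],
    "documentation": ["code"],
    "output": ["code"],
    "annotations": ["output"],
}


def gated_total(scores):
    """Sum triage grades, zeroing any sub-task whose prerequisites scored zero."""
    counted = {}
    for task in SUB_TASKS:  # prerequisites are always visited before dependants
        prerequisites_met = all(counted[p] > 0 for p in DEPENDS_ON.get(task, []))
        counted[task] = scores[task] if prerequisites_met else 0
    return sum(counted.values())


no_code = {"problem_definition": 0, "top_down_design": 0, "code": 0,
           "documentation": 3, "output": 3, "annotations": 3}
print(sum(no_code.values()), gated_total(no_code))  # 9 0
```

Even this small rule set is a set of subjective calls that someone has to make and defend, which is exactly the pedantic specificity described above.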

The second question is more for those of us who teach programming, and it’s a question we all should ask. If a student can get a decent grade for an assignment without submitting a design, then what message are we sending? We are, implicitly, saying that although we talk a lot about design, it’s not something you have to do in order to be successful. Rapaport does go on to talk about weightings and how we can emphasise these issues, but we are still faced with the ugly reality that, unless we weight our key aspects at 50-60% of the final aggregate, students will be able to side-step them and still perform to a passing standard. Every assignment should be doing something useful: modelling the correct approaches and demonstrating correct techniques. How do we capture that?
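The 50-60% figure follows from simple arithmetic. A student with full marks everywhere except design scores 1 minus the design weight, so with a pass mark of 50% the design weight has to exceed half of the total before omitting it causes a fail. Here is a minimal Python sketch, assuming a 50% pass mark and hypothetical integer weights; `Fraction` is used so the boundary case is compared exactly.

```python
from fractions import Fraction

MAX_GRADE = 3               # top of the 0-3 triage scale
PASS_MARK = Fraction(1, 2)  # assumed pass threshold: 50% of the weighted total
TASKS = ["problem_definition", "top_down_design", "code",
         "documentation", "output", "annotations"]


def weighted_score(scores, weights):
    """Weighted average of triage grades, normalised to the range 0-1."""
    total_weight = sum(weights.values())
    earned = sum(weights[t] * Fraction(scores[t], MAX_GRADE) for t in weights)
    return earned / total_weight


def passes_without(task, weights):
    """Can a student with full marks everywhere except `task` still pass?"""
    scores = {t: MAX_GRADE for t in weights}
    scores[task] = 0
    return weighted_score(scores, weights) >= PASS_MARK


equal = {t: 1 for t in TASKS}
print(passes_without("top_down_design", equal))                           # True  (design is 1/6)
print(passes_without("top_down_design", dict(equal, top_down_design=5)))  # True  (design is 5/10)
print(passes_without("top_down_design", dict(equal, top_down_design=6)))  # False (design is 6/11)
```

Only once design carries more than half of the weight (here 6/11, roughly 55%) does leaving it out stop a student from passing.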

Now, let me step back and say that I have no problem with identifying the sub-tasks and clearly indicating the level of performance using triage grading, but I disagree with using it for marks. For feedback it is absolutely invaluable: triage grading on sub-tasks will immediately tell you where the majority of students are having trouble. That then identifies an area that is more challenging than you thought, or one that your students were not prepared for, for some reason. (If every student in the class is struggling with something, the problem is more likely to lie with the teacher.) However, I see three major problems with sub-task aggregation and, thus, with final grade aggregation from assignments.
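Used as feedback rather than marks, sub-task triage grades can be tallied across a class to flag trouble spots. This is a minimal Python sketch; the mapping of the four triage labels onto 0-3, the “more than half the class” threshold and the class data are all assumptions for illustration.

```python
from collections import Counter

# Assumed mapping of the four triage labels onto 0-3.
DID_NOTHING, WRONG_ANSWER, SOME_WAY_TO_GO, PRETTY_MUCH_THERE = range(4)


def struggling_tasks(class_scores, share=0.5):
    """Sub-tasks where more than `share` of the class fell short of PRETTY_MUCH_THERE."""
    short = Counter()
    for scores in class_scores:
        for task, grade in scores.items():
            if grade < PRETTY_MUCH_THERE:
                short[task] += 1
    n = len(class_scores)
    return sorted(task for task, count in short.items() if count / n > share)


# Hypothetical class of four students, graded on three sub-tasks.
class_scores = [
    {"code": 3, "top_down_design": WRONG_ANSWER, "output": 3},
    {"code": 3, "top_down_design": SOME_WAY_TO_GO, "output": 3},
    {"code": 3, "top_down_design": DID_NOTHING, "output": 2},
    {"code": 2, "top_down_design": 3, "output": 3},
]
print(struggling_tasks(class_scores))  # ['top_down_design']
```

Nothing here needs to be added up into a mark; the counts are only there to tell the teacher where to look.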

The first problem is that I think this is the wrong kind of scale to try to aggregate in this way. As Rapaport notes, agreement on clear, linear intervals in grading is never going to be achieved and is, very likely, not even possible. Recall that there are four fundamental types of scale: nominal, ordinal, interval and ratio. The scales in use for triage grading are not interval scales (the intervals aren’t predictable or equidistant) and thus we cannot expect to average them and get sensible results. What we have here are, to my eye, ordinal scales, with no objective distance but a clear ranking from best to worst. The clearest indicator of this is the construction of a B grade for final grading, where no such concept exists in the triage marks for assessing assignment quality. We have created a “some way to go but sometimes nearly perfect” that shouldn’t really exist. Think of it like runners: you win one race and come third in another. You never actually came second in any race, so averaging the two placings makes no sense.
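One standard way to see why averaging ordinal data is unsafe: on an ordinal scale, any strictly increasing relabelling of the grades is equally legitimate, so a conclusion that changes under such a relabelling is not telling us anything about the students. This Python sketch uses two hypothetical students and an arbitrary “stretched” labelling to show the ranking by mean flipping.

```python
from statistics import mean

# Two hypothetical students, each with two triage grades on the 0-3 scale.
STUDENT_A = [3, 0]  # one "pretty much there", one "did nothing"
STUDENT_B = [2, 2]  # two "some way to go"

# Both labellings keep the same order (0 < 1 < 2 < 3), which is all an
# ordinal scale promises; the second is an arbitrary choice for illustration.
ORIGINAL = {0: 0, 1: 1, 2: 2, 3: 3}
STRETCHED = {0: 0, 1: 1, 2: 2, 3: 10}


def mean_under(labels, grades):
    """Mean of the grades after mapping each one through `labels`."""
    return mean(labels[g] for g in grades)


def leader(labels):
    """Which student comes out ahead when the relabelled grades are averaged?"""
    a = mean_under(labels, STUDENT_A)
    b = mean_under(labels, STUDENT_B)
    if a == b:
        return "tie"
    return "A" if a > b else "B"


print(leader(ORIGINAL), leader(STRETCHED))  # B A
```

The order of the grades never changed, yet the “better” student did. That is the runners’ problem in numerical form.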

The second problem is that aggregation masks the beauty of triage in terms of identifying whether a task has been performed to the pre-determined level. In an ideal world, every area of knowledge that a student is exposed to should be an important contributor to their learning journey. We may have multiple assignments in one area, but our assessment mechanism should provide clear opportunities to demonstrate that knowledge. Thus, a student’s completion of enough assignment work to demonstrate competency in every relevant area of knowledge should be a necessary condition for graduating. When we take triage grading back to an assignment level, we can then look at our assignments grouped by knowledge area and quickly see whether a student has some way to go or has achieved the goal. This is not anywhere near as clear when we start aggregating the marks, because of the mathematical issues already raised.
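Grouping assignment-level triage grades by knowledge area turns competency into a conjunction (every area must be achieved) rather than a sum. In this minimal Python sketch, the assignment names, the areas, and the rule that an area is achieved once any assignment in it reaches “pretty much there” are all hypothetical choices for illustration.

```python
PRETTY_MUCH_THERE = 3  # assumed top triage grade on a 0-3 scale

# Hypothetical mapping of assignments to knowledge areas.
AREA_OF = {"a1": "recursion", "a2": "recursion", "a3": "design", "a4": "testing"}


def area_status(assignment_grades, area_of):
    """Map each knowledge area to True if any assignment in it reached PRETTY_MUCH_THERE."""
    best = {area: 0 for area in area_of.values()}
    for assignment, area in area_of.items():
        grade = assignment_grades.get(assignment, 0)  # unsubmitted counts as "did nothing"
        best[area] = max(best[area], grade)
    return {area: grade >= PRETTY_MUCH_THERE for area, grade in best.items()}


def competent_in_all_areas(assignment_grades, area_of):
    """Necessary condition: every area achieved, with no averaging across areas."""
    return all(area_status(assignment_grades, area_of).values())


grades = {"a1": 2, "a2": 3, "a3": 3, "a4": 2}
print(sum(grades.values()), "/", 3 * len(grades))  # 10 / 12
print(area_status(grades, AREA_OF))                # testing is still "some way to go"
print(competent_in_all_areas(grades, AREA_OF))     # False
```

An aggregate of 10/12 looks comfortable, while the per-area view shows immediately that testing has not yet been demonstrated.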

Finally, reducing triage to a mathematical approximation makes it harder to specify which areas of an assessment are really valuable. Weighting is a reasonable approximation to this, but it is very hard to rely on a mathematical formula with more and more ‘fudge factors’ (a term Rapaport uses) to make up for the fact that the aggregation is just a little too fragile.

To summarise, I really like the thrust of this paper. I think what is proposed is far better at giving a reasonable, fair and predictable grade to students, even with all of the problems raised above. But I think that the clash with existing grading traditions, and the implicit requirement to turn everything back into one number, are causing problems that have to be addressed. These problems mean that this solution is not, yet, beautiful. But let’s see where we can go.

Tomorrow, I’ll suggest an even more cut-down version of grading and then work on an even trickier problem: late penalties and how they affect grades.


Exploring beauty and aesthetics

One of the first steps I took on this path occurred when I read Hegel’s lectures on aesthetics, and related writings, as he strove to understand the role of fine art. It’s worth noting that what I am referring to as Hegel’s views are reconstructed from what he wrote, what he was recorded to have said, and the interpretation of what he meant, and we must accept that this is not guaranteed to be what he actually thought. With that caveat aside, I can make some statements about Hegelian aesthetics, as they relate to beauty, truth and art.

“Nothing great in the world has ever been accomplished without passion.” Hegel

Hegel saw a clear relationship between fine art, beauty and truth, building upon Kant’s reflection that even the declaration of an object’s beauty is an admission of the effect that the object is having upon us. Hegel took this into the realm of the senses, more precisely in my opinion, and sought to find the ideal beauty that was being represented by real art. Because Hegel saw art itself as an expression of spiritual freedom, his view of art was rooted in the figurative mode, depicting people and real scenes, and thus classical Greek form was particularly pleasing to him. As I’ve noted before, we can extract resonating phrases from the pragmatic situations of philosophers, and I will do so here.
For Hegel, ideal beauty is one where we see the sensuous expression of spiritual freedom; where our senses are engaged, they are consumed by their perception of an aesthetic ‘rightness’ and we see things as they are. Given the Platonic trinity of beauty, truth and goodness, it is not a surprise that classical Greek forms are so immediately pleasing to him! Hegel is rather dismissive of symbolic art, seeing it as a step on the way to real art and requiring both physical intervention and a movement forward to the figurative form.
At this point, I’d like to leave Hegel’s fixed point in time, given the changing opinion of the role and importance of symbolic forms in art, and because Hegel sets up a tension between the natural and the spiritual that is key to him arriving back at his fixation on the perfection of Greek art. It’s fascinating (and Hegel’s lectures on aesthetics are delightful) but it’s not my focus.
The sensuous expression of spiritual freedom can be interpreted in many ways. Sensuous, as a word, often has sexual overtones, but that is not what is meant here. It simply means something that relates to or affects the senses rather than the intellect. This leads us immediately to aesthetics, whether as the appreciation of beauty or as the set of principles that we often use to describe art. Now, through aesthetics, we link the sensuous back to the intellect and we can see, more clearly, the way that beauty can drive us towards certain thoughts, as Plato espoused. Hegel’s idea of what constituted ideal sensuous expression began and ended with Greek gods, sculptures and all of those forms. His aesthetics could probably not have accommodated something such as Cubism, except as a step towards ‘real art’. He might have appreciated some of its aesthetically pleasing characteristics, but it would not really have been art. There is a lesson for us here in determining what constitutes “beautiful education”.
Hegel had no doubt that his aesthetics were sound, despite his views obliterating the value of almost any art in the previous millennium. Many of us feel equally strongly about the way that we teach, and I have begun to believe that this conflict of principle is what often causes us to put aside even the strongest evidence, where it would lead us to abandon what we see as being beautiful for something that we suspect or fear will be ‘ugly’. We also see a similar “shock of the new” in education as we did in art. Let us not forget that it was only in 1905 that wild brush strokes and brash colour palettes saw a group of artists labelled as “Wild Beasts!” (Fauvism). But, as I noted earlier, the “fear of the old” is as contaminating a world view as the “shock of the new”. We should not confuse our personal comfort with a form of expression with it being the best that can be achieved. We should not assume that personal discomfort is a reliable indication of positive progress.
I gave a talk recently, which I will record and make available shortly, where I argued that many of our problems in education stem from what is, ultimately, an aesthetic distinction over which characteristics make up good teaching. It is entirely possible for two people with different, or even conflicting, views of the characteristics of good teaching to both be rightly convinced of their status as “good teachers”. I reduced good educational practice to three elements, after a model employed by Suits in “The Grasshopper”:
  1. the ability to state the goal of any educational activity as separate from the activity,
  2. the awareness of evidence-based practice and its use in everyday teaching, and
  3. a willingness to accept that it is correct goal setting, and the use of techniques that work and can be shown to work, that will lead to better outcomes.
(Suits’ motivation was the refutation of Wittgenstein’s thoughts on the inability to define what a game is. The Grasshopper is a delightful book, whether you are convinced by the argument or not.)
This is a very short summary and I’ll write more on this but I mention it because these guidelines are effectively devoid of aesthetics, yet they are meaningless without having some aesthetically guided practice to act upon. They were chosen specifically because they were hard to argue against, as they are really a summary of how we conduct many actions as humans.
Is there an ideal form for education? Can it be put to one side, as Hegel did, while we discuss the vessels that carry it into the sensuous realm? Are we capable of agreeing on that before we start ranking the characteristics that we can come up with?
I think it’s going to be an interesting year while we discuss it!