Teaching for (current) Humans
Posted: January 13, 2016 Filed under: Education, Opinion | Tags: advocacy, authenticity, briana morrison, community, design, education, educational research, edx, ethics, higher education, in the student's head, lauren margulieux, learning, mark guzdial, moocs, on-line learning, principles of design, reflection, resources, student perspective, subgoals, teaching, teaching approaches, technology, thinking, tools, video 5 Comments
Leonardo’s experiments in human-octopus engineering never received appropriate recognition.
I was recently at a conference-like event where someone stood up and talked about video lectures. And these lectures were about 40 minutes long.
Over several million viewing sessions, EdX have clearly shown that watchable video length tops out at just over 6 minutes. And that’s the same for certificate-earning students and the people who have enrolled for fun. Once a video runs to 9 minutes, students are watching for fewer than 6 minutes of it. At the 40 minute mark, it’s 3-4 minutes.
I raised this point to the speaker because I like the idea that, if we do on-line it should be good on-line, and I got a response that was basically “Yes, I know that but I think the students should be watching these anyway.” Um. Six minutes is the limit but, hey, students, sit there for this time anyway.
We have never been able to unobtrusively measure certain student activities as well as we can today. I admit that it’s hard to measure actual attention by looking at video activity time but it’s also hard to measure activity by watching students in a lecture theatre. When we add clickers to measure lecture activity, we change the activity and, unsurprisingly, clicker-based assessment of lecture attentiveness gives us different numbers to observation of note-taking. We can monitor video activity by watching what the student actually does and pausing/stopping a video is a very clear signal of “I’m done”. The fact that students are less likely to watch as far on longer videos is a pretty interesting one because it implies that students will hold on for a while if the end is in sight.
In a lecture, we think students fade after about 15-20 minutes but, because of physical implications, peer pressure, politeness and inertia, we don’t know how many students have silently switched off before that because very few will just get up and leave. That 6 minute figure may be the true measure of how long a human will remain engaged in this kind of task when there is no active component and we are asking them to process or retain complex cognitive content. (Speculation, here, as I’m still reading into one of these areas but you see where I’m going.) We know that cognitive load is a complicated thing and that identifying subgoals of learning makes a difference in cognitive load (Morrison, Margulieux, Guzdial) but, in so many cases, this isn’t what is happening in those long videos, they’re just someone talking with loose scaffolding. Having designed courses with short videos I can tell you that it forces you, as the designer and teacher, to focus on exactly what you want to say and it really helps in making your points, clearly. Implicit sub-goal labelling, anyone? (I can hear Briana and Mark warming up their keyboards!)
If you want to make your videos 40 minutes long, I can’t stop you. But I can tell you that everything I know tells me that you have set your materials up for another hominid species because you’re not providing something that’s likely to be effective for current humans.
At least they’re being honest
Posted: January 13, 2016 Filed under: Education, Opinion | Tags: advocacy, assessment, authenticity, community, competency-based assessment, education, educational research, ethics, higher education, learning, reflection, student perspective, teaching, teaching approaches, thinking, time banking, time management, tools 2 Comments
I was inspired to write this by a comment about using late penalties but dealing slightly differently with students when they owned up to being late. I have used late penalties extensively (it’s school policy) and so I have a lot of experience with the many ways students try to get around them.

Like everyone, I have had students who have tried to use honesty where every other possible way of getting the assignment in on time (starting early, working on it before the day before, miraculous good luck) has failed. Sometimes students are puzzled that “Oh, I was doing another assignment from another lecturer” isn’t a good enough excuse. (Genuine reasons for interrupted work, medical or compassionate, are different and I’m talking about the ambit extension or ‘dog ate my homework’ level of bargaining.)
My reasoning is simple. In education, owning up to something that you did knowing that it would have punitive consequences of some sort should not immediately cause things to become magically better. Plea bargaining (and this is an interesting article of why that’s not a good idea anywhere) is you agreeing to your guilt in order to reduce your sentence. But this is, once again, horse-trading knowledge on the market. Suddenly, we don’t just have a temporal currency, we have a conformal currency, where getting a better deal involves finding the ‘kindest judge’ among the group who will give you the ‘lightest sentence’. Students optimise their behaviour to what works or, if they’re lucky, they have a behaviour set that’s enough to get them to a degree without changing much. The second group aren’t mostly who we’re talking about and I don’t want to encourage the first group to become bargain-hunting mark-hagglers.
I believe that ‘finding Mr Nice Lecturer’ behaviour is why some students feel free to tell me that they thought someone else’s course was more important than mine, because I’m a pretty nice person and have a good rapport with my students, and many of my colleagues can be seen (fairly or not) as less approachable or less open.
We are not doing ourselves or our students any favours. At the very least, we risk accusations of unfairness if we extend benefits to one group who are bold enough to speak to us (and we know that impostor syndrome and lack of confidence are rife in under-represented groups). At worst, we turn our students into cynical mark shoppers, looking for the easiest touch and planning their work strategy based on what they think they can get away with instead of focusing back on the learning. The message is important and the message must be clearly communicated so that students try to do the work for when it’s required. (And I note that this may or may not coincide with any deadlines.)
We wouldn’t give credit to someone who wrote ‘True’ and then said ‘Oh, but I really meant False’. The work is important or it is not. The deadline is important or it is not. Consequences, in a learning sense, do not have to mean punishments and we do not need to construct a Star Chamber in our offices.
Yes, I do feel strongly about this. I completely understand why people do this and I have also done this before. But after thinking about it at length, I changed my practice so that being honest about something that shouldn’t have happened was appreciated but it didn’t change what occurred unless there was a specific procedural difference in handling. I am not a judge. I am not a jury. I want to change the system so that not only do I not have to be but I’m not tempted to be.
Ugliness 101: late punishments
Posted: January 13, 2016 Filed under: Education, Opinion | Tags: advocacy, aesthetics, authenticity, bandura, beauty, community, design, education, educational problem, educational research, ethics, higher education, in the student's head, instrumentality, lateness, learning, penalties, student perspective, teaching, teaching approaches, thinking, time banking, time management, tools 4 Comments
Before I lay out the program design I’m thinking of (and, beyond any discussion of competency, as a number of you have suggested, we are heading towards Bloom’s mastery learning as a frame with active learning elements), we need to address one of the most problematic areas of assessment.
Late penalties.
Well, let’s be accurate, penalties are, by definition, punishments imposed for breaking the rules, so these are punishments. This is the stick in the carrot-and-stick reward/punish approach to forcing people to do what you want.
Let’s throw the Greek trinity at this and see how it shapes up. A student produces an otherwise perfect piece of work for an assessment task. It’s her own work. She has spent time developing it. It’s really good. Insightful. Oh, but she handed it up a day late. So we’re now going to say that this knowledge is worth less because it wasn’t delivered on time. She’s working a day job to pay the bills? She should have organised herself better. No Internet at home? Why didn’t she work in the library? I’m sure the campus is totally safe after hours and, well, she should just be careful in getting to and from the library. After all, the most important thing in her life, without knowing anything about her, should be this one hundred line program to reinvent something that has been written over a million times by every other CS student in history.

Have an owl, while you think about that.
That’s not truth. That’s establishing a market value for knowledge with a temporal currency. To me, unless there’s a good reason for doing this, this is as bad as curve grading because it changes what the student has achieved for reasons outside of the assignment activity itself.
“Ah!” you say “Nick, we want to teach people to hand work in on time because that’s how the world works! Time is money, Jones!”
Rubbish. Yes, there are a (small) number of unmovable deadlines in the world. We certainly have some in education because we have to get grades in to achieve graduations and degrees. But most adults function in a world where they choose how to handle all of the commitments in their lives and then they schedule them accordingly. The more you do that, the more practice you get and you can learn how to do it well.
If you have ever given students a week, or even a day’s, extension because of something that has stopped you being able to accept or mark student work, no matter how good the reason, you have accepted that your submission points are arbitrary. (I feel strongly about this and have posted about it before.)
So what would be a good reason for sticking to these arbitrary deadlines? We’d want to see something really positive coming out of the research into this, right? Let’s look at some research on this, starting with Britton and Tesser, “Effects of Time-Management Practices on College Grades”, J Edu Psych, 1991, 83, 3. This reinforces what we already know from Bandura: students who feel in control and have high self-efficacy are going to do well. If a student sits down every day to work out what they’re going to do then they, unsurprisingly, can get things done. But this study doesn’t tell us about long-range time planning – the realm of instrumentality, the capability to link activity today with success in the future. (Here are some of my earlier thoughts on this, with references to Husman.) From Husman, we know that students value tasks in terms of how important they think it is, how motivated they are and how well they can link future success to the current task.
In another J Edu Psych paper (1990, 82, 4), Macan and Shahani reported that participants who felt that they had control over what they were doing did better but also clearly indicated that ambiguity and stress had an influence on time management in terms of perception and actuality. But the Perceived Control of Time (authors’ caps) dominated everything, reducing the impact of ambiguity, reducing the impact of stress, and leading to greater satisfaction.
Students are rarely in control of their submission deadlines. Worse, we often do not take into account everything else in a student’s life (even other University courses) when we set our own deadlines. Our deadlines look arbitrary to students because they are, in the majority of cases. There’s your truth. We choose deadlines that work for our ability to mark and to get grades in or, perhaps, based on whether we are in the country or off presenting research on the best way to get students to hand work in on-time.
(Yes, the owl above is staring at me just as hard as he is staring at anyone else here.)
My own research clearly shows that fixed deadlines do not magically teach students the ability to manage their time and, when you examine it, why should they? (ICER 2012; this was part of a larger study that clearly demonstrated students continuing, and even extending, last-minute behaviour all the way to the fourth year of their studies.) Time management is a discipline that involves awareness of the tasks to be performed, a decomposition of those tasks to subtasks that can be performed when the hyperbolic time discounting triggers go off, and a well-developed sense of instrumentality. Telling someone to hand in their work by this date OR ELSE does not increase awareness, train decomposition, or develop any form of planning skills. Well, no wonder it doesn’t work any better than shouting at people teaches them Maxwell’s Equations or caning children suddenly reveals the magic of the pluperfect form in Latin grammar.
So, let’s summarise: students do well when they feel in control and it helps with all of the other factors that could get in the way. So, doing almost exactly the opposite of helping with this essential support step, we impose frequently arbitrary time deadlines and then act surprised when students fall prey to lack of self-confidence or stress, or lose sight of what they’re trying to do. They panic, asking lots of (what appear to be) unnecessary questions because they are desperately trying to reduce confusion and stress. Sound familiar?
I have written about this at length while exploring time banking, giving students agency and the ability to plan their own time, to address all of these points. But the new lens in my educational inspection loupe allows me to be very clear about what is most terribly wrong with late penalties.
They are not just wrong, they satisfy none of anyone’s educational aesthetics. Because we don’t take a student’s real life into account, we are not being fair. Because we are not actually developing the time management abilities but treating them as something that will be auto-didactically generated, we are not being supportive. Because we downgrade work when it is still good, we are being intellectually dishonest. Because we vary deadlines to suit ourselves but may not do so for an individual student, we are being hypocritical. We are degrading the value of knowledge for procedural correctness. This is hideously “unbeautiful”.
That is not education. That’s bureaucracy. Just because most of us live within a bureaucracy doesn’t mean that we have to compromise our pedagogical principles. Even trying to make things fit well, as Rapaport did to try and fit into another scale, we end up warping and twisting our intent, even before we start thinking about lateness and difficult areas such as that. This cannot be good.
There is nothing to stop a teacher setting an exercise that is about time management and is constructed so that all steps will lead someone to develop better time management. Feedback or marks that reflect something being late when that is the only measure of fitness is totally reasonable. But to pretend that you can slap some penalties on to the side of an assessment and it will magically self-scaffold is to deceive yourself, to your students’ detriment. It’s not true.
Do I have thoughts on how to balance marking resources with student feedback requirements, elastic time management, and real assessments while still recognising that there are some fixed deadlines?
Funny you should ask. We’ll come back to this, soon.
No numbers
Posted: January 11, 2016 Filed under: Education, Opinion | Tags: authenticity, beauty, brecht, cui bono, design, education, educational research, ethics, higher education, learning, rapaport, resources, scales, student perspective, teaching, teaching approaches, thinking, tools, triage 5 Comments
We know that grades are really quite arbitrary and that turning numbers into letters, while something we can do, is actually not that strongly coupled to evaluating learning or demonstrating mastery. Why? Because having the appropriate level of knowledge and being able to demonstrate it are not necessarily the same as being able to pass tests or produce solutions to assignments.
For example, if we look at Rapaport’s triage approach as a way to evaluate student interaction with assignments, we can then design our learning environment to provide multiple opportunities to construct and evaluate knowledge on the understanding that we are seeking clear evidence that a student cannot just perform tasks of this nature but, more important, can do so reliably. We can do this even if we use “Good, getting there, wrong and no submission” rather than numbers. The duality of grades (a symbol and its meaning) degenerates to something other than numbers anyway. Students at my University didn’t care about 84 versus 85 until we put a new letter grade in at 85 (High Distinction). But even these distinctions are arbitrary scales when it comes to evaluating actual learning.

A very arbitrary scale.
Why are numbers not important in this? Because they’re rarely important anyway. Have you ever asked your surgeon what her grades were in school? What about your accountant? Perhaps you’ve questioned the percentage that your favourite Master of Wine achieved in the tasting exams? Of course you haven’t. You’ve assumed that a certification (of some sort) indicates sufficient knowledge to practise. And what we have to face is that we are currently falling back onto numbers to give us false confidence that we are measuring learning. They don’t map. They’re not objective. They’re often mathematically nonsensical. No-one cares about them except to provide yet another way of sorting human beings and, goodness knows, we already have enough of those.
Ah, but “students like to know how they’re going”, right? Yes. Which is where critique and evaluation come in, as well as many other authentic and appropriate ways to recognise progress and encourage curiosity and further development. None of which require numbers.
Let me ask you a question:
Does every student who accumulates enough pass tokens to graduate from your program have a clearly demonstrated ability to perform tasks to the requisite level in all of the knowledge areas of your program?
If the answer is no, then numbers and grades didn’t help, did they? I suspect that, for you as for many others including me, you can probably think of students who managed to struggle through but, in reality, were probably never going to be much good in the field. Perhaps 50% doesn’t magically cover competency? If 50% doesn’t, then raising the bar to 75% won’t solve the problem either. For reasons already mentioned, many of the ways we combine numbers to get grades just don’t make any real sense and they certainly don’t provide much insight into how well the student actually learned what you were trying to teach.
If numbers/grades don’t have much solid foundation, don’t always reflect ability to perform the task, and aren’t actually going to be used in the future, then they are neither good nor true. And they cannot be beautiful.
Thus, let me strip Rapaport back one notch and provide a three-tier grade-free system, commonly used in many places already, that is closer to what we probably want:
- Nothing submitted,
- Work in progress, resubmit if possible, and
- Work to competent standard.
I know that there are concerns about the word ‘competency’ but I think it’s something we’re going to have to think about moving on from. I teach engineers and computer scientists and they have to go out and perform tasks successfully if people are going to employ them or work with them. They have to be competent. Right now, I can tell you which of them have passed but, for a variety of grading reasons, I can’t tell you which of them, from an academic transcript alone, will be able to sit down and solve your problem. I can see which ones pass exams but I don’t know if this is fixed knowledge or swotting. But what if you made it easy and said “ok, just point to the one who will build me the best bridge”? No. I can’t tell you that. (The most likely worst bridge is easier, as I can identify who does and doesn’t have Civil Engineering qualifications.)
The three-tier scale is simple. The feedback approach that the marker should take is pretty clear in each place and the result is clear to the student. If we build our learning environment correctly, then we can construct a pathway where a student has to achieve tier 3 for all key activities and, at that point, we can actually say “Yes, this student can perform this task or apply this knowledge to the required level”. We do this enough times, we may even start to think that the student could perform this at the level of the profession.
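A minimal Python sketch of that pathway idea, under stated assumptions: the three tiers are the ones listed above, while the activity names, the sample results and the helper functions `is_competent` and `outstanding` are hypothetical and exist only for illustration.

```python
from enum import Enum


class Tier(Enum):
    NOTHING_SUBMITTED = 1
    WORK_IN_PROGRESS = 2  # resubmit if possible
    COMPETENT = 3


def is_competent(results, key_activities):
    """True only if every key activity has reached the competent tier.

    An activity with no recorded result counts as nothing submitted.
    """
    return all(
        results.get(activity, Tier.NOTHING_SUBMITTED) is Tier.COMPETENT
        for activity in key_activities
    )


def outstanding(results, key_activities):
    """List the key activities that still need work, in the given order."""
    return [
        activity
        for activity in key_activities
        if results.get(activity, Tier.NOTHING_SUBMITTED) is not Tier.COMPETENT
    ]


# Hypothetical activities and results, invented for illustration.
KEY_ACTIVITIES = ["design", "implementation", "testing"]
results = {
    "design": Tier.COMPETENT,
    "implementation": Tier.WORK_IN_PROGRESS,
}

print(is_competent(results, KEY_ACTIVITIES))
print(outstanding(results, KEY_ACTIVITIES))
```

Nothing is added or averaged here. The only question asked is whether tier 3 has been reached everywhere it matters, and the list of outstanding activities tells the student exactly where to go next.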
Wait. Have we just re-invented competency-based assessment? There’s an immediate urge to say “but that’s not a University level thing” and I do understand that. CBA has a strong vocational focus but anyone who works in an engineering faculty is already in that boat. We have industry linked accreditation to allow our students to practise as engineers and they have to demonstrate the achievement of a certified program, as well as work experience. That program is taught at University but, given that all you need is to get the degree, you can do it on raw passes and be ‘as accredited’ as the next person.
Now, I’d be the first person to say that not only are many aspects of the University not vocationally focussed but I’d go further and say that they shouldn’t be vocationally focussed. The University is a place that allows for the unfettered exploration of knowledge for knowledge’s sake and I wouldn’t want to change that. (And, yet, so often, we still grade such abstract ideals…) But let’s take competency away from the words job and vocational for a moment. I’m not suggesting we turn Universities into vocational study centres or shut down “non-Industry” programs and schools. (I’d like to see more but that’s another post.) Let’s look at focusing on clarity and simplicity of evaluation.
A student writes an essay on Brecht and submits it for assessment. All of the rich feedback on language use, referencing and analysis still exists without the need to grade it as A, B or C. The question is whether the work should be changed in response to the feedback (if possible) or whether it is, recognisably, an appropriate response to the question ‘write an essay on Brecht’ that will allow the student to develop their knowledge and skills. There is no job focus here but pulling back to separate feedback and identifying whether knowledge has been sufficiently demonstrated is, fundamentally, a competency argument.
The PhD, the pinnacle of the University system, is essentially not graded. You gain vast amounts of feedback over time, you write in response and then you either defend it to your prospective peers or have it blind-assessed by external markers. Yes, there are degrees of acceptance but, ultimately, what you end up with is “Fine as it is”, “Do some more work”, and “Oh, no. Just no.” If we can extend this level of acceptance of competency to our highest valued qualification, what is the consistent and sound reasoning that requires us to look at a student group and say “Hmm, 73. And this one is… yes, 74.”? If I may, cui bono? Who is benefitting here?
But what would such a program look like, you ask? (Hey, and didn’t Nick say he was going to talk about late penalties?) Yes, indeed. Come back tomorrow!
The Illusion of a Number
Posted: January 10, 2016 Filed under: Education, Opinion | Tags: authenticity, beauty, curve grading, design, education, educational problem, educational research, ethics, grading, higher education, in the student's head, learning, rapaport, reflection, resources, teaching, teaching approaches, thinking, tools, wittgenstein 3 Comments
Rabbit? Duck? Paging Wittgenstein!
I hope you’ve had a chance to read William Rapaport’s paper, which I referred to yesterday. He proposed a great, simple alternative to traditional grading that reduces confusion about what is signalled by ‘grade-type’ feedback, as well as making things easier for students and teachers. Being me, after saying how much I liked it, I then finished by saying “… but I think that there are problems.” His approach was that we could break all grading down into: did nothing, wrong answer, some way to go, pretty much there. And that, I think, is much better than a lot of the nonsense that we pretend we hand out as marks. But, yes, I have some problems.
I note that Rapaport’s exceedingly clear and honest account of what he is doing includes this statement. “Still, there are some subjective calls to make, and you might very well disagree with the way that I have made them.” Therefore, I have license to accept the value of the overall scholarship and the frame of the approach, without having to accept all of the implementation details given in the paper. Onwards!
I think my biggest concern with the approach given is not in how it works for individual assessment elements. In that area, I think it shines, as it makes clear what has been achieved. A marker can quickly place the work into one of four boxes if there are clear guidelines as to what has to be achieved, without having to worry about one or two percentage points here or there. Because the grade bands are so distinct, as Rapaport notes, it is very hard for the student to make the ‘I only need one more point’ argument that is so clearly indicative of a focus on the grade rather than the learning. (I note that such emphasis is often what we have trained students for; there is no pejorative intention here.) I agree this is consistent and fair, and time-saving (after Walvoord and Anderson), and it avoids curve grading, which I loathe with a passion.
However, my problems start when we are combining a number of these triaged grades into a cumulative mark for an assignment or for a final letter grade, showing progress in the course. Sections 4.3 and 4.4 of the paper detail the implementation of assignments that have triage graded sub-tasks. Now, instead of receiving a “some way to go” for an assignment, we can start getting different scores for sub-tasks. Let’s look at an example from the paper, note 12, to describe programming projects in CS.
- Problem definition 0,1,2,3
- Top-down design 0,1,2,3
- Documented code
  - Code 0,1,2,3
  - Documentation 0,1,2,3
- Annotated output
  - Output 0,1,2,3
  - Annotations 0,1,2,3
Total possible points = 18
Remember my hypothetical situation from yesterday? I provided an example of two students who managed to score enough marks to pass by knowing the complement of each other’s course knowledge. Looking at the above example, it appears (although not easily) to be possible for this situation to occur and both students to receive a 9/18, yet for different aspects. But I have some more pressing questions:
- Should it be possible for a student to receive full marks for output, if there is no definition, design or code presented?
- Can a student receive full marks for everything else if they have no design?
The first question indicates what we already know about task dependencies: if we want to build them into numerical grading, we have to be pedantically specific and provide rules on top of the aggregation mathematics. But, more subtly, by aggregating these measures, we no longer have an ‘accurately triaged’ grade to indicate if the assignment as a whole is acceptable or not. An assignment with no definition, design or code can hardly be considered to be a valid submission, yet good output, documentation and annotation (with no code) will not give us the right result!
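To make the aggregation problem concrete, here is a small Python sketch of the 18-point rubric listed above. The six scored components come from that rubric; the two sample submissions are hypothetical and invented purely for illustration, not taken from Rapaport’s paper.

```python
# Scored rubric components from the note-12 example, each triage-scored 0-3.
COMPONENTS = [
    "problem_definition",
    "top_down_design",
    "code",
    "documentation",
    "output",
    "annotations",
]
MAX_PER_COMPONENT = 3


def total(scores):
    """Sum the triage scores, treating any missing component as 0."""
    return sum(scores.get(name, 0) for name in COMPONENTS)


# Hypothetical submissions, invented purely for illustration.
student_a = {"problem_definition": 3, "top_down_design": 3, "code": 3}
student_b = {"documentation": 3, "output": 3, "annotations": 3}

max_total = MAX_PER_COMPONENT * len(COMPONENTS)
for name, scores in [("A", student_a), ("B", student_b)]:
    print(f"Student {name}: {total(scores)}/{max_total}")
```

Both students come out at 9/18, yet student B has submitted no definition, design or code. The sum cannot tell the two apart, so any dependency rule (for example, “no code means no marks for output”) has to be bolted on top of the arithmetic.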
The second question is more for those of us who teach programming and it’s a question we all should ask. If a student can get a decent grade for an assignment without submitting a design, then what message are we sending? We are, implicitly, saying that although we talk a lot about design, it’s not something you have to do in order to be successful. Rapaport does go on to talk about weightings and how we can emphasise these issues but we are still faced with an ugly reality that, unless we weight our key aspects to be 50-60% of the final aggregate, students will be able to side-step them and still perform to a passing standard. Every assignment should be doing something useful, modelling the correct approaches, demonstrating correct techniques. How do we capture that?
Now, let me step back and say that I have no problem with identifying the sub-tasks and clearly indicating the level of performance using triage grading, but I disagree with using it for marks. For feedback it is absolutely invaluable: triage grading on sub-tasks will immediately tell you where the majority of students are having trouble, quickly. That then lets you know an area that is more challenging than you thought or one that your students were not prepared for, for some reason. (If every student in the class is struggling with something, the problem is more likely to lie with the teacher.) However, I see three major problems with sub-task aggregation and, thus, with final grade aggregation from assignments.
The first problem is that I think this is the wrong kind of scale to try and aggregate in this way. As Rapaport notes, agreement on clear, linear intervals in grading is never going to be achieved and is, very likely, not even possible. Recall that there are four fundamental types of scale: nominal, ordinal, interval and ratio. The scales in use for triage grading are not interval scales (the intervals aren’t predictable or equidistant) and thus we cannot expect to average them and get sensible results. What we have here are, to my eye, ordinal scales, with no objective distance but a clear ranking of best to worst. The clearest indicator of this is the construction of a B grade for final grading, where no such concept exists in the triage marks for assessing assignment quality. We have created a “some way to go but sometimes nearly perfect” that shouldn’t really exist. Think of it like runners: you win one race and you come third in another. You never actually came second in any race so averaging it makes no sense.
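A short Python sketch of why averaging an ordinal scale is fragile. The four labels are the triage categories described earlier; both numeric codings and both students are hypothetical. The two codings preserve exactly the same ranking of the labels and differ only in the assumed distance between them.

```python
from statistics import mean

# Triage labels in rank order; only the ORDER is meaningful, not the gaps.
LABELS = ["did nothing", "wrong answer", "some way to go", "pretty much there"]

# Two hypothetical numeric codings that preserve the same ranking.
coding_even = dict(zip(LABELS, [0, 1, 2, 3]))
coding_skewed = dict(zip(LABELS, [0, 1, 2, 10]))

# Two hypothetical students, each with two triaged results.
student_x = ["did nothing", "pretty much there"]
student_y = ["some way to go", "some way to go"]


def average(results, coding):
    """Average the numeric codes assigned to a list of triage labels."""
    return mean(coding[label] for label in results)


for coding in (coding_even, coding_skewed):
    print(average(student_x, coding), average(student_y, coding))
```

Under the evenly spaced coding, student Y averages higher than student X; under the skewed coding, the order flips. Since the triage scale gives no grounds for preferring one spacing over the other, the average reports the arbitrary choice of numbers rather than anything about the work.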
The second problem is that aggregation masks the beauty of triage in terms of identifying if a task has been performed to the pre-determined level. In an ideal world, every area of knowledge that a student is exposed to should be an important contributor to their learning journey. We may have multiple assignments in one area but our assessment mechanism should provide clear opportunities to demonstrate that knowledge. Thus, their achievement of sufficient assignment work to demonstrate their competency in every relevant area of knowledge should be a necessary condition for graduating. When we take triage grading back to an assignment level, we can then look at our assignments grouped by knowledge area and quickly see if a student has some way to go or has achieved the goal. This is not anywhere near as clear when we start aggregating the marks because of the mathematical issues already raised.
Finally, the reduction of triage to mathematical approximation reduces the ability to specify which areas of an assessment are really valuable and, while weighting is a reasonable approximation to this, it is very hard to use a mathematical formula with more and more ‘fudge factors’, a term Rapaport uses, to make up for the fact that this is just a little too fragile.
To summarise, I really like the thrust of this paper. I think what is proposed is far better, even with all of the problems raised above, at giving a reasonable, fair and predictable grade to students. But I think that the clash with existing grading traditions and the implicit requirement to turn everything back into one number is causing problems that have to be addressed. These problems mean that this solution is not, yet, beautiful. But let’s see where we can go.
Tomorrow, I’ll suggest an even more cut-down version of grading and then work on an even trickier problem: late penalties and how they affect grades.
Assessment is (often) neither good nor true.
Posted: January 9, 2016 Filed under: Education, Opinion | Tags: advocacy, aesthetics, beauty, community, design, education, ethics, higher education, in the student's head, kohn, principles of design, rapaport, reflection, student perspective, teaching, teaching approaches, thinking, tools, universal principles of design, work/life balance, workload 3 Comments
If you’ve been reading my blog over the past years, you’ll know that I have a lot of time for thinking about assessment systems that encourage and develop students, with an emphasis on intrinsic motivation. I’m strongly influenced by the work of Alfie Kohn, unsurprisingly given I’ve already shown my hand on Foucault! But there are many other writers who are… reassessing assessment: why we do it, why we think we are doing it, how we do it, what actually happens and what we achieve.

In my framing, I want assessment to be as all other aspects of education: aesthetically satisfying, leading to good outcomes and being clear about what it is and what it is not. Beautiful. Good. True. There are some better and worse assessment approaches out there and there are many papers discussing this. One of these that I have found really useful is Rapaport’s paper on a simplified assessment process for consistent, fair and efficient grading. Although I disagree with some aspects, I consider it to be both good, as it is designed to clearly address a certain problem to achieve good outcomes, and true, because it is very honest about providing guidance to the student as to how well they have met the challenge. It is also highly illustrative and honest in representing the struggle of the author in dealing with the collision of novel and traditional assessment systems. However, further discussion of Rapaport is for the near future. Let me start by demonstrating how broken things often are in assessment, by taking you through a hypothetical situation.
Thought Experiment 1
Two students, A and B, are taking the same course. There are a number of assignments in the course and two exams. A and B, by sheer luck, end up doing no overlapping work. They complete different assignments from each other, half each, and achieve the same (cumulative bare pass overall) marks. They then manage to score bare pass marks in both exams, but one answers only the even questions and the other answers only the odd. (And, yes, there are an even number of questions.) Because of the way the assessment was constructed, they have managed to avoid any common answers in the same area of course knowledge. Yet, both end up scoring 50%, a passing grade in the Australian system.
Which of these students has the correct half of the knowledge?
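The thought experiment can be sketched in a few lines of Python. This is a deliberately simplified model, not a real marking scheme: it assumes ten equally weighted knowledge areas (the names and the count are hypothetical) and treats each student as fully demonstrating exactly half of them.

```python
# Hypothetical course with an even number of equally weighted knowledge areas.
areas = [f"area_{i}" for i in range(1, 11)]

# Student A demonstrates only the even-numbered areas, student B only the odd ones.
student_a = {a for i, a in enumerate(areas, start=1) if i % 2 == 0}
student_b = {a for i, a in enumerate(areas, start=1) if i % 2 == 1}


def percentage(demonstrated, all_areas):
    """Aggregate mark: share of areas demonstrated, as a percentage."""
    return 100 * len(demonstrated) / len(all_areas)


mark_a = percentage(student_a, areas)
mark_b = percentage(student_b, areas)

# The aggregate marks are identical, yet the students share no knowledge at all.
shared = student_a & student_b

print(mark_a, mark_b, shared)
```

The single number is the same for both students, and it carries no information about which half of the course each of them knows.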
I had planned to build up to Rapaport but, if you’re reading the blog comments, he’s already been mentioned so I’ll summarise his 2011 paper before I get to my main point. In 2011, William J. Rapaport, SUNY Buffalo, published a paper entitled “A Triage Theory of Grading: The Good, The Bad and the Middling” in Teaching Philosophy. This paper summarised a number of thoughtful and important authors, among them Perry, Wolff, and Kohn. Rapaport starts by asking why we grade, moving through Wolff’s taxonomic classification of assessment into criticism, evaluation, and ranking. Students are trained, by our world and our education systems, to treat grades as a measure of progress and, in many ways, a proxy for knowledge. But this brings us into conflict with Perry’s developmental stages, where students start with a deep need for authority and the safety of a single right answer. It is only when students are capable of understanding that there are, in many cases, multiple right answers that we can expect them to understand that grades can have multiple meanings. As Rapaport notes, grades are inherently dual: a representative symbol attached to a quality measure and then, in his words, “ethical and aesthetic values are attached” (emphasis mine.) In other words, a B is a measure of progress (not quite there) that also has a value of being … second-tier if an A is our measure of excellence. A is not A, as it must be contextualised. Sorry, Ayn.
When we start to examine why we are grading, Kohn tells us that the carrot and stick is never as effective as the motivation that someone has intrinsically. So we look to Wolff: are we critiquing for feedback, are we evaluating learning, or are we providing handy value measures for sorting our product for some consumer or market? Returning to my thought experiment above, we cannot provide feedback on assignments that students don’t do, our evaluation of learning says that both students are acceptable for complementary knowledge, and our students cannot be discerned from their graded rank, despite the fact that they have nothing in common!
Yes, it’s an artificial example but, without attention to the design of our courses and in particular the design of our assessment, it is entirely possible to achieve this result to some degree. This is where I wish to refer to Rapaport as an example of thoughtful design, with a clear assessment goal in mind. To step away from measures that provide an (effectively) arbitrary distinction, Rapaport proposes a tiered system for grading that simplifies the overall system with an emphasis on identifying whether a piece of assessment work is demonstrating clear knowledge, a partial solution, an incorrect solution or no work at all.
This, for me, is an example of assessment that is pretty close to true. The difference between a 74 and a 75 is, in most cases, not very defensible (after Haladyna) unless you are applying some kind of ‘quality gate’ that really reduces a percentile scale to, at most, 13 different outcomes. Rapaport’s argument is that we can reduce this further and this will reduce grade clawing, identify clear levels of achievement and reduce marking load on the assessor. That last point is important. A system that buries the marker under load is not sustainable. It cannot be beautiful.
There are issues in taking this approach and turning it back into the grades that our institutions generally require. Rapaport is very open about the difficulties that he has turning his triage system into an acceptable letter grade and it’s worth reading the paper to see that discussion alone, because it quite clearly shows what
Rapaport’s scheme clearly defines which of Wolff’s criteria he wishes his assessment to achieve. The scheme, for individual assessments, is no good for ranking (although we can fashion a ranking from it) but it is good to identify weak areas of knowledge (as transmitted or received) for evaluation of progress and also for providing elementary critique. It says what it is and it pretty much does it. It sets out to achieve a clear goal.
The paper ends with a summary of the key points of Haladyna’s 1999 book “A Complete Guide to Student Grading”, which brings all of this together.
Haladyna says that “Before we assign a grade to any students, we need:
- an idea about what a grade means,
- an understanding of the purposes of grading,
- a set of personal beliefs and proven principles that we will use in teaching and grading,
- a set of criteria on which the grade is based, and, finally,
- a grading method, which is a set of procedures that we consistently follow in arriving at each student’s grade.” (Haladyna 1999: ix)
There is no doubt that Rapaport’s scheme meets all of these criteria and, yet, for me, we have not yet gone far enough in search of the most beautiful, most good and most true extent that we can take this idea. Is point 3, which could be summarised as aesthetics, not enough for me? Apparently not.
Tomorrow I will return to Rapaport to discuss those aspects I disagree with and, later on, discuss both an even more trimmed-down model and some more controversial aspects.
Getting it wrong
Posted: January 7, 2016 Filed under: Education, Opinion | Tags: advocacy, authenticity, design, education, educational problem, educational research, higher education, john c. dewey, learning, pragmatism, principles of design, reflection, teaching, teaching approaches, thinking, tools, william james Leave a comment
It’s fine to write all sorts of wonderful statements about theory and design and we can achieve a lot in thinking about such things. But, let’s be honest, we face massive challenges in the 21st Century and improved thinking and practice in education is one of the most important contributions we can make to future generations. Thus, if we want to change the world based upon our thinking, then all of our discussions have no use if we can’t develop something that’s going to achieve our goals. Dewey’s work provides an experimental, even instrumental, approach to the American philosophical school of pragmatism. To briefly explain this term in the specific meaning, I turn to William James, American psychologist and philosopher.
Pragmatism asks its usual question. “Grant an idea or belief to be true,” it says, “what concrete difference will its being true make in anyone’s actual life? How will the truth be realized? What experiences will be different from those which would obtain if the belief were false? What, in short, is the truth’s cash-value in experiential terms?”
William James, Pragmatism (1907)
(James is far too complex to summarise with one paragraph and I am using only one of his ideas to illustrate my point. Even James’ scholars disagree on how to interpret many of his writings. It’s worth reading him and Hegel at the same time as they square off across the ring quite well.)

Portrait of William James by John La Farge, circa 1859
What will be different? How will we recognise or measure it? What do we gain by knowing if we are right or wrong? This is why all good education researchers depend so heavily on testing their hypotheses in the space where they will make an impact and there is usually an obligation to look at how things are working before and after any intervention. This places further obligation upon us to evaluate what has occurred and then, if our goals haven’t been achieved, change our approach further. It’s a simple breakdown of roles but I often think of educational work as falling into three heavily overlapping areas: practice, scholarship and research. Practice should be applying techniques that achieve our goals, scholarship involves the investigation, dissemination and comparison of these techniques, and research builds on scholarship to evaluate practice in ways that will validate and develop new techniques – or invalidate formerly accepted ones as knowledge improves. This leads me to my point: evaluating your own efforts to work out how to do better next time.
There are designers, architects, makers and engineers who are committed to the practice of impact design, where (and this is one definition):
“Impact design is rooted in the core belief that design can be used to create positive social, environmental and economic change, and focuses on actively measuring impact to inform and direct the design process.” Impact Design Hub, About.
Thus, evaluation of what works is essential for these practitioners. The same website recently shared some designers talking about things that went wrong and what they learned from the process.
If you read that link, you’ll see all sorts of lessons: don’t hand innovative control to someone who’s scared of risk, don’t ignore your community, don’t apply your cultural values to others unless you really know what you’re doing, and don’t forget the importance of communication.
Writing some pretty words every day is not going to achieve my goal and I need to be reminded of the risks that I face in trying to achieve something large – one of which is not actually working towards my own goals in a useful manner! One of the biggest risks is confusing writing a blog with actual work, unless I use this medium to do something. Over the coming weeks, I hope to show you what I am doing as I move towards my very ambitious goal of “beautiful education”. I hope you find the linked article as useful as I did.
Exploring beauty and aesthetics
Posted: January 3, 2016 Filed under: Education, Opinion | Tags: aesthetics, beauty, education, educational problem, educational research, hegel, higher education, Kant, learning, reflection, suits, teaching, teaching approaches, The Grasshopper, thinking, tools, wittgenstein Leave a comment
“Nothing great in the world has ever been accomplished without passion.” Hegel
- the ability to state the goal of any educational activity as separate from the activity,
- the awareness of evidence-based practice and its use in everyday teaching, and
- a willingness to accept that it is correct goal setting and using techniques that work, and can be shown to work, that will lead to better outcomes.
Why writing 50,000 words doesn’t matter as much as writing one.
Posted: October 26, 2015 Filed under: Education, Opinion | Tags: blogging, community, education, educational research, goals, higher education, NaNoWriMo, thinking, tools, workload, writing 3 Comments
Have you heard of NaNoWriMo? National Novel Writing Month has been around since 1999 and is now far more widespread than national boundaries and has become relatively large, with 325,142 participants on six continents in 2014. The idea is simple: over the 30 days of November, you write 50,000 words that are (notionally) all directed towards a fictional novel.
I’ve taken part twice in the past and produced two … rapidly written works of fiction. I have never claimed to be a good writer, I’m certainly not a published writer, but I can put a number of words down on the page in a day. They even make sense, most of the time, and I’ve ended up with stories that real people have actually read and enjoyed!
I like NaNoWriMo. I like it as a concept, because it demystifies the concept of writing that many words by saying “Hey! Don’t get caught up on perfect prose, just start by writing.” I like the community because, despite the large number of people who show up and have no intention of doing it, there are enough like minds to give you support when you need it. I like it personally, as it’s a great way to get down a long draft of a work, even if you don’t do anything else with it. It’s a bit of external structure (and scaffolding) for those of us who aren’t professional authors.
Being me, of course, I’m all about seeing if we can get people doing something that they didn’t think possible, so I look at NaNoWriMo as a success for anyone who writes one more word than they otherwise would have in November. Sure, getting down a whole novel would be awesome but any steps forward are good steps.
We could talk a lot about how this kind of constrained activity can work in a creative setting but, if you know my work, you’ll have a fairly good idea that I think that everyone taking part should have “enjoyment” as their primary goal, with a possible outcome of their first long-form work as a happy side-effect. 50K shouldn’t be a burden but a guide. 1,700 words a day can be more manageable than many people think and, at the end of November, you may have done something you never thought possible.
Whatever happens, you’ll be thinking creatively and that, in my book, is always awesome. “Almighty creativity”, as the late Bob Ross might say.
I’ll be doing NaNo again this year and, if you’re thinking about it, check out the web site or my shortish guide to speed writing (AntifreezePub) that I’ve written based on my own experiences over the years. As always, if you think there should be a better speed writing guide, feel free to write it/find it and link to it in the comments!
I decided to have an outrageous book cover to inspire me and get me into the right mood. (Currently a working copy with placeholder artwork, as I have no idea what will make it into the final draft.)
Almost all of us benefit from writing practice and this is an interesting way to get a lot of practice in a short time. If you do it, have fun, and feel free to buddy up with me under the username jnick.
Musings of an Amateur Mythographer I: Islands of Certainty in a Sea of Confusion
Posted: June 23, 2015 Filed under: Education, Opinion | Tags: advocacy, authenticity, blogging, Claude Lévi-Strauss, design, education, educational problem, educational research, evidence, higher education, Karl Popper, Lévi-Strauss, learning, moocs, myth, mythographer, reflection, resources, scientific thinking, teaching, teaching approaches, thinking, tools Leave a comment
I’ve been doing a lot of reading recently on the classification of knowledge, the development of scientific thinking, the ways different cultures approach learning, and the relationship between myths and science. Now, some of you are probably wondering why I can’t watch “Agents of S.H.I.E.L.D.” like a normal person but others of you have already started to shift uneasily because I’ve talked about a relationship between myths and science, as if we do not consider science to be the natural successor to preceding myths. Well, let me go further. I’m about to start drawing on thinking about myths and science, and even on how some myths teach us about the importance of evidence, the foundation of science, but for their own purposes.
Why?
Because much of what we face as opposition in educational research are pre-existing stereotypes and misconceptions that people employ, where there’s a lack of (and sometimes in the face of) evidence. Yet this collection of beliefs is powerful because it prevents people from adopting verified and validated approaches to learning and teaching. What can we call these? Are these myths? What do I even mean by that term?
It’s important to realise that the use of the term myth has evolved from earlier, rather condescending, classifications of any culture’s pre-scientific thinking as being dismissively primitive and unworthy of contemporary thought. This is a rich topic by itself but let me refer to Claude Lévi-Strauss and his identification of myth as being a form of thinking and classification, rather than simple story-telling, and thus proto-scientific, rather than anti-scientific. I note that I have done the study of mythology a grave disservice with such an abbreviated telling. Further reading here to understand precisely what Lévi-Strauss was refuting could involve Tylor, Malinowski, and Lévy-Bruhl. This includes rejecting a knee-jerk classification of a less scientifically advanced people as being emotional and practical, rather than (even being capable of) being intellectual. By moving myth forms to an intellectual footing, Lévi-Strauss allows a non-pejorative assessment of the potential value of myth forms.
In many situations, we consider myth and folklore as the same thing, from a Western post-Enlightenment viewpoint, only accepting those elements that we can validate. Thus, we choose not to believe that Olympus holds the Greek Pantheon as we cannot locate the Gods reliably, but the pre-scientific chewing of willow bark to relieve pain was validated once we constructed aspirin (and willow bark tea). It’s worth noting that the early location of willow bark as part of its scientific ‘discovery’ was inspired by an (effectively random) approach called the doctrine of signatures, which assumed that the cause and the cure of diseases would be located near each other. The folkloric doctrine of signatures led the explorers to a plant that tasted like another one but had a different use.
Myth, folklore and science, dancing uneasily together. Does this mean that what we choose to call myth now may or may not be myth in the future? We know that to use it, to recommend it, in our endorsed and academic context is usually to require it to become science. But what is science?
Karl Popper’s (heavily summarised) view is that we have a set of hypotheses that we test to destruction and this is the foundation of our contemporary view of science. If the evidence we have doesn’t fit the hypothesis then we must reject the hypothesis. When we have enough evidence, and enough hypotheses, we have a supported theory. However, this has a natural knock-on effect in that we cannot actually prove anything, we just have enough evidence to support the hypothesis. Kuhn (again, heavily summarised) has a model of “normal science” where there is a large amount of science as in Popper’s model, incrementing a body of existing work, but there are times when this continuity gives way to a revolutionary change. At these times, we see an accumulation of contradictory evidence that illustrates that it’s time to think very differently about the world. Ultimately, we discover the need for a new coherency, where we need new exemplars to make the world make sense. (And, yes, there’s still a lot of controversy over this.)
Let me attempt to bring this all together, finally. We, as humans, live in a world full of information and some of it, even in our post-scientific world, we incorporate into our lives without evidence and some we need evidence to accept. Do you want some evidence that we live our lives without, or even in spite of, evidence? The median length for a marriage in the United States is 11 years and 40-50% of marriages will end in divorce yet many still swear ‘until death do us part’ or ‘all of my days’. But the myth of ‘marriage forever’ is still powerful. People have children, move, buy houses and totally change their lives based on this myth. The actions that people take here will have a significant impact on the world around them and yet it seems at odds with the evidence. (Such examples are not uncommon and, in a post-scientific revolution world, must force us to consider earlier suggestions that myth-based societies move seamlessly to a science-based intellectual utopia. This is why Lévi-Strauss is interesting to read. Our evidence is that our evidence is not sufficient evidence, so we must seek to better understand ourselves.) Even those components of our shared history and knowledge that are constructed to be based on faith, such as religion, understand how important evidence is to us. Let me give an example.
In the fourth book of the New Testament of the Christian Bible, the Gospel of John, we find the story of the Resurrection of Lazarus. Lazarus is sick and Jesus Christ waits until he dies to go to where he is buried and raise him. Jesus deliberately delays because the glory to the Christian God will be far greater and more will believe, if Lazarus is raised from the dead, rather than just healed from illness. Ultimately, and I do not speak for any religious figure or God here, anyone can get better from an illness but to be raised from the dead (currently) requires a miracle. Evidence, even in a book written for the faithful and to build faith, is important to humans.
We also know that there is a very large amount of knowledge that is accepted as being supported by evidence but the evidence is really anecdotal, based on bias and stereotype, and can even be distorted through repetition. This is the sea of confusion that we all live in. The scientific method (Popper) is one way that we can try to find firm ground to stand on but, if Kuhn is to be believed, there is the risk that one day we stand on the islands and realise that the truth was the sea all along. Even with Popper, we risk standing on solid ground that turns out to be meringue. How many of these changes can one human endure and still be malleable and welcoming in the face of further change?
Our problem with myth is when it forces us to reject something that we can demonstrate to be both valuable and scientifically valid because, right now, the world that we live in is constructed on scientific foundations and coherence is maintained by adding to those foundations. Personally, I don’t believe that myth and science have to be at odds (many disagree with me, including Richard Dawkins of course), and that this is an acceptable view as they are already co-existing in ways that actively shape society, for both good and ill.
Recently I made a comment on MOOCs that contradicted something someone said and I was (quite rightly) asked to provide evidence to support my assertions. That is the post before this one and what you will notice is that I do not have a great deal of what we would usually call evidence: no double-blind tests, no large-n trials with well-formed datasets. I had some early evidence of benefit, mostly qualitative and relatively soft, but, and this is important to me, what I didn’t have was evidence of harm. There are many myths around MOOCs and education in general. Some of them fall into the realm of harmful myths, those that cause people to reject good approaches to adhere to old and destructive practices. Some of them are harmful because they cause us to reject approaches that might work because we cannot find the evidence we need.
I am unsurprised that so many people adhere to folk pedagogy, given the vast amounts of information out there and the natural resistance to rejecting something that you think works, especially when someone sails in and tells you that you’ve been wrong for years. The fact that we are still discussing the nature of myth and science gives insight into how complicated this issue is.
I think that the path I’m on could most reasonably be called that of the mythographer, but the cataloguing of the edges of myth and the intersections of science is not in order to condemn one or the other but to find out what the truth is to the best of our knowledge. I think that understanding why people believe what they believe allows us to understand what they will need in order to believe something that is actually, well, true. There are many articles written on this, on the difficulty of replacing one piece of learning with another and the dangers of repetition in reinforcing previously-held beliefs, but there is hope in that we can construct new elements to replace old information if we are careful and we understand how people think.
We need to understand the delicate relationships between myth, folklore and science, our history as separate and joined peoples, if only to understand when we have achieved new forms of knowing. But we also need to be more upfront about when we believe we have moved on, including actively identifying areas that we have labelled as “in need of much more evidence” (such as learning styles, for example) to assist people in doing valuable work if they wish to pursue research.
I’ll go further. If we have areas where we cannot easily gain evidence, yet we have competing myths in that space, what should we do? How do we choose the best approach to achieve the most effective educational outcomes? I’ll let everyone argue in the comments for a while and then write that as the next piece.


