We know that grades are really quite arbitrary and that turning numbers into letters, while something we can do, is actually not that strongly coupled to evaluating learning or demonstrating mastery. Why? Because having the appropriate level of knowledge and being able to demonstrate it are not necessarily the same as being able to pass tests or produce solutions to assignments.
For example, if we look at Rapaport’s triage approach as a way to evaluate student interaction with assignments, we can then design our learning environment to provide multiple opportunities to construct and evaluate knowledge on the understanding that we are seeking clear evidence that a student cannot just perform tasks of this nature but, more important, can do so reliably. We can do this even if we use “Good, getting there, wrong and no submission” rather than numbers. The duality of grades (a symbol and its meaning) degenerates to something other than numbers anyway. Students at my University didn’t care about 84 versus 85 until we put a new letter grade in at 85 (High Distinction). But even these distinctions are arbitrary scales when it comes to evaluating actual learning.
Why are numbers not important in this? Because they’re rarely important anyway. Have you ever asked your surgeon what her grades were in school? What about your accountant? Perhaps you’ve questioned the percentage that your favourite Master of Wine achieved in the tasting exams? Of course you haven’t. You’ve assumed that a certification (of some sort) indicates sufficient knowledge to practise. And what we have to face is that we are currently falling back onto numbers to give us false confidence that we are measuring learning. They don’t map. They’re not objective. They’re often mathematically nonsensical. No-one cares about them except to provide yet another way of sorting human beings and, goodness knows, we already have enough of those.
Ah, “but students like to know how they’re going”, right? Yes. Which is where critique and evaluation come in, as well as many other authentic and appropriate ways to recognise progress and encourage curiosity and further development. None of which require numbers.
Let me ask you a question:
Does every student who accumulates enough pass tokens to graduate from your program have a clearly demonstrated ability to perform tasks to the requisite level in all of the knowledge areas of your program?
If the answer is no, then numbers and grades didn’t help, did they? I suspect that, for you as for many others including me, you can probably think of students who managed to struggle through but, in reality, were probably never going to be much good in the field. Perhaps 50% doesn’t magically cover competency? If 50% doesn’t, then raising the bar to 75% won’t solve the problem either. For reasons already mentioned, many of the ways we combine numbers to get grades just don’t make any real sense and they certainly don’t provide much insight into how well the student actually learned what you were trying to teach.
If numbers/grades don’t have much solid foundation, don’t always reflect ability to perform the task, and aren’t actually going to be used in the future? Then they are neither good nor true. And they cannot be beautiful.
Thus, let me strip Rapaport back one notch and provide a three-tier grade-free system, commonly used in many places already, that is closer to what we probably want:
- Nothing submitted,
- Work in progress, resubmit if possible, and
- Work to competent standard.
I know that there are concerns about the word ‘competency’ but I think it’s something we’re going to have to think about moving on from. I teach engineers and computer scientists and they have to go out and perform tasks successfully if people are going to employ them or work with them. They have to be competent. Right now, I can tell you which of them have passed but, for a variety of grading reasons, I can’t tell you which one of them, from an academic transcript alone, will be able to sit down and solve your problem. I can see which ones pass exams but I don’t know if this is fixed knowledge or swotting. But what if you made it easy and said “ok, just point to the one who will build me the best bridge”? No. I can’t tell you that. (The most likely worst bridge is easier, as I can identify who does and doesn’t have Civil Engineering qualifications.)
The three-tier scale is simple. The feedback approach that the marker should take is pretty clear in each place and the result is clear to the student. If we build our learning environment correctly, then we can construct a pathway where a student has to achieve tier 3 for all key activities and, at that point, we can actually say “Yes, this student can perform this task or apply this knowledge to the required level”. We do this enough times, we may even start to think that the student could perform this at the level of the profession.
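As a rough illustration of that pathway, here is a minimal sketch in Python. The activity names and the helper functions `outstanding` and `can_certify` are hypothetical, invented for this example; the only thing taken from the text is the three tiers and the rule that every key activity must reach tier 3.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three tiers from the list above."""
    NOTHING_SUBMITTED = 1
    WORK_IN_PROGRESS = 2
    COMPETENT = 3


def outstanding(results, key_activities):
    """Return the key activities that are not yet at the competent standard.

    An activity with no recorded result counts as 'nothing submitted'.
    """
    return [
        activity
        for activity in key_activities
        if results.get(activity, Tier.NOTHING_SUBMITTED) != Tier.COMPETENT
    ]


def can_certify(results, key_activities):
    """True only when every key activity has reached tier 3."""
    return not outstanding(results, key_activities)


# Hypothetical key activities and one student's current results.
key_activities = ["design", "implementation", "testing"]
student = {
    "design": Tier.COMPETENT,
    "implementation": Tier.WORK_IN_PROGRESS,
}

print(outstanding(student, key_activities))
print(can_certify(student, key_activities))
```

Note that nothing is added or averaged here: the answer is a list of what still needs work, which is also the feedback the student needs.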
Wait. Have we just re-invented competency-based assessment? There’s an immediate urge to say “but that’s not a University level thing” and I do understand that. CBA has a strong vocational focus but anyone who works in an engineering faculty is already in that boat. We have industry linked accreditation to allow our students to practise as engineers and they have to demonstrate the achievement of a certified program, as well as work experience. That program is taught at University but, given that all you need is to get the degree, you can do it on raw passes and be ‘as accredited’ as the next person.
Now, I’d be the first person to say that not only are many aspects of the University not vocationally focussed but I’d go further and say that they shouldn’t be vocationally focussed. The University is a place that allows for the unfettered exploration of knowledge for knowledge’s sake and I wouldn’t want to change that. (And, yet, so often, we still grade such abstract ideals…) But let’s take competency away from the words job and vocational for a moment. I’m not suggesting we turn Universities into vocational study centres or shut down “non-Industry” programs and schools. (I’d like to see more but that’s another post.) Let’s look at focusing on clarity and simplicity of evaluation.
A student writes an essay on Brecht and submits it for assessment. All of the rich feedback on language use, referencing and analysis still exists without the need to grade it as A, B or C. The question is whether the work should be changed in response to the feedback (if possible) or whether it is, recognisably, an appropriate response to the question ‘write an essay on Brecht’ that will allow the student to develop their knowledge and skills. There is no job focus here but pulling back to separate feedback and identifying whether knowledge has been sufficiently demonstrated is, fundamentally, a competency argument.
The PhD, the pinnacle of the University system, is essentially not graded. You gain vast amounts of feedback over time, you write in response and then you either defend it to your prospective peers or have it blind-assessed by external markers. Yes, there are degrees of acceptance but, ultimately, what you end up with is “Fine as it is”, “Do some more work”, and “Oh, no. Just no.” If we can extend this level of acceptance of competency to our highest valued qualification, what is the consistent and sound reasoning that requires us to look at a student group and say “Hmm, 73. And this one is… yes, 74.”? If I may, cui bono? Who is benefitting here?
But what would such a program look like, you ask? (Hey, and didn’t Nick say he was going to talk about late penalties?) Yes, indeed. Come back tomorrow!
I hope you’ve had a chance to read William Rapaport’s paper, which I referred to yesterday. He proposed a great, simple alternative to traditional grading that reduces confusion about what is signalled by ‘grade-type’ feedback, as well as making things easier for students and teachers. Being me, after saying how much I liked it, I then finished by saying “… but I think that there are problems.” His approach was that we could break all grading down into: did nothing, wrong answer, some way to go, pretty much there. And that, I think, is much better than a lot of the nonsense that we pretend we hand out as marks. But, yes, I have some problems.
I note that Rapaport’s exceedingly clear and honest account of what he is doing includes this statement. “Still, there are some subjective calls to make, and you might very well disagree with the way that I have made them.” Therefore, I have license to accept the value of the overall scholarship and the frame of the approach, without having to accept all of the implementation details given in the paper. Onwards!
I think my biggest concern with the approach given is not in how it works for individual assessment elements. In that area, I think it shines, as it makes clear what has been achieved. A marker can quickly place the work into one of four boxes if there are clear guidelines as to what has to be achieved, without having to worry about one or two percentage points here or there. Because the grade bands are so distinct, as Rapaport notes, it is very hard for the student to make the ‘I only need one more point’ argument that is so clearly indicative of a focus on the grade rather than the learning. (I note that such emphasis is often what we have trained students for; there is no pejorative intention here.) I agree this is consistent and fair, and time-saving (after Walvoord and Anderson), and it avoids curve grading, which I loathe with a passion.
However, my problems start when we are combining a number of these triaged grades into a cumulative mark for an assignment or for a final letter grade, showing progress in the course. Sections 4.3 and 4.4 of the paper detail the implementation of assignments that have triage graded sub-tasks. Now, instead of receiving a “some way to go” for an assignment, we can start getting different scores for sub-tasks. Let’s look at an example from the paper, note 12, to describe programming projects in CS.
- Problem definition 0,1,2,3
- Top-down design 0,1,2,3
- Documented code
  - Code 0,1,2,3
  - Documentation 0,1,2,3
- Annotated output
  - Output 0,1,2,3
  - Annotations 0,1,2,3
Total possible points = 18
Remember my hypothetical situation from yesterday? I provided an example of two students who managed to score enough marks to pass by knowing the complement of each other’s course knowledge. Looking at the above example, it appears (although not easily) to be possible for this situation to occur and both students to receive a 9/18, yet for different aspects. But I have some more pressing questions:
- Should it be possible for a student to receive full marks for output, if there is no definition, design or code presented?
- Can a student receive full marks for everything else if they have no design?
The first question indicates what we already know about task dependencies: if we want to build them into numerical grading, we have to be pedantically specific and provide rules on top of the aggregation mathematics. But, more subtly, by aggregating these measures, we no longer have an ‘accurately triaged’ grade to indicate if the assignment as a whole is acceptable or not. An assignment with no definition, design or code can hardly be considered to be a valid submission, yet good output, documentation and annotation (with no code) will not give us the right result!
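To make the arithmetic concrete, here is a small Python sketch of the six-item rubric above. The two mark sheets are invented for illustration, and `valid_submission` is a hypothetical example of the kind of extra rule that would have to sit on top of the sum; it is not something taken from Rapaport’s paper.

```python
# The six scored items from the rubric, each marked 0-3 (18 points in total).
RUBRIC = [
    "problem definition",
    "top-down design",
    "code",
    "documentation",
    "output",
    "annotations",
]
MAX_PER_ITEM = 3


def total(marks):
    """Plain aggregation: add up whatever was awarded, missing items count as 0."""
    return sum(marks.get(item, 0) for item in RUBRIC)


# Invented mark sheets: one has no definition, design or code;
# the other has exactly the complementary items.
no_code = {"documentation": 3, "output": 3, "annotations": 3}
no_output = {"problem definition": 3, "top-down design": 3, "code": 3}


def valid_submission(marks, required=("problem definition", "top-down design", "code")):
    """Hypothetical dependency rule: the required items must all be non-zero."""
    return all(marks.get(item, 0) > 0 for item in required)


print(total(no_code), total(no_output), len(RUBRIC) * MAX_PER_ITEM)
print(valid_submission(no_code), valid_submission(no_output))
```

Both sheets sum to 9 out of 18, so the aggregate cannot tell them apart, and only the extra rule notices that the first one has no code behind its output.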
The second question is more for those of us who teach programming and it’s a question we all should ask. If a student can get a decent grade for an assignment without submitting a design, then what message are we sending? We are, implicitly, saying that although we talk a lot about design, it’s not something you have to do in order to be successful. Rapaport does go on to talk about weightings and how we can emphasise these issues but we are still faced with an ugly reality that, unless we weight our key aspects to be 50-60% of the final aggregate, students will be able to side-step them and still perform to a passing standard. Every assignment should be doing something useful, modelling the correct approaches, demonstrating correct techniques. How do we capture that?
Now, let me step back and say that I have no problem with identifying the sub-tasks and clearly indicating the level of performance using triage grading, but I disagree with using it for marks. For feedback it is absolutely invaluable: triage grading on sub-tasks will immediately tell you where the majority of students are having trouble, quickly. That then lets you know an area that is more challenging than you thought or one that your students were not prepared for, for some reason. (If every student in the class is struggling with something, the problem is more likely to lie with the teacher.) However, I see three major problems with sub-task aggregation and, thus, with final grade aggregation from assignments.
The first problem is that I think this is the wrong kind of scale to try and aggregate in this way. As Rapaport notes, agreement on clear, linear intervals in grading is never going to be achieved and is, very likely, not even possible. Recall that there are four fundamental types of scale: nominal, ordinal, interval and ratio. The scales in use for triage grading are not interval scales (the intervals aren’t predictable or equidistant) and thus we cannot expect to average them and get sensible results. What we have here are, to my eye, ordinal scales, with no objective distance but a clear ranking of best to worst. The clearest indicator of this is the construction of a B grade for final grading, where no such concept exists in the triage marks for assessing assignment quality. We have created a “some way to go but sometimes nearly perfect” that shouldn’t really exist. Think of it like runners: you win one race and you come third in another. You never actually came second in any race so averaging it makes no sense.
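One way to see the problem with averaging an ordinal scale is to change the numbers attached to the labels without changing their order. The sketch below is only an illustration: the two students and the second coding (0, 1, 2, 10) are invented, and the four labels are the triage levels as summarised earlier in this post.

```python
from statistics import mean, median

# The four triage levels, worst to best.
LEVELS = ["no work", "wrong", "some way to go", "pretty much there"]

# Two codings that preserve the same ordering. The second is an
# invented alternative, chosen only to show that the spacing is arbitrary.
coding_a = dict(zip(LEVELS, [0, 1, 2, 3]))
coding_b = dict(zip(LEVELS, [0, 1, 2, 10]))

# Two hypothetical students with three triaged pieces of work each.
steady = ["some way to go", "some way to go", "some way to go"]
uneven = ["pretty much there", "wrong", "wrong"]


def summarise(marks, coding, stat):
    """Apply a summary statistic to the marks under a given numeric coding."""
    return stat([coding[m] for m in marks])


for name, coding in (("0,1,2,3", coding_a), ("0,1,2,10", coding_b)):
    print(
        name,
        "mean:", summarise(steady, coding, mean), summarise(uneven, coding, mean),
        "median:", summarise(steady, coding, median), summarise(uneven, coding, median),
    )
```

Under the first coding the steady student has the higher mean; under the second, the uneven student does. The median, which uses only the ordering, ranks the steady student higher both times. If the ranking depends on a spacing that nobody can justify, the average is not telling us anything about the work.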
The second problem is that aggregation masks the beauty of triage in terms of identifying if a task has been performed to the pre-determined level. In an ideal world, every area of knowledge that a student is exposed to should be an important contributor to their learning journey. We may have multiple assignments in one area but our assessment mechanism should provide clear opportunities to demonstrate that knowledge. Thus, their achievement of sufficient assignment work to demonstrate their competency in every relevant area of knowledge should be a necessary condition for graduating. When we take triage grading back to an assignment level, we can then look at our assignments grouped by knowledge area and quickly see if a student has some way to go or has achieved the goal. This is not anywhere near as clear when we start aggregating the marks because of the mathematical issues already raised.
Finally, the reduction of triage to mathematical approximation reduces the ability to specify which areas of an assessment are really valuable and, while weighting is a reasonable approximation to this, it is very hard to use a mathematical formula with more and more ‘fudge factors’, a term Rapaport uses, to make up for the fact that this is just a little too fragile.
To summarise, I really like the thrust of this paper. I think what is proposed is far better, even with all of the problems raised above, at giving a reasonable, fair and predictable grade to students. But I think that the clash with existing grading traditions and the implicit requirement to turn everything back into one number is causing problems that have to be addressed. These problems mean that this solution is not, yet, beautiful. But let’s see where we can go.
Tomorrow, I’ll suggest an even more cut-down version of grading and then work on an even trickier problem: late penalties and how they affect grades.
If you’ve been reading my blog over the past years, you’ll know that I have a lot of time for thinking about assessment systems that encourage and develop students, with an emphasis on intrinsic motivation. I’m strongly influenced by the work of Alfie Kohn, unsurprisingly given I’ve already shown my hand on Foucault! But there are many other writers who are… reassessing assessment: why we do it, why we think we are doing it, how we do it, what actually happens and what we achieve.
In my framing, I want assessment to be as all other aspects of education: aesthetically satisfying, leading to good outcomes and being clear about what it is and what it is not. Beautiful. Good. True. There are some better and worse assessment approaches out there and there are many papers discussing this. One of these that I have found really useful is Rapaport’s paper on a simplified assessment process for consistent, fair and efficient grading. Although I disagree with some aspects, I consider it to be both good, as it is designed to clearly address a certain problem to achieve good outcomes, and true, because it is very honest about providing guidance to the student as to how well they have met the challenge. It is also highly illustrative and honest in representing the struggle of the author in dealing with the collision of novel and traditional assessment systems. However, further discussion of Rapaport is for the near future. Let me start by demonstrating how broken things often are in assessment, by taking you through a hypothetical situation.
Thought Experiment 1
Two students, A and B, are taking the same course. There are a number of assignments in the course and two exams. A and B, by sheer luck, end up doing no overlapping work. They complete different assignments to each other, half each, and achieve the same (cumulative bare pass overall) marks. They then manage to score bare pass marks in both exams, but one answers only the even questions and the other answers only the odd. (And, yes, there is an even number of questions.) Because of the way the assessment was constructed, they have managed to avoid any common answers in the same area of course knowledge. Yet, both end up scoring 50%, a passing grade in the Australian system.
Which of these students has the correct half of the knowledge?
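The exam half of this thought experiment is easy to sketch in Python. The assumptions here are loud ones: a hypothetical 20-question exam, equal weighting, one question per area of course knowledge, and full marks for every question a student attempts.

```python
# Hypothetical exam: 20 equally weighted questions, one per knowledge area.
questions = set(range(1, 21))

# Student A answers only the even questions, student B only the odd ones.
# Assumption: every attempted question is answered correctly.
student_a = {q for q in questions if q % 2 == 0}
student_b = questions - student_a


def percent(answered):
    """Score as a percentage of all questions on the paper."""
    return 100 * len(answered) / len(questions)


print(percent(student_a), percent(student_b))  # both score 50.0
print(student_a & student_b)                   # empty set: nothing in common
```

The two scores are identical and the overlap in demonstrated knowledge is empty, which is exactly the information the single number throws away.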
I had planned to build up to Rapaport but, if you’re reading the blog comments, he’s already been mentioned so I’ll summarise his 2011 paper before I get to my main point. In 2011, William J. Rapaport, SUNY Buffalo, published a paper entitled “A Triage Theory of Grading: The Good, The Bad and the Middling.” in Teaching Philosophy. This paper summarised a number of thoughtful and important authors, among them Perry, Wolff, and Kohn. Rapaport starts by asking why we grade, moving through Wolff’s taxonomic classification of assessment into criticism, evaluation, and ranking. Students are trained, by our world and our education systems, to treat grades as a measure of progress and, in many ways, a proxy for knowledge. But this brings us into conflict with Perry’s developmental stages, where students start with a deep need for authority and the safety of a single right answer. It is only when students are capable of understanding that there are, in many cases, multiple right answers that we can expect them to understand that grades can have multiple meanings. As Rapaport notes, grades are inherently dual: a representative symbol attached to a quality measure and then, in his words, “ethical and aesthetic values are attached” (emphasis mine.) In other words, a B is a measure of progress (not quite there) that also has a value of being … second-tier if an A is our measure of excellence. A is not A, as it must be contextualised. Sorry, Ayn.
When we start to examine why we are grading, Kohn tells us that the carrot and stick is never as effective as the motivation that someone has intrinsically. So we look to Wolff: are we critiquing for feedback, are we evaluating learning, or are we providing handy value measures for sorting our product for some consumer or market? Returning to my thought experiment above, we cannot provide feedback on assignments that students don’t do, our evaluation of learning says that both students are acceptable for complementary knowledge, and our students cannot be discerned from their graded rank, despite the fact that they have nothing in common!
Yes, it’s an artificial example but, without attention to the design of our courses and in particular the design of our assessment, it is entirely possible to achieve this result to some degree. This is where I wish to refer to Rapaport as an example of thoughtful design, with a clear assessment goal in mind. To step away from measures that provide an (effectively) arbitrary distinction, Rapaport proposes a tiered system for grading that simplifies the overall system with an emphasis on identifying whether a piece of assessment work is demonstrating clear knowledge, a partial solution, an incorrect solution or no work at all.
This, for me, is an example of assessment that is pretty close to true. The difference between a 74 and a 75 is, in most cases, not very defensible (after Haladyna) unless you are applying some kind of ‘quality gate’ that really reduces a percentile scale to, at most, 13 different outcomes. Rapaport’s argument is that we can reduce this further and this will reduce grade clawing, identify clear levels of achievement and reduce marking load on the assessor. That last point is important. A system that buries the marker under load is not sustainable. It cannot be beautiful.
There are issues in taking this approach and turning it back into the grades that our institutions generally require. Rapaport is very open about the difficulties that he has turning his triage system into an acceptable letter grade and it’s worth reading the paper to see that discussion alone, because it quite clearly shows what
Rapaport’s scheme clearly defines which of Wolff’s criteria he wishes his assessment to achieve. The scheme, for individual assessments, is no good for ranking (although we can fashion a ranking from it) but it is good to identify weak areas of knowledge (as transmitted or received) for evaluation of progress and also for providing elementary critique. It says what it is and it pretty much does it. It sets out to achieve a clear goal.
The paper ends with a summary of the key points of Haladyna’s 1999 book “A Complete Guide to Student Grading”, which brings all of this together.
Haladyna says that “Before we assign a grade to any students, we need:
- an idea about what a grade means,
- an understanding of the purposes of grading,
- a set of personal beliefs and proven principles that we will use in teaching
- a set of criteria on which the grade is based, and, finally,
- a grading method, which is a set of procedures that we consistently follow in arriving at each student’s grade.” (Haladyna 1999: ix)
There is no doubt that Rapaport’s scheme meets all of these criteria and, yet, for me, we have not yet gone far enough in search of the most beautiful, most good and most true extent that we can take this idea. Is point 3, which could be summarised as aesthetics, not enough for me? Apparently not.
Tomorrow I will return to Rapaport to discuss those aspects I disagree with and, later on, discuss both an even more trimmed-down model and some more controversial aspects.
It’s fine to write all sorts of wonderful statements about theory and design and we can achieve a lot in thinking about such things. But, let’s be honest, we face massive challenges in the 21st Century and improved thinking and practice in education is one of the most important contributions we can make to future generations. Thus, if we want to change the world based upon our thinking, then all of our discussions have no use if we can’t develop something that’s going to achieve our goals. Dewey’s work provides an experimental, even instrumental, approach to the American philosophical school of pragmatism. To briefly explain this term in the specific meaning, I turn to William James, American psychologist and philosopher.
Pragmatism asks its usual question. “Grant an idea or belief to be true,” it says, “what concrete difference will its being true make in anyone’s actual life? How will the truth be realized? What experiences will be different from those which would obtain if the belief were false? What, in short, is the truth’s cash-value in experiential terms?”
William James, Pragmatism (1907)
(James is far too complex to summarise with one paragraph and I am using only one of his ideas to illustrate my point. Even James’ scholars disagree on how to interpret many of his writings. It’s worth reading him and Hegel at the same time as they square off across the ring quite well.)
What will be different? How will we recognise or measure it? What do we gain by knowing if we are right or wrong? This is why all good education researchers depend so heavily on testing their hypotheses in the space where they will make an impact and there is usually an obligation to look at how things are working before and after any intervention. This places further obligation upon us to evaluate what has occurred and then, if our goals haven’t been achieved, change our approach further. It’s a simple breakdown of roles but I often think of educational work as falling into three heavily overlapping areas: practice, scholarship and research. Practice should be applying techniques that achieve our goals, scholarship involves the investigation, dissemination and comparison of these techniques, and research builds on scholarship to evaluate practice in ways that will validate and develop new techniques – or invalidate formerly accepted ones as knowledge improves. This leads me to my point: evaluating your own efforts to work out how to do better next time.
There are designers, architects, makers and engineers who are committed to the practice of impact design, where (and this is one definition):
“Impact design is rooted in the core belief that design can be used to create positive social, environmental and economic change, and focuses on actively measuring impact to inform and direct the design process.” Impact Design Hub, About.
Thus, evaluation of what works is essential for these practitioners. The same website recently shared some designers talking about things that went wrong and what they learned from the process.
If you read that link, you’ll see all sorts of lessons: don’t hand innovative control to someone who’s scared of risk, don’t ignore your community, don’t apply your cultural values to others unless you really know what you’re doing, and don’t forget the importance of communication.
Writing some pretty words every day is not going to achieve my goal and I need to be reminded of the risks that I face in trying to achieve something large – one of which is not actually working towards my own goals in a useful manner! One of the biggest risks is confusing writing a blog with actual work, unless I use this medium to do something. Over the coming weeks, I hope to show you what I am doing as I move towards my very ambitious goal of “beautiful education”. I hope you find the linked article as useful as I did.
As I’ve noted, the space I’m in is not new, although some of the places I hope to go with it are, and we have records of approaches to education that I think fit well into an aesthetic framing.
As a reminder, I’m moving beyond ‘sensually pleasing’ in the usual sense and extending this to the wider definition of aesthetics: characteristics that define an approach or movement. However, we can still see a Cubist work as both traditionally aesthetically pleasing and also beautiful because of its adherence to the Cubist aesthetic. To draw on this, where many art viewers find a large distance between them and an art work, it is often attributable to a conflict over how beauty is defined in this context. As Hegel noted, beauty is not objective; it is our perspective and our understanding of its effect upon us (after Kant) that contributes greatly to the experience.
Dewey’s Pedagogic Creed was published in 1897 and he sought to share his beliefs on what education was, what schools were, what he considered the essential subject-matter of education, the methods employed, and the essential role of the school in social progress. I use the word ‘beliefs’ deliberately as this is what Dewey published: line after line of “I believe…” (As a note, this is what a creed is, or should be, as a set of beliefs or aims to guide action. The word ‘creed’ comes to us from the Latin credo, which means “I believe”.) Dewey is not, for the most part, making a religious statement in his Creed although his personal faith is expressed in a single line at the end.
To my reading, and you know that I seek characteristics that I can use to form some sort of object to guide me in defining beautiful education, many of Dewey’s points easily transfer to characteristics of beauty. For example, here are three lines from the work:
- “I believe that education thus conceived marks the most perfect and intimate union of science and art conceivable in human experience.”
- “I believe that with the growth of psychological science, giving added insight into individual structure and laws of growth; and with growth of social science, adding to our knowledge of the right organization of individuals, all scientific resources can be utilized for the purposes of education.”
- “I believe that under existing conditions far too much of the stimulus and control proceeds from the teacher, because of neglect of the idea of the school as a form of social life.”
Dewey was very open about what he thought the role of school was: he saw it as the “fundamental method of social progress and reform”. I believe that he saw education, when carried out correctly, as being a thing that was beautiful, good and true and his displeasure with what he encountered in the schools and colleges of the late 19th/early 20th Century is manifest in his writings. He writes in reaction to an ugly, unfair, industrialised and mechanistic system and he wants something that conforms to his aesthetics. From the three lines above, he seeks education that is grounded in the arts and science, he wants to use technology in a positive way and he wants schools to be a vibrant and social community.
And this is exactly what the evidence tells us works. The fact that Dewey arrived at this through a focus on equity, opportunity, his work in psychology and his own observations is a testament to his vision. Dewey was rebelling against the things he could see were making children hate education.
I believe that next to deadness and dullness, formalism and routine, our education is threatened with no greater evil than sentimentalism.
John Dewey, School Journal vol. 54 (January 1897), pp. 77-80
Here, sentimentalism is where we try to evoke emotions without associating them with an appropriate action: Dewey seeks authenticity and a genuine expression. But look at the rest of that list: dead, dull, formal and routine. Dewey would go on to talk about schools as if they were prisons and over a hundred years later, we continue to line students up into ranks and bore them.
I have a lot of work to do as I study Dewey and his writings again with my aesthetic lens in place but, while I do so, it might be worth reading the creed. Some things are dated. Some ideas have been improved upon with more research, including his own, and we will return to these issues. But I find it hard to argue with this:
I believe that the community’s duty to education is, therefore, its paramount moral duty. By law and punishment, by social agitation and discussion, society can regulate and form itself in a more or less haphazard and chance way. But through education society can formulate its own purposes, can organize its own means and resources, and thus shape itself with definiteness and economy in the direction in which it wishes to move.
A comment on yesterday’s post noted that minimising ugliness is a highly desirable approach to take for many students, given how ugly their worlds are with poverty, violence, bullying. I completely agree that these things should be minimised but this is a commitment that we should be making as a society, not leaving to education. Yes, education is the best way to reduce these problems but that requires effective education and, for that, I return to my point that a standard of acceptable plainness is just not enough when we plan and design education. It’s not enough that our teaching be tolerable, it should be amazing, precisely because of the potential benefits to our society.
If, in education, we only seek a minimum bar then the chances of us achieving more than that are reduced and we probably won’t have a good measure of “better” should it occur. We can’t take intentional actions to change something that we’re not measuring.
Many of the ugliest problems in society have arisen from short-sighted thinking, fixes that are the definition of plain instead of beautiful or inspiring, and from not having a committed vision to aim for better. That’s why I’m so heavily focused on beauty and aesthetics in education, to provide a basis for vision that is manageably sized yet sufficiently powerful.
I won’t (I can’t) address every equity issue, every unfair thing, or every terrible aspect of modern educational practice in these pieces. But I hope to motivate, over time, why this rather philosophical approach is a good basis for visionary improvements to education.
- the ability to state the goal of any educational activity as separate from the activity,
- the awareness of evidence-based practice and its use in everyday teaching, and
- a willingness to accept that it is correct goal setting and using techniques that work, and can be shown to work, that will lead to better outcomes.
Ever since education became something we discussed, teachers and learners alike have had strong opinions regarding the quality of education and how it can be improved. What is surprising, as you look at these discussions over time, is how often we seem to come back to the same ideas. We read Dewey and we hear echoes of Rousseau. So many echoes and so much careful thought, found as we built new modern frames with Vygotsky, Piaget, Montessori, Papert and so many more. But little of this should really be a surprise because we can go back to the writings of Marcus Fabius Quintilianus (Quintilian) and his twelve books of The Orator’s Education and we find discussion of small class sizes and constructive student-focused discussions, and the claim that more people were capable of thought and far-reaching intellectual pursuits than was popularly believed.
“… as birds are born for flying, horses for speed, beasts of prey for ferocity, so are [humans] for mental activity and resourcefulness.” Quintilian, Book I, page 65.
I used to say that it was stunning how contemporary education seems to be slow in moving in directions first suggested by Dewey a hundred years ago, then I discovered that Rousseau had said it 150 years before that. Now I find that Quintilian wrote things such as this nearly 2,000 years ago. And Marcus Aurelius, among other Stoics, made much of approaches to thinking that, somehow, were put to one side as we industrialised education much as we had industrialised everything else.
This year I have accepted that we have had 2,000 years of thinking (and as much evidence when we are bold enough to experiment) and yet we just have not seen enough change. Dewey’s critique of the University is still valid. Rousseau’s lament on attaining true mastery of knowledge stands. Quintilian’s distrust of mere imitation would not be quieted when looking at much of repetitive modern examination practice.
What stops us from changing? We have more than enough evidence of discussion and thought, from some of the greatest philosophers we have seen. When we start looking at education, in varying forms, we wander across Plato, Hypatia, Hegel, Kant, Nietzsche, in addition to all of those I have already mentioned. But evidence, as it stands, does not appear to be enough, especially in the face of personal perception of achievement, contribution and outcomes, whether supported by facts or not.
Evidence of uncertainty is not enough. Evidence of the lack of efficacy of techniques, now that we can and do measure them, is not enough. Evidence that students who fail then, under other tutors or approaches, mysteriously flourish elsewhere, is not enough.
Authority, by itself, is not enough. We can be told to do more or to do things differently but the research we have suggests that an externally applied control mechanism just doesn’t work very well for areas where thinking is required. And thinking is, most definitely, required for education.
I have already commented elsewhere on Mark Guzdial’s post that attracted so much attention and, yet, all he was saying was what we have seen repeated throughout history and is now supported in this ‘gilt age’ of measurement of efficacy. It still took local authority to stop people piling onto him (even under the rather shabby cloak of ‘scientific enquiry’ that masks so much negative activity). Mark is repeating the words of educators throughout the ages who have stepped back and asked “Is what we are doing the best thing we could be doing?” It is human to say “But, if I know that this is the evidence, why am I acting as if it were not true?” But it is quite clear that this is still challenging and, amazingly, heretical to an extent, despite these (apparently controversial) ideas pre-dating most of what we know as the trappings and establishments of education. Here is our evidence that evidence is not enough. This experience is also the evidence that, while authority can halt a debate, authority cannot force people to alter such a deeply complex and cognitive practice in a useful manner. Nobody is necessarily agreeing with Mark; they’re just no longer arguing. That’s not helpful.
So, where to from here?
We should not throw out everything old simply because it is old, as that is meaningless without evidence to do so and it is as wrong as autocratically rejecting everything new because it is new.
The challenge is to find a way of explaining how things could change without forcing conflict between evidence and personal experience and without having to resort to an argument by authority, whether moral or experiential. And this is a massive challenge.
This year, I looked back to find other ways forward. I looked back to the three values of Ancient Greece, brought together as a trinity through Socrates and Plato.
These three values are: beauty, goodness and truth. Here, truth means seeing things as they are (non-concealment). Goodness denotes the excellence of something and often refers to a purpose or meaning for existence, in the sense of a good life. Beauty? Beauty is an aesthetic delight; pleasing to those senses that value certain criteria. It does not merely mean pretty, as we can have many ways that something is aesthetically pleasing. For Dewey, equality of access was an essential criterion of education; education could only be beautiful to Dewey if it was free and easily available. For Plato, the revelation of knowledge was good and beauty could arouse a love for this knowledge that would lead to such a good. By revealing good, reality, to ourselves and our world, we are ultimately seeking truth: seeing the world as it really is.
In the Platonic ideal, a beautiful education leads us to fall in love with learning and gives us momentum to strive for good, which will lead us to truth. Is there any better expression of what we all would really want to see in our classrooms?
I can speak of efficiencies of education, of retention rates and average grades. Or I can ask you if something is beautiful. We may not all agree on details of constructivist theory but if we can discuss those characteristics that we can maximise to lead towards a beautiful outcome, aesthetics, perhaps we can understand where we differ and, even more optimistically, move towards agreement. Towards beautiful educational practice. Towards a system and methodology that makes our students as excited about learning as we are about teaching. Let me illustrate.
A teacher stands in front of a class, delivering the same lecture that has been delivered for the last ten years. From the same book. The classroom is half-empty. There’s an assignment due tomorrow morning. Same assignment as the last three years. The teacher knows roughly how many people will ask for an extension an hour beforehand, how many will hand up and how many will cheat.
I can talk about evidence, about pedagogy, about political and class theory, about all forms of authority, or I can ask you, in the privacy of your head, to think about these questions.
- Is this beautiful? Which of the aesthetics of education are really being satisfied here?
- Is it good? Is this going to lead to the outcomes that you want for all of the students in the class?
- Is it true? Is this really the way that your students will be applying this knowledge, developing it, exploring it and taking it further, to hand on to other people?
- And now, having thought about yourself, what do you think your students would say? Would they think this was beautiful, once you explained what you meant?
Over the coming year, I will be writing a lot more on this. I know that this idea is not unique (Dewey wrote on this, to an extent, and, more recently, several books in the dramatic arts have taken up the case of beauty and education) but it is one that we do not often address in science and engineering.
My challenge, for 2016, is to try to provide a year of beautiful education. Succeed or fail, I will document it here.
I’ve been thinking about learning analytics and, while some Unis have managed to solve parts of the problem, I think that we need to confront the complexity of the problem, to explain why it’s so challenging. I break it into five key problems.
- Data. We don’t currently collect enough of it to analyse, what we do collect is of questionable value and isn’t clearly tied to mechanisms, and we have not confronted the spectre of what we do with this data when we get it.
- Mechanisms linking learning and what is produced. The mechanisms are complex. Students could be failing for any number of reasons, not the least of which is crap staff. Trying to work out what has happened by looking at outputs is unlikely to help.
- Focus. Generally, we measure things to evaluate people. This means that students do tests to get marked and, even where we mix this up with formative work, they tend to focus on the things that get them marks. That’s because it’s how we’ve trained them. This focus warps measurement into an enforcement and judgment mechanism, rather than a supportive and constructive mechanism.
- Community. We often mandate or apply analytics as an extension of the evaluation focus above. This means that we don’t have a community who are supported by analytics, we have a community of evaluators and the evaluated. This is what we would usually label as a Panopticon, because of the asymmetrical application of this kind of visibility. And it’s not a great environment for education. Without a strong community, why should staff go to the extra effort to produce the things required to generate more data if they can’t see a need for it? This is a terribly destructive loop as it requires learning analytics to work and be seen as effective before you have the data to make learning analytics work!
- Support. When we actually have the data, understand the mechanism, have the right focus and are linked in to the community, we still need the money, time and other resources to provide remediation, to encourage development, to pay for the technology, to send people to places where they can learn. For students and staff. We just don’t have that.
I think almost all Unis are suffering from the same problems. This is a terribly complex problem and it cannot be solved by technology alone.
It’s certainly not as easy as driving a car. You know that you make the car go faster by pushing on one pedal and you make it go slower by pushing on another. You look at your speedometer. This measures how often your wheels are rotating and, by simple arithmetic, gives you your speed across the road. Now you can work out the speed you want to travel at, taking into account signs, conditions and things like that. Simple. But this simple, everyday action and its outcomes are the result of many, many technological, social and personal systems interacting.
The speedometer in the car is giving you continuously available, and reasonably reliable, data on your performance. You know how to influence that performance through the use of simple and direct controls (mechanism). There exists a culture of driver training, road signage and engineering, and car design that provides you with information that ties your personal performance to external achievement (these are all part of support, focus and community). Finally, there are extrinsic mechanisms that function as checks and balances but, importantly, they are not directly tied to what you are doing in the car, although there are strong causative connections to certain outcomes (and we can see elements of support and community in this as we all want to drive on safe roads, hence state support for this is essential).
We are nowhere near the car scenario with learning analytics right now. We have some measurements of learning in the classroom because we grade assignments and mark exams. But these are not continuous feedback, to be consulted wherever possible, and the mechanisms to cause positive change in these are not necessarily clear and direct. I would argue that most of what we currently do is much closer to police enforcement of speed. We ask students to drive a track and, periodically, we check to see if they’re doing the correct speed. We then, often irrevocably from a grading sense, assign a mark to how well they are driving the track and settle back to measure them again later.
Learning analytics faces huge problems before it reaches this stage. We need vast quantities of data that we are not currently generating. Many University courses lack opportunities to demonstrate prowess early on. Many courses offer only two or three measurements of performance to determine the final grade. This is like trying to guess our speed when the speedo only lights up every three to four weeks after we have pressed a combination of pedals.
The mechanisms for improvement and performance control in University education are not just murky, they’re opaque. If we identify a problem, what happens? In the case of detecting that we are speeding, most of us will slow down. If the police detect you are speeding, they may stop you or (more likely) issue you a fine and eventually you’ll use up your licence and have to stop driving. We just give people low marks or fail them. But, combine this with mechanism issues, and suddenly we need to ask if we’re even ready to try to take action if we had the analytics.
Let’s say we get all the data and it’s reliable and pedagogically sensible. We work out how to link things together. We build community support and we focus it correctly. You run analytics over your data. After some digging, you discover that 70% of your teaching staff simply don’t know how to do their jobs. And, as far as you can see, have been performing at this standard for 20 years.
What do you do?
Until we are ready to listen to what analytics tell us, until we have had the discussion of how we deal with students (and staff) who may wish to opt out, and until we have looked at this as the monstrous, resource-hungry, incredibly complex problem that it is, we really have to ask if we’re ready to take learning analytics seriously. And, given how much money can be spent on this, it’s probably better to work out if we’re going to listen before we invest money into a solution that won’t work because it cannot work.