Dead man’s curve: Adventures in overdriving.
Posted: January 28, 2012
Looking at my previous posts, you’ll know that I have a strong message for my students: do the work, do it yourself, get the knowledge and you’ll most likely (at least) pass. Part of my commitment to my side of the bargain is that a student almost exclusively controls the input that controls his or her own mark. Yes, very occasionally, we will scale a bit if we think that students are being disadvantaged but I have never seen a situation where we have increased the number of students who fail. (To be honest, our scaling is really lightweight.)
In particular, I do not fit grades to a curve. I rarely have so many students that such an approach would be even approaching statistical validity and, more interestingly, I tend to get an approximately normal distribution anyway, without any curve fitting. My position on this is more than mathematical, however, as it doesn’t sit well with me to fail someone because someone else did better.
Because we try very hard to not have to shift marks, we have an implicit obligation to manage the courses so that a pass mark represents a sufficient level of effort in assimilating knowledge, completing assignments, participating in individual and group interactive activities and the exam. Based on that, if everyone has done enough work to pass, then I can pass everyone. The trickier bit is managing the difficulty of the course so that the following conditions hold:
- A pass mark represents a level of effort and demonstrated knowledge that means that a student has achieved the required level of knowledge.
- A fail mark represents a level that, over a number of opportunities for improvement, indicates a combination of insufficient knowledge and/or effort.
- There is scope in every activity for students to demonstrate excellence. These excellence marks go ON TOP of the ‘core’ marks.
- The mark distribution is not bimodal around 0%/100% but has a range of possible values.
To avoid having to manually redistribute the buckets using curve grading, I have to build the course so that the final mark is built from assignments that meet all of those criteria and in a way that the aggregate of these marks will also produce marks that meet the final criteria. This, of course, means that I advertise assignment weightings, combinations and criteria as early as possible to allow students to allocate their effort, and then I have to incur the marking burden of applying a marking scheme that, once again, gives me this range.
One of the reasons that I believe this is important is because we risk overdriving one of our key student characteristics if we create an artificial curve-based separation. My students have all been through a fairly rigorous selection system by the time they reach me – the numbers dwindle through to final year of high school and the number who go to Uni is less than a quarter of those who start school. The ‘range’ of these students is the ‘not only passed but made it to a Uni course’. This automatically bands them relatively closely. If 100 students sit a course and half get 60 and half get 65 then, assuming I’ve done my job correctly in the design, they all deserve to pass because they are quite close in in-coming ability and they have achieved similar results. More importantly, the half who got 60 don’t deserve to fail because the other half got 65. If you overdrive noise then all you get is loud noise, not some sort of ‘better’ signal.
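To make the 60/65 example concrete, here is a minimal sketch in Python. It compares a fixed pass mark with one hypothetical curve rule. Both the pass mark of 50 and the z-score cutoff of -0.5 are assumptions chosen purely for illustration; they are not taken from any real marking policy.

```python
from statistics import mean, pstdev

PASS_MARK = 50   # assumed fixed pass mark, for illustration only
CUTOFF_Z = -0.5  # hypothetical curve rule: fail anyone below this z-score


def z_scores(marks):
    """Standardise marks against the class mean and spread."""
    mu = mean(marks)
    sigma = pstdev(marks)
    if sigma == 0:
        # Everyone has the same mark, so there is nothing to separate.
        return [0.0 for _ in marks]
    return [(m - mu) / sigma for m in marks]


def count_passes_absolute(marks, pass_mark=PASS_MARK):
    """Each student passes or fails on their own mark alone."""
    return sum(m >= pass_mark for m in marks)


def count_passes_curved(marks, cutoff_z=CUTOFF_Z):
    """Each student passes or fails relative to everyone else."""
    return sum(z >= cutoff_z for z in z_scores(marks))


# The example from the text: 100 students, half on 60 and half on 65.
marks = [60] * 50 + [65] * 50
absolute_passes = count_passes_absolute(marks)
curved_passes = count_passes_curved(marks)
```

Under these assumptions the fixed pass mark passes all 100 students, while the curve turns a five-mark gap into two standard deviations and fails the 50 students on 60. The gap between the groups is the same in both cases; only the curve amplifies it.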
I’m not opposed to adaptation in teaching – in fact, I’m a huge fan of using challenge and extension questions to allow people other opportunities to excel, to refine their knowledge or to get a chance to be more specific. However, I support it from an additive approach, where marks are added for success, rather than a subtractive approach, where not managing to add more marks is treated as a mark removal exercise if a sufficiently large group of other people manage to add marks. This requires me to design courses carefully, give enough assignment opportunities for people to demonstrate their skills and provide a lot of feedback.
I note that I use almost no standardised testing and, where I do use multiple choice questions, I either require an accompanying explanation or the component is worth a small number of marks. As a result, I have a lot of flexibility in my marking.
I am not saying that we need to dumb down our material – far from it. If we design our courses with ‘acceptance level = pass, extension achievement = distinction’ we can isolate the core material and then put the ‘next stages’ in as well. As I’ve said before, letting a student know that there’s somewhere else to go and something else to do can be a spur to higher achievement.
Coincidentally, I had a meeting today with a colleague who has done some very interesting work on identifying and assessing the amount of ‘core’ material a student gets right from the ‘advanced’ material. From his early figures, there is very little variation in core material achievement level, as you would expect from all of this explanation that I’ve put up, but there was a vast range of achievement in the advanced material. More investigation required!
So, your system is very different from ours. Your students appear to get either distinction, pass, or fail as final grades/marks for the course. Our students get A,A-,B+,B,B-,C+,C,C-,D+,D,D-, or F. It used to be just A, B, C, D, or F, but grade inflation has led to finer-grained systems because the top of the range is carrying most of the burden of distinguishing people. We, in almost all departments but the Law School, do not have mandatory curves, so the professor can do as they like to set final grade distributions (more on that lower down). Professors are required to give the schema for how points/marks/grades are distributed between assignments, projects, and exams on the first day of class so that students are at least aware of how a raw score will be computed to measure their performance.
Now, some professors stick precisely to this schema and assess grades at the end by dividing into 10% bins (90-100% = A, 80-89.9% = B, etc.), though that’s been forced into finer-grained bins with the +/- system. However, that can lead to problems if the material or assessment thereof is too harsh. I once took a class with one A, two Bs, 26 Cs, and 11 Ds and Fs. While this might seem more or less normally distributed, it wasn’t the expectation (see grade inflation). I got one of only two Cs in my entire University career (undergrad and PhD) in that class. It’s just not usually done that way.
Other professors will look at their raw scores and, seeing that there are not “enough” students in the high end of the tail, move the lines for A and B down until they get something they like. This is entirely ad hoc, and “like” and “enough” are based on prior experience. It’s a relatively unscientific calibration exercise designed to make sure that the professor didn’t screw the class with final grades that were more a reflection of the professor and their delivery than the performance of the students.
Outside of law schools with imposed curves, I don’t think I’ve ever seen a professor shift grades the other way (e.g. when there are “too many” As). Either they stick with the schema in the syllabus given at the beginning of the course, because that’s their policy (sometimes unstated) or because the distribution of grades is close enough to their preference, or they shift grades up until the distribution matches the one they prefer.
There’s a reason all this happens. US businesses often rely solely on the Grade Point Average to gate who they interview and hire. GPA under our typical system is computed as an hour-weighted average of grade points, with A=4 points, B=3, C=2, D=1, F=0. A typical course is a 3-hour course, which conveniently meets for 3 hours of lecture a week. Lab courses are typically 1-hour courses that meet for 4 or 5 hours a week. The GPA is so relied upon for jobs and graduate school that it is computed by the university and printed on its official transcripts of student work.
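As a rough illustration of that hour-weighted average, here is a minimal Python sketch. The grade points are the ones given above; the function name `gpa` and the sample transcript are hypothetical, and the +/- grades are left out because no point values for them are given here.

```python
# Grade points as described above: A=4, B=3, C=2, D=1, F=0.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def gpa(courses):
    """Hour-weighted average of grade points.

    `courses` is a list of (letter_grade, credit_hours) pairs.
    """
    total_hours = sum(hours for _, hours in courses)
    if total_hours == 0:
        raise ValueError("no credit hours to average over")
    weighted_points = sum(GRADE_POINTS[grade] * hours for grade, hours in courses)
    return weighted_points / total_hours


# Hypothetical transcript: two 3-hour lecture courses and one 1-hour lab.
transcript = [("A", 3), ("B", 3), ("C", 1)]
result = gpa(transcript)  # (4*3 + 3*3 + 2*1) / 7 = 23/7, roughly 3.29
```

Because the weights are credit hours, the C in the 1-hour lab pulls the average down far less than a C in a 3-hour course would, even though the lab meets for more hours each week.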
The tendency to shift grades up is driven by professors’ desires not to move their students too far away from the usual distribution set by other professors. People try to calibrate both within and across institutions. People know that a B average at Caltech is probably better than an A at Sam Houston State. But there’s still a tendency for grades to inflate year after year.
The business hiring and admitting communities demand grades to help them distinguish amongst applicants, so universities oblige them. But I’d much prefer a system more like yours, with fewer levels and more value in grades for the student (rather than their value to employers).
Sorry, that was long. Hopefully I didn’t over-explain things that you already know.
We actually have more grades than I said. Brevity is the soul of wit but it’s also the ‘whoops, left a bit out’. We have the following grades (with their GPA ‘value’ in parentheses):
Withdraw Fail (0) – Formally leaving the course after the census date is an autofail.
0 Fail Nothing Submitted (FNS) (0)
1-49 Fail (1.5)
45-49 Conceded Pass (3) – DELETED GRADE
50-64 Pass (4)
65-74 Credit (5)
75-84 Distinction (6)
85-100 High Distinction (7)
(Reference is http://www.adelaide.edu.au/enrol/gradepointaverage/)
We publish the grades and, for all grades greater than 49, we publish the percentage score as well. A ‘perfect’ GPA from the University of Adelaide is 7.0. This grading system is referred to as M10 and is the default grading scheme for the University. Other grading schemes exist that provide for non-graded passes, such as GS8. All use the same GPA range. (Reference: http://www.adelaide.edu.au/student/exams/results.html)
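For readers who think in code, the bands listed above can be sketched as a simple lookup in Python. This is only an illustration of the list as written here, not the University's implementation. It assumes whole-number marks, treats a mark of 0 as Fail Nothing Submitted, and leaves out Withdraw Fail (which is not decided by a mark) and the deleted Conceded Pass. The names `M10_BANDS` and `m10_grade` are made up for the example.

```python
# (lowest mark in band, grade name, GPA value), highest band first.
M10_BANDS = [
    (85, "High Distinction", 7.0),
    (75, "Distinction", 6.0),
    (65, "Credit", 5.0),
    (50, "Pass", 4.0),
    (1, "Fail", 1.5),
    (0, "Fail Nothing Submitted", 0.0),
]


def m10_grade(mark):
    """Return (grade name, GPA value) for a whole-number mark from 0 to 100."""
    if not 0 <= mark <= 100:
        raise ValueError("mark must be between 0 and 100")
    for lowest, name, gpa_value in M10_BANDS:
        if mark >= lowest:
            return name, gpa_value


# The band edges are fixed, so one mark either side of a boundary changes the grade.
just_below = m10_grade(64)  # ("Pass", 4.0)
just_above = m10_grade(65)  # ("Credit", 5.0)
```

The point of the sketch is that the band edges are constants: nobody gets to move the line for a Credit, so any adjustment has to be made to the numerical mark before the lookup.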
Thus, people have no choice about where they put their As, Bs and Cs in our scheme. As a result, any scaling is carried out at the numerical level. Yes, when we think that something has been too harsh or has gone wrong, we will (and do) scale with varying degrees of aggressiveness. However, the final movement is never all that large. (A large degree of scaling would most likely reward the lecturer in question with a ‘please explain’ of intensity proportional to the magnitude of the shift.)
We generally don’t shift down and the reason for that is quite straightforward. It’s easy to impose too great a challenge, to the point that people can’t muster the ability or time required for mastery. Making a course so easy that people are achieving really high marks cannot be easily corrected because you haven’t given the students the space to demonstrate extra knowledge. If an assignment or exam is so hard that even the best and brightest only get, say, 50%, then this is (to my mind) terrible because the less talented may have performed even worse than you would expect in relative terms because of discouragement and disincentive. But scaling up can go some way to fixing this, especially if you start using non-linear scaling. (NOT RECOMMENDED. THIS WAY MADNESS LIES!) However, in the other direction, if you set it up so that students can get 100% – how exactly do you justify turning that 100% into 65? You haven’t given them an opportunity to do better and, for me, that means you just have to live with a right-heavy distribution.
This is such a complex issue that I’m not making hard and fast rules here. My general guideline is that the course should allow students to demonstrate how much of the knowledge they have gathered in a fair and equitable manner. Transparency is essential.
Aw, hell. Sounds like you have the same sorting issues we do. I knew it was too good to be true. Agreement on all fronts.
Not being in an academic department and offering courses that (embarrassingly, IMNSHO) CS, engineering, science, and mathematics have abdicated responsibility to deliver, we occupy an unusual role. We get very little feedback from peers on our classes. We live in a weird niche.