# If We’re Going To Measure, Let’s Measure Properly: Teaching Isn’t a NASCAR Race

**Posted:** May 22, 2012

**Filed under:** Education

**Tags:** advocacy, education, educational problem, higher education, learning, measurement, measurement fallacy, MIKE, principles of design, teaching, teaching approaches

I’ve been reading a Huffington Post piece on teacher assessment, entitled “Carolyn Abbott, The Worst 8th Grade Math Teacher In New York City, Victim Of Her Own Success”, about Carolyn Abbott, a teacher at a gifted and talented school in Manhattan who was rated the worst 8th grade math teacher in New York City.

The problem, it appears, is the measure used: the Teacher Data Report, which assesses a teacher’s contribution in English and Math by whether their students performed better or worse than expected, given how they scored the previous year. Here’s how that played out. Abbott taught maths to grades 7 and 8, and her Grade 7 students scored at the 98th percentile on their 2009 test. According to the Teacher Data Report’s modelling process, the same students should therefore have reached the 97th percentile on their Grade 8 test the following year. They *only* managed the 89th percentile. By this logic, Abbott had made a significant negative contribution to her students, and her ranking was the lowest of all the 8th grade mathematics teachers in NYC.

Yes, you read that right. She kept her students roughly 1.2 to 2 standard deviations above the norm. They have moved up a year and are now running into the puberty zone (always fun), yet they’re still scoring at the 89th percentile, and she’s the worst teacher in NYC. Her students also have to struggle with the standardised testing itself: the non-mathematical nature of the tests, the requirement to give a single answer when the real answer may be more complex, and a multiple-choice format that can be trained for rather than actually testing anything. Even so, they’re still scoring above the 85th percentile. Yet she’s the worst 8th grade math teacher in NYC.
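To put those percentiles in perspective, here is a small Python sketch that converts them into standard deviations above the mean. It assumes test scores are normally distributed, which real standardised test scores only approximate, and `percentile_to_z` is a hypothetical helper written for illustration, not part of the Teacher Data Report process.

```python
from statistics import NormalDist

# Assumption: scores are normally distributed, so a percentile can be mapped
# to a z-score (standard deviations above the mean) with the inverse normal CDF.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percentile_to_z(percentile: float) -> float:
    """Convert a percentile strictly between 0 and 100 to a z-score."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    return STANDARD_NORMAL.inv_cdf(percentile / 100)


if __name__ == "__main__":
    cohort = [
        ("Grade 7, actual", 98),
        ("Grade 8, predicted by the model", 97),
        ("Grade 8, actual", 89),
    ]
    for label, pct in cohort:
        print(f"{label}: {pct}th percentile is about {percentile_to_z(pct):+.2f} SD")
```

Under that assumption, the 98th, 97th and 89th percentiles come out at about +2.05, +1.88 and +1.23 standard deviations, so the modelled “decline” is a move from roughly two standard deviations above the mean to a little over one.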

This also goes against one of my general principles of assessment: your assessment should not depend on someone else’s performance, and here it does. (Yes, that leaves me at odds with national testing schemes, because I don’t see how they can be meaningfully calibrated across many different teaching systems and economic influences. New York obviously hasn’t worked it out properly for one system and one economy!) A notion of acceptable and unacceptable is useful here, as is a notion of exemplary, acceptable and unacceptable. A notion of best and worst is meaningless: every teacher could score 100/100 except one who scores 99/100, and that teacher is the worst. Where professional practice is required, ranking must be combined with standards of acceptability. This isn’t a NASCAR race: in teaching, everyone can cross the line a winner.
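The contrast between ranking and standards can be made concrete with a small Python sketch. The teacher names, the scores, the function names and the threshold values (60 for acceptable, 90 for exemplary) are all hypothetical, picked only to reproduce the case where everyone gets full marks except one teacher who drops a single point.

```python
# Hypothetical scores out of 100: every teacher gets full marks except one.
scores = {"Teacher A": 100, "Teacher B": 100, "Teacher C": 100, "Teacher D": 99}


def worst_by_rank(scores: dict[str, int]) -> str:
    """Pure ranking: someone is always the 'worst', however good everyone is."""
    return min(scores, key=scores.get)


def classify(score: int, acceptable: int = 60, exemplary: int = 90) -> str:
    """Judge a score against fixed, hypothetical thresholds, independent of peers."""
    if score >= exemplary:
        return "exemplary"
    if score >= acceptable:
        return "acceptable"
    return "unacceptable"


if __name__ == "__main__":
    print("Worst by ranking:", worst_by_rank(scores))
    for name, score in scores.items():
        print(f"{name}: {score}/100 is {classify(score)}")
```

The ranking singles out Teacher D as the worst, while the standards-based check rates all four as exemplary. Only the second tells you whether anyone’s practice actually needs attention.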

I am a big fan of **useful, carefully constructed and correctly used measurement**, but this story shows what happens when you come up with a simple measure that produces a single number of little use, and then use that number as if it means something. Now, if you saw one teacher’s scores slide a long way over time, and drop low enough, then maybe this number would mean something. But any time your simple number has to come with an explanation, it’s not that simple anymore.

In this case, what’s worse is that the rankings were published with names. Names of teachers and names of schools. Abbott’s boss reassured her that he would still put her up for tenure but felt he had to warn her that someone else might take these rankings into account.

Abbott’s ranking doesn’t matter much to her anymore, because she has left teaching and is now undertaking a PhD in Mathematics. Great for us at university, because good teachers who go on to complete PhDs often work out very well: they’re highly desirable employees in many ways. Not so good for the students at her school, who have been deprived of a teacher who got a group of kids to the 98th percentile in Grade 7 Math, giving them a foundation that will probably stay with them for life (even if we quibble and call it the 89th) and a better start to their academic futures.

But that’s okay, kids, because she was the worst teacher you’d ever have. Oh, and of course there’s a new “worst” teacher now, because that’s how our ranking system works. Sorry about that. Good luck, Carolyn Abbott!

There needs to be some way of measuring how much growth a student has made in a school year. It would give a much clearer picture if you could see whether a student made a full year’s worth of growth in a subject, rather than relying on these ridiculous percentiles. I wouldn’t know how to design a test like that, but you’d think it would be easiest for mathematics.
