Is It Called Ranking Because It Smells Funny?

Years ago, I was a professional winemaker, which is an awesome job but with very long hours (that seems to be a trend for me). One of the things that we did a lot in winemaking was to assess the quality of wine to work out if we’d made what we wanted to but also to allow us to blend this parcel with that parcel and come up with a better wine. Wine judging, for wine shows, is an important part of getting feedback on the quality of your wine as it’s perceived by other professionals. Wine is judged on a 20 point scale most of the time, although some 100 point schemes are in operation. The problem is that this scale is not actually as wide as it might look. Wines below 12/20 are usually regarded as faulty or not at commercial level – so, in reality, most wine shows are working in the range 12-19.5 (20 was relatively rare but I don’t know what it’s like now). This gets worse for the “100 point” ranges, where Wine Spectator claim to go from 50-100, but James Halliday (a prominent wine critic) rates from 75-100, where ‘Good’ starts at 80. This is really quite unnecessarily confusing, because it means that James Halliday is effectively using a version of the 16 available ranks (12-19.5 at 0.5 interval) of the 20 point scale, mapped into a higher range.
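To make the compression concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the straight-line mapping from the 20 point scale onto 75-100 is my own assumption for illustration, not a conversion that any critic publishes.

```python
def usable_ranks(low=12.0, high=19.5, step=0.5):
    """List the scores a wine show realistically hands out on the 20 point scale."""
    count = int(round((high - low) / step)) + 1
    return [low + i * step for i in range(count)]


def to_100_point(score_20, low_20=12.0, high_20=20.0, low_100=75.0, high_100=100.0):
    """Map a 20 point score onto a 75-100 range with a straight line (an assumption)."""
    fraction = (score_20 - low_20) / (high_20 - low_20)
    return low_100 + fraction * (high_100 - low_100)


ranks = usable_ranks()
print(len(ranks))          # number of distinct ranks actually in play
print(to_100_point(16.0))  # a mid-range show score on the "100 point" scale
```

However the two scales are lined up, the point stands: a "100 point" label does not buy much more resolution than the handful of ranks already in use.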

Of course, the numbers are highly subjective, even to a very well trained palate, because the difference between an 87 and an 88 could be colour, or bouquet, or flavour – so saying that the wine at 88 is better doesn’t mean anything unless you know what the rater actually means by that kind of ranking. I used to really enjoy the wine selections of a wine writer called Mark Shields, because he used a straightforward rating system and our palates were fairly well aligned. If Mark liked it, I’d probably like it. This is the dirty secret of any ranking mechanism that has any aspect of subjectivity or weighting built into it – it needs to agree with your interpretation of reasonable or you will always be at odds with it.

In terms of wine, the medal system in use really gives us four basic categories: commercially sound (no medal), better than usual (bronze), much better than usual (silver) and “please pass me another bottle” (gold). On top of that, you have ‘best in show’, which effectively says that, in this place and from these tasters, this was the best overall in this category. To be frank, the gold wines normally blow away the bronzes and the no-awards, but the line between silver and gold is a little more blurred. The show medals do have one advantage, in that a given class has been inspected by the same people and the wines have actually been compared (in one sense) and ranked. However, if nothing is outstanding then no medals will be awarded, because the medals are based on the marks on the 20 point scale: if all the wines come in at 13, there will be no gongs. There doesn’t have to be a gold or a silver, or even a bronze, although that would be highly unusual. More subtly, gold at one show may not even get bronze at another – another dirty little secret of subjective ranking is that what you are comparing things to sometimes makes a very big difference.

Which brings me to my point, which is the ranking of Universities. You’re probably aware that there are national and international rankings of Universities across a range of metrics, often related to funding and research, but that the different rankings have broad agreement rather than exact agreement as to who is ‘top’, top 5 and so on. The Times Higher Education supplement provides a stunning array of snazzy looking graphics, with their weightings as to what makes a great University. But, when we look at this, and we appear to have accuracy to one decimal place (ahem), is it significant that Caltech is 1.8 points higher than Stanford? Is this actually useful information in terms of which university a student might wish to attend? Well, teaching (learning environment) makes up 30% of the score, international outlook makes up 7.5%, industry income makes up 2.5%, research is 30% (volume, income and reputation) and citations (research influence) make up the last 30%. If we sort by learning environment (because I am a curious undergraduate, say) then the order starts shifting, not at the very top of the list, but certainly further down – Yale would leap to 4th in the US instead of 9th. Once we get out of the top 200, suddenly we have very broad bands and, honestly, you have to wonder why we are still putting together the numbers if the thing that people appear to be worrying about is the top 200. (When you deem worthiness on a rating scale as only being a subset of the available scale, you rapidly turn something that could be considered continuous into something with an increasingly constrained categorical basis.) But let’s go to the Shanghai rankings, where Caltech drops from number 1 to number 6. Or the QS World Rankings, which rate Caltech at #10.
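As a rough illustration of how much the order depends on the weighting, here is a small sketch. The weights are the percentages quoted above; the three institutions and all of their sub-scores are invented for the example and are not real ranking data, and the function names are hypothetical.

```python
# Weights as quoted above, as percentages of the overall score.
WEIGHTS = {
    "teaching": 30.0,
    "international_outlook": 7.5,
    "industry_income": 2.5,
    "research": 30.0,
    "citations": 30.0,
}

# Made-up sub-scores out of 100 for made-up institutions (not real data).
SCORES = {
    "Uni A": {"teaching": 90.0, "international_outlook": 60.0,
              "industry_income": 95.0, "research": 98.0, "citations": 99.0},
    "Uni B": {"teaching": 95.0, "international_outlook": 70.0,
              "industry_income": 50.0, "research": 90.0, "citations": 92.0},
    "Uni C": {"teaching": 85.0, "international_outlook": 80.0,
              "industry_income": 60.0, "research": 93.0, "citations": 95.0},
}


def overall(sub_scores, weights=WEIGHTS):
    """Weighted average of the sub-scores, using the percentage weights."""
    total_weight = sum(weights.values())
    return sum(sub_scores[name] * w for name, w in weights.items()) / total_weight


def ranked(scores, key):
    """Institution names sorted from best to worst under the given scoring function."""
    return sorted(scores, key=lambda name: key(scores[name]), reverse=True)


by_overall = ranked(SCORES, overall)
by_teaching = ranked(SCORES, lambda s: s["teaching"])

print(by_overall)   # order under the full weighting
print(by_teaching)  # order if all you care about is the learning environment
```

With these invented numbers the first two institutions swap places as soon as you sort on teaching alone, which is the same kind of shuffle as Yale moving from 9th to 4th.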

Obviously, there is no doubt about the general class of these universities, but it does appear that the judges are having some difficulty in consistently awarding best in class medals. This would be of minor interest, were it not for the fact that these ratings do actually matter in terms of industry confidence in partnership, in terms of attracting students from outside of your home educational system and in terms of who gets to be the voices that decide what constitutes a good University. It strikes me that broad classes are something that could apply quite well here. Who really cares whether Caltech is 1, 6 or 10 – it’s obviously rating well across the board and, barring catastrophe, always will.

So why keep ranking it? What we’re currently doing is polishing the door knob on the ranking system, devoting effort to ranking Universities like Caltech, Stanford, Harvard, Cambridge and Oxford, who could not, with any credibility, be ranked low – or we’d immediately think that the ranking mechanism was suspect. So let’s stop ranking them, because it’s compressing the ranking at a point where the ranking is not even vaguely informational. What would be interesting is more attention to the bands further down, where a University can assess its global progress against its peers to find out if it’s cutting the mustard.

If I put a bottle of Grange (one of Australia’s best red wines and it is pretty much worthy of its reputation if not its price) into a wine show and it came back with less than 17/20, I’d immediately suspect the rating system and the professionalism of the judges. The question is, of course, why would I put it in other than to win gold medals – what am I actually achieving in this sense? If it’s a commercial decision to sell more wine then I get this, but wine is, after all, just wine, and you and I drink it the same way. Universities, especially when ranked across complex weighted metrics and by different people, are very different products to different people. The single figure ranking may carry prestige and probably attracts both students and money, but should it? Does it make any sense to be so detailed (one decimal place, indeed) about how institutions stack up against each other, when in reality you have almost exponentially separated groups – my University will never ‘challenge’ Caltech, and if Caltech ‘drops’ to the graded level of the University of Melbourne (one of our most highly ranked Unis), I’m not sure that the experience will tell Caltech anything other than “Ruh oh!”

The Scooby Gang, stunned that Caltech was now in the range 50-100.

If I could summarise all of this, it would be to say that our leader board and ranking obsession would be fine, were it not for the amount of time spent on these things, the weight placed upon numbers that are ultimately highly subjective, even in terms of the weighting, and the lack of any clear definition of how these rankings can be used to make sensible decisions. Perhaps there is something more useful we could be doing with our time?

3 Comments on “Is It Called Ranking Because It Smells Funny?”

  1. It is worse than that. Have you looked at how the Times defined “learning environment”? It isn’t anything sensible for undergrads (like having small classes, easy access to faculty, and undergrad involvement in research); it is having lots of grad students! The institutions I know that have lots of grad students generally have much worse learning environments for undergrads. To use your metaphor of wines, it is like giving lots of extra points for natural corks. Sure, it is possible to sell good wines using natural corks in the bottles, but it is a lot harder to keep them good.


    • nickfalkner says:

      Yes, sadly, I’m aware that the Times rankings don’t really measure much that I would really be interested in as a starting student, but it’s an important point to emphasise so thank you.

      The cork/stelvin thing is a very valid point because there was a huge argument when I started working in making Riesling wines. Any number of people are happy to say “Oh, I’ve never had a bad wine with a cork” even back when we knew that the failure rate was as high as 1 in 10 for certain batches – and even when, as a professional nose, I was running across 1 in 20 which had something off about them. People would selectively purchase wine under cork, because it was ‘better’, despite the fact that they were increasing the chances of wasting their money.


  2. […] of assessment tasks into criticism, evaluation and ranking. I’ve also made earlier (grumpy) notes about ranking systems and their arbitrary nature. One of the interesting talks I attended yesterday talked about the […]

