Why All Scorers Should Subtract N-1 And Multiply By 10
News that they have set the “Doomsday Clock” backward one minute to 11:54 has made me realize how this silly tradition actually illustrates two very different pet peeves of mine.
One is obvious, the fact that Smart People like to gussy up their opinion as something quantitatively factual. This one is merely one of the more naked attempts. Basically a handful of scientists make up a number and tick it up or down every so often, according to whatever whim they feel like, and superstitiously ascribe to this number – Maya-style – supernatural powers of prophecy. Why on earth does anyone pay attention to this? Because it’s a clock and is created/updated by scientists!
On a separate note, it’s also interesting that this clock has sat between “11:43” and “11:58” for the whole of its existence (since the 1940s!). This illustrates another pet peeve of mine: artificial score inflation.
When faced with giving ‘scores’ of any sort, humans simply don’t seem to distribute the ‘scoring’ numbers in any sort of rational, efficient way. The most glaring place to spot this deficiency is in the scoring of aesthetic sports such as diving, gymnastics or ice skating. Let’s say the ice skating scoring has a max of 10.0 points: even the worst imaginable performance will still probably score around 9.0. As long as you don’t, like, fall asleep in the middle of the performance, you basically get the first 9.0 points for free. This is a symptom of the fact that human judges, whether because it ‘feels mean’ or for whatever other reason, are basically psychologically unable to use the first N-1 integers of an N-point scoring system (with decimal points). There is no rational reason why, without changing their evaluations of the relative merits of the performances, the judges giving all skaters scores between 9.0 and 10.0 couldn’t just (a) subtract 9 points from their initial score, and (b) multiply by 10. This would lead to final scores spread between 0 and 10 and easily bring out the difference between higher and lower scores. Instead judges squeeze most of their scoring up into the 9.8-9.9 range, leaving little room for differentiating the great from the very good.
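Here is a minimal sketch of that subtract-then-multiply idea in Python. The function name `rescale` and the generalization to any N-point scale are my own illustration, not anything a judging body uses: it assumes every raw score lands in the top point of the scale, i.e. between N-1 and N, so stretching that one-point band back over 0 to N means multiplying by N (which is 10 on a 10-point scale).

```python
def rescale(score, n=10.0):
    """Stretch a score from the band [n-1, n] onto the full range [0, n].

    Hypothetical helper: assumes judges only ever use the top point
    of an n-point scale, as described in the paragraph above.
    """
    # (a) subtract the n-1 points everyone gets for free
    earned = score - (n - 1.0)
    # (b) multiply so the one-point band covers the whole scale
    return earned * n


# Rounded for display, since floating-point subtraction is not exact.
for raw in (9.0, 9.8, 9.9, 10.0):
    print(raw, "->", round(rescale(raw), 1))
```

The ordering of the skaters is untouched; a 9.8 versus a 9.9 simply becomes an 8 versus a 9, which is a lot easier to see.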
Homework/exam grading in school is another good example. The traditional scoring system laid out that 90-100 is an A, 80-90 is a B, and so forth, down to an F for <60. Why are we even using those first 50 "points", since we basically give them to all students for just showing up? Why can't graders score assigned work/exam questions so that the resulting scores end up more or less uniformly distributed between 0 and 100? I don't know, but they just can't.
In our case, these scientists have set up a "clock" but are only using a small percentage of the numbers available to them. Why? Because from the beginning, there was a psychological need to keep all the "times" close to "midnight", for maximal effect. But this is a clock: there are 12 (or 24? let's keep it at 12) hours. These guys aren't even using the times between noon and 11:40. Instead of ranging their "times" between 11:43 and 11:58 they could have started the clock at, say, 3 p.m. The relative locations of the "times" to each other need not have been any different, but this would have given them plenty of room for finer gradations and subtle (or dramatic) moves in the "time". As things stand it's "11:54", so there are really only 5 time-slots available to signal how scared they are and whatnot. What if 5 very very scary things happen, they tick it forward to 11:59, but then the world still doesn't end? It's like the boy who cried wolf: you can only do it so many times before people stop paying attention.
If these "doomsday clock" guys can't keep themselves from inflating the currency to the point where it's basically always eleven-fifty-something no matter what happens or doesn’t happen, people will just stop paying attention to their doomsday warnings. And just think what a tragedy for the world that would be, because these warnings are so objective and substantive indeed.
When I teach, I certainly use all 100 points on the grading scale. You just need to amass enough points to pass. I assure you many students do not.
I agree with you wholeheartedly on the silliness of the Doomless clock, but I think your analogous scoring examples are a little weak, as some things just have to do with selection bias. Sometimes you don’t see low figure skating scores simply because the bad figure skaters who would earn those low scores don’t make it to the Olympics or whatever figure skating competition you’re watching. It’s similar to a music site where I volunteer some reviews; I tend to give disproportionately high scores because I can’t bring myself to write about the kind of music that would get low scores.
As for grades, I think grade inflation is a real and sorry problem, but when used objectively there’s nothing wrong with the traditional A-B-C scale. The reason we don’t see very many scores between 0 and 60 is because you have to be really bad to get that low; you should fail if you can’t get more than 60% of the math questions right, and if we spread those letters out more evenly it would be a lowering of expectations. Unless of course we simply made the questions way harder. In which case it’s all relative, but maybe we just like getting those bigger numbers. Which, actually, I think may be precisely your point.
So where was I anyway. I’m not even sure now. But don’t you worry about the Smart People running out of numbers if five big events happen. Smart People will discover seconds, and we will soon be treated to the second hand of the Doomless clock moving us from 11:59:00 to 11:59:30 and finally 11:59:59, and then the Smart People will discover milliseconds, and we will be treated to 11:59:59.9 and 11:59:59.99, and so forth, and the Smart People will finally end up proving Zeno to be correct after all these years.
There may be selection bias at work but I still doubt that lower-level competitions feature scoring that is anywhere close to uniformly distributed between 0 and N. It’s more plausible that lower-level scoring is inflated, in fact; someone gets a “9.1” (and therefore loses/is weeded out) for a performance that wouldn’t get anywhere close to a 9.1 in the Olympics, etc.
Re: grading, yes what I’m saying is that questions should be far harder, in order to get a better scattering of results. Partly this is just the lazy former teacher/grader in me. My ideal system would involve many questions, all graded out of 1 ‘point’, with no partial credit available, and tuned so that the mean is 50%. I am quite certain students would complain about this, but I’m not sure why. The more scatter there is, the less likely the difference between two students owes itself to statistical error. If two students get a 91 & 89, and you give them an A and a B resp., their different grades are likely due to dumb luck. It’s less easy to say that when it’s 85 vs 75 however because you calibrated the questions in order to better bring out their differences.
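To put rough numbers on the 91/89 versus 85/75 comparison, here is a back-of-envelope sketch in Python. Everything about the setup is an assumption made for illustration: each score is treated as the percentage correct on 100 independent one-point questions, and the noise in each score is modeled as binomial. The function name `gap_in_standard_errors` is made up. It reports how many standard errors separate two students; a gap well under 2 is hard to tell apart from luck.

```python
import math


def gap_in_standard_errors(pct_a, pct_b, num_questions=100):
    """Score gap divided by the standard error of that gap.

    Assumes (for illustration only) that each score is the fraction
    correct on num_questions independent all-or-nothing questions.
    """
    p_a, p_b = pct_a / 100.0, pct_b / 100.0
    # Binomial variance of each student's observed fraction correct
    var_a = p_a * (1.0 - p_a) / num_questions
    var_b = p_b * (1.0 - p_b) / num_questions
    # Variances add for the difference of two independent scores
    se_diff = math.sqrt(var_a + var_b)
    return abs(p_a - p_b) / se_diff


print(round(gap_in_standard_errors(91, 89), 2))  # roughly 0.5 standard errors
print(round(gap_in_standard_errors(85, 75), 2))  # roughly 1.8 standard errors
```

Under these assumptions the 91-vs-89 gap is about half a standard error, which is basically a coin flip, while the 85-vs-75 gap is close to two. Real exams have fewer questions, partial credit, and correlated mistakes, so take the exact figures with a grain of salt.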
Good point about seconds on the Doomsday Clock. The instant I read that I realized you were right. It is inevitable.
best
Haha, love the bit of snark in your last sentence there.
Great essay. Good point.
Scientists have become white-coated, i.e. white-robed, priests of our common, secular religion of radically socially egalitarian liberal democracy…