The Art of Statistics: Learning from Data

by David Spiegelhalter

Wednesday, June 24, 2020


A really interesting book. It’s definitely down the textbook-ish end of the popular science shelf, but I got a lot out of it. There’s a lot about the communication of statistics in here, and one standout point for me was something that Spiegelhalter buries in a footnote as if it’s a bit off-topic, which I don’t really think it is: The Groucho Principle. I forget his exact wording, but I’m thinking of it as “if a statistic has been brought to your attention, there’s probably something wrong with it”. As the numerate one among my family and friends, I find I’m always the one saying “well, yes, maybe, but” to numbers in news stories, always being cynical and looking for the problems with the story being told. It’s an ingrained habit, and I was doing it to the examples in this book too. There’s a section on storytelling with statistics that includes a chart showing the average age at which people had their first child compared with the average age at which they first had sex, between the 1930s and the 1990s. It’s labelled as showing how the time period in which effective contraception is required has changed over the years, but in my head it tells a story of how the availability of effective contraception has changed the shape of people’s lives over the years.

I’m not a statistician. I studied mathematics but found statistics very dry and boring when I was at school and university; there weren’t enough computers in it for me in the 1990s. The standard statistics teaching methods, involving tossing coins and pulling coloured balls out of bags, were really tedious to me. The author points this out but then continues to use them quite often. Fortunately there are a lot more examples, chiefly from medical science, that are a lot more interesting, and I wish examples like these had been more prevalent in my introductory statistics courses. For example, a test for cancer is said to be 90% accurate, by which they mean that 90% of people with the cancer will get a positive test result, and 90% of the people without the cancer will get a negative result. Say that 1% of the population actually have the cancer. If I get a positive test result, what’s the chance that I have cancer? I think most people would say 90% sounds like a reasonable answer, but the chance is actually only about 8%, because so many more people fall into the “test positive but don’t have cancer” bucket than the “test positive and do have cancer” bucket. That kind of example is far more interesting to me than the odds of me getting two red socks out of my drawer on a dark morning.
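For anyone who wants to check the cancer test arithmetic, here is a minimal Python sketch of the calculation. It only uses the numbers quoted above (1% of people have the cancer, and the test is right 90% of the time either way); the function and variable names are my own, not anything from the book.

```python
def chance_of_cancer_given_positive(prevalence, sensitivity, specificity):
    """Share of positive test results that come from people who really have the cancer.

    prevalence:  fraction of the population with the cancer
    sensitivity: fraction of people with the cancer who test positive
    specificity: fraction of people without the cancer who test negative
    """
    # People who have the cancer and test positive.
    true_positives = prevalence * sensitivity
    # People who do not have the cancer but test positive anyway.
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Numbers from the example: 1% prevalence, "90% accurate" in both directions.
chance = chance_of_cancer_given_positive(0.01, 0.90, 0.90)
print(f"Chance of cancer after a positive test: {chance:.1%}")
```

Put in terms of people rather than probabilities: out of 1,000 people, 10 have the cancer and 9 of them test positive, while 990 don’t have it and 99 of them test positive anyway. That makes 9 real cases out of 108 positive results, a little over 8%.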

Statistics is something I wish I had a better handle on, and this book has given me a good bit more friction to hold on to. I got a little annoyed with the way some things are couched in “sorry, this is just hard” terms later in the book. That seemed to miss the point of the book, which should be to explain these things so that they can be understood. A final fact, probably only interesting to me: I discovered in the course of reading the book, and exploring some of the topics more deeply, that David Spiegelhalter’s doctoral supervisor was my personal tutor in my first year at university. Sometimes I wish I could tell 18-year-old me to pay more attention to the world she lived in, but there are only so many things you can pay attention to at one time, and I seem to have done alright out of the ones that caught my eye back then.

This was an interesting, mostly very readable, and definitely useful book.