Good Data

by Sam Gilbert

Friday, November 1, 2024

This book is a pushback against what seems to me to be the prevailing viewpoint: that our data is being stolen from us and misappropriated by companies for sale and profit at every opportunity. We generate vast quantities of data all the time, and occasional misuse of it - eg the Cambridge Analytica scandal - or experiments with it that don’t quite pan out - eg Google Flu Trends - end up with us not wanting to share our data any more. The author argues partly that a lot of what we’re protecting is not worth protecting, but also that our data should be used ethically and transparently.

These days I use a search engine that keeps my search terms private, because I don’t fancy anyone else looking through the random stream of consciousness that I type into it each day. I think it’d be pretty easy to identify me if you had that stream, even though I don’t think it’s at all likely anyone would ever even try. I do think there’s a use for the consolidated data of all our search terms though, and I agree with the author that it’s a pity that the tools for doing good stuff with that have been lost as a result of misuse and misunderstandings. The fact is I don’t trust a company that collects that data to do the good stuff and not the creepy stuff, and I’m not alone in that. Online advertising is similar: I quite like the way advertisers know to show me stuff that people-like-me like, so that I find cool stuff I might enjoy popping up rather than the endless, tediously generic adverts I get if I watch broadcast TV. But I don’t want that power used to inure me to extremist politics, for example. It’s easy to press ‘please don’t track me’ these days, but hard to know when you wouldn’t actually mind it and when you’re being tracked anyhow, no matter what you say.

As I was reading I kept remembering the big data project of Ben Goldacre’s that I read about a few years ago. His idea - as I remember it - was to share all our NHS health data with researchers in safe and traceable ways, and I thought that getting all this data to researchers, so they could improve our health with vast quantities of evidence, was a great idea. However, I remember there being a backlash at the time from people who did not want their health data shared because they felt it was an invasion of privacy, and I felt that there was a general failure of communication over how useful large quantities of data can be. (OpenSAFELY is, I think, the project I was reading about before.)

It’s a well-written and very readable book. That I remember more about the trains of thought I went off on while reading it than about its actual content is a product of the fact that it made me think.