99% cleaning and compiling with 1% of some other stuff

There are times when you get a data set and you can use it straight away. Actually, I can't remember this ever happening, so let's assume there are no times when you get a data set and can use it straight away.

There are times when you get data and then face a lot of wrangling, cleaning, reformatting and general bashing of the thing into shape. It's an iterative process, and visualisation comes in handy for spotting outliers and errors.
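The kind of iterative clean-up described above might look something like the following sketch. The rows, column meanings and outlier threshold here are all hypothetical, not from the actual data set; the point is just the shape of the work: normalise messy strings, coerce to numbers, then run a quick sanity check that surfaces anything suspicious for a human to look at.

```python
import statistics

# Hypothetical raw rows as they might come off a photocopy or a scrape:
# trailing whitespace, thousands separators, and one fat-fingered entry.
rows = [
    ("Acme", "1,200"),
    ("Beta Co", "950"),
    ("Acme ", "1200"),
    ("Gamma", "9,999,999"),
]

def clean(rows):
    """Strip stray whitespace from names and coerce revenue strings to floats."""
    return [(name.strip(), float(rev.replace(",", ""))) for name, rev in rows]

cleaned = clean(rows)

# A crude sanity check: flag anything more than 10x the median for review.
# (The 10x cut-off is an arbitrary illustration, not a real rule.)
revenues = [r for _, r in cleaned]
med = statistics.median(revenues)
suspects = [(n, r) for n, r in cleaned if r > 10 * med]

print(suspects)  # the "Gamma" row stands out for manual checking
```

In practice you would loop: fix what the check surfaces, re-run it, and chart the cleaned numbers to catch anything the simple threshold misses.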

Then there are the times when there is actually no data, and you have to combine existing data sets with photocopies of old magazines. Thankfully the newsroom keeps a good record of those old magazines. So this is what you sometimes have to do: look for old records where there are gaps and hand-enter the numbers. Or just check what's going on...

It was heartening to read that Tim Sherratt has done the same thing with his PM speeches repository (although on a much grander, leather-bound scale - mine were crappy photocopies of old BRW mags).

So here is the story, and the graphic, in all its digital glory.