Mary E Black: How data science will change public health

We are living in a perfect storm: vast amounts of data and rapidly increasing, cheap computing power. The world is shifting towards basing decisions even more on data. I believe, to paraphrase Billy Bosworth, that “10 years from now when we look back at how this era of big data evolved . . . we will be stunned at how uninformed we used to be when we made (public health) decisions.” (Billy Bosworth, DataStax CEO, 2015)

This is groundbreaking stuff. Bring on commercial partnership, academic links, local activist groups, and hackers, for we cannot do this alone, we do not have the firepower. Here are my top 10 change predictions. All of them are already happening . . . somewhere . . .

1. It will make you think differently: A public health data scientist is someone who excels at analysing data, particularly large amounts of data, to help improve the health of the public. The evolution is in the range of skills and tools that can be employed, and the size of the data that can be analysed. With data science, you can apply a different set of tools to look for patterns in much bigger and messier data. Those patterns may be anything from ridiculous to unexpected, and the appliance of public health intelligence is to then work out what is sensible.

2. Predictive data: To date we have largely worked with retrospective data, looking back and then projecting forwards from that. What if the loop was changed and prediction could be modelled in real time? This could make us think completely differently about population censuses, or about working out how many school places are needed by mashing up many sources of local data to get a better answer.

3. It’s fast: When the first human genome was read it was expected to take 15 years but, in fact, it took 13. Now an entire human genome can be sequenced in less than a day and can be purchased online.

4. New partnerships. New skills: There is just no way out of this one—as the frontiers of data science become familiar territory, all public health professionals will have to learn a new lexicon and understand how big data, machine learning, and artificial intelligence will impact on the art and science of public health. If you think you are a good epidemiologist now, imagine how you will be challenged to work with real time, unstructured data. If you think you are a competent demographer, I wonder how long before we find an alternative way to conduct the expensive national census of the population. If you struggled with statistics (my own nemesis, preceded only by the clotting pathway in medical finals) then imagine how you are going to enjoy coding—for coding we will all have to tackle some time. I recommend you start reading New Scientist, get familiar with real time data visualisation, and start coding.

5. Where we load data: Right now we produce apps, or web portals, and then spread data outwards, hoping they get picked up and used. According to Mary Meeker, who produces an influential annual internet trends report, the future lies in “Buy Buttons,” where people access what they need from wherever they are, be it Facebook, Twitter, Google, etc. In public health we will have to work out how data can be rapidly pulled from a wide range of entry points. We will not be able to dictate the terms: we will have to adapt or be ignored.

6. Competition: Public health data have for many years been produced by government or academics in tried and trusted formats. We now compete in a blizzard of facts and figures. Data sets, as they become more open, can be turned into dazzling visual formats, attractive apps, and user friendly materials. The growth in the private sector is rapid.

7. Rapid increase in data sources: The landscape of where we can collect data from is rapidly changing. No longer can we rely on the usual suspects of reported and collected data—now data can be gathered and “scraped” from many sources. Data can also be compared, and inconsistencies become more obvious—they can even become the main story. Data will be asynchronous and messy, and we will be given very little space to sit on data and groom them till they are clean and perfect.

8. Data markets: Someone, somewhere assumed people did not own their own data. What if that was turned on its head: what if there was a revolution and people decided to “marketise” their own data, just as they do with their homes on Airbnb? Could we see a value placed on people’s data from personal devices? Could there be a new breed of data brokers, the Ubers of the data world? This is not so far away—data gathered from health services are currently sold and exchanged—so what would happen if control went back to the individual?

9. Critical appraisal: We need to be on the front foot here, for as public health specialists we are well placed to critically appraise all the fanciful rubbish that can be produced with this dazzling computer power. Which criteria and methods in scrutinising results will help us do that? Which are no longer usable in this new world?

10. And finally . . . Your kids will think you are cool: Mine do anyway. Having data science in your job title is quite a bit funkier than public health specialist, demographer, or data analyst. Bean counters everywhere should celebrate—our time on the big stage has come. Our voices, brains, and judgment are needed: it’s time for public health folks to get Big Data literate, stand up, and be heard.

Mary E Black is a medical doctor currently seconded as senior adviser in data sciences to Public Health England. She is on Twitter @DrMaryBlack.

Competing interests: I have no relevant interests to declare.