PHE Data Week: Big data, data science and public health

Categories: Data blog, Public health data


Welcome to a week of activity - focussing on data - promoted across our social media channels. #PHEDataWeek looks to shine a light on data and move beyond the buzzwords to talk in depth about the importance of data to health protection, prevention and care in England. It really is at the heart of everything we do.

Indeed, I write this having recently watched the BBC Horizon programme The Age of Big Data, about how mathematical models and algorithms are being applied to large datasets to predict crime, tailor healthcare or make money (‘data is the new oil’).

There is no doubt that ‘big data’ is a buzzword and there's a lot of hype and expectation about how it's going to transform our lives.

And the volumes of data available are big – mind-bogglingly so. Most of us will have a computer with, say, a 1 terabyte hard drive. It is said that 2.5 exabytes of data are generated per day – that is 2.5 million laptops' worth a day – and it's growing exponentially.
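For anyone who wants to check the laptop comparison, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where a terabyte is 10^12 bytes and an exabyte is 10^18 bytes; the function name `drives_needed` is made up for illustration.

```python
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes and 1 EB = 10**18 bytes.
TERABYTE = 10**12
EXABYTE = 10**18


def drives_needed(data_bytes: int, drive_bytes: int) -> int:
    """Number of drives of a given capacity needed to hold data_bytes, rounded up."""
    return -(-data_bytes // drive_bytes)  # ceiling division using integers only


# 2.5 exabytes a day, written as an integer number of bytes.
daily_data = 25 * EXABYTE // 10

# How many 1 terabyte laptop drives would that fill?
laptops_per_day = drives_needed(daily_data, 1 * TERABYTE)
print(f"{laptops_per_day:,} one-terabyte drives per day")
```

Under those assumptions the figure comes out at 2,500,000 drives a day, which matches the 2.5 million quoted above.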

We have so much data because we now generate, ‘datify’ and quantify more in our daily lives through apps, wearables, remote sensing, transactions and so on, and we are capable of storing it so it can be utilised and analysed.

The growth of data is outstripping Moore’s Law (the growth in computing power). It’s not just the size - it’s the variety and complexity of the data that is challenging.

Coping with this challenge has required developments in computing and technology alongside developments in analysis, and this has led to the emergence of data science.

What is data science?
The term is often credited to William Cleveland, who used it in 2001 to describe an academic discipline bringing statistics and computer science closer together. Now data science and its applications are a hot field.

You cannot do big data without data science, but you can do data science without big data, and data only needs to be as big as the problem we are trying to solve requires. John Snow did not need much data to stop the cholera outbreak in Soho.

Big data and data science are already influential in some areas of medicine like cancer, genomics and communicable disease, but not in broad public health. There is very little written about public health data science or public health informatics. The government is beginning to realise the value of data for public policy making with the Government Data Science Partnership and its appointment of a Chief Data Officer.

Data science brings a “data first” approach, a strong emphasis on analysis – seeing what the data says – and some new techniques for looking at our data: predictive analytics, machine learning and dimensionality reduction, to name a few.

It provides techniques for analysing things that are not usually thought of as data, and for coping with very large amounts of data. There are two approaches: start with a question or problem and try to solve it with data, or start with the data and try to extract insight and intelligence.

So how will data science help public health and what are we doing in PHE? Here are four suggestions:

  • Get it: Make sure we can access all the data we need to improve public health and manage the public health system, in whatever form the data is and wherever it sits. PHE collects data through its registration services like the national cancer registration service and through programmes like screening and immunisation. Moving forward we need to take advantage of big data generated by ubiquitous technology such as wearables to get a better understanding of risk behaviour, then develop and evaluate new interventions.
  • Analyse it: Make sure we are turning data into insight through exploration and analysis, addressing important questions. We do this at PHE through our health intelligence networks, which work with partners across the sector, from cancer to mental health, to produce analysis that forms the basis of reports, tools and academic journal articles amongst many other things.
  • Use it: Turn our analyses into decision support. We do this through our policy teams, as recently seen in our work on e-cigarettes, and through data products, which are tools for change. Our tools do not just profile areas and illness but actively help health care professionals, from local authority employees to NHS commissioners, make decisions and change what they do.
  • Govern it: Take an open, ethical, pragmatic and proportionate approach to privacy and governance. Quality assure what we do. Embrace the ideas behind reproducible research, open data and open science. We work closely with partners like ONS and HSCIC to ensure our data is available to those who need it but is handled in a way that meets these criteria.

Please do engage with #PHEDataWeek. We are on hand to answer questions and build new relationships. To paraphrase Matt Damon in his recent movie The Martian, ‘let’s science the hell’ out of this data.

Chart: Farcaster at English Wikipedia, via Wikimedia Commons
