John Appleby: Care.data—your bits in their hands

Over the past few months there has been considerable debate about NHS plans to collect and centrally collate details of individual patient records from general practice for the first time. Many have expressed worries about the care.data initiative: how potentially sensitive patient information will be used, who will have access to it (and for what reasons), and, not least, how secure it will be. Such fears are perhaps not just hypothetical, given past examples of lost patient notes and what appears to be the misuse of sensitive patient information (even with the best of intentions).

With examples like these (and the revelations of the scale and detail of surveillance undertaken by the US National Security Agency and the UK’s GCHQ), the disquiet about care.data is perhaps not surprising.

However, as both a contributor of my own medical information to care.data (and to other datasets built up from NHS patient records) and a research user of such data, I will not take up the option to opt out of care.data, an option provided despite the mandatory inclusion set out in the Health and Social Care Act. Others will take a different view of course, but for those still wondering what to do it’s worth considering the experience of another national data set, the hospital episode statistics (HES), which has been in existence for a number of decades and which care.data is now expanding to cover out-of-hospital care.

“Big (and personal) Data” are not new to the NHS. For nearly a quarter of a century the NHS in England has collected and collated a vast array of data from patients’ hospital records: the hospital episode statistics (HES). These data are collected not just for NHS patients treated in NHS hospitals, but also for private patients treated in the NHS and for patients treated in non-NHS hospitals but paid for by the NHS. The data (over 125 million individual inpatient, outpatient, and A&E records each year) are warehoused and controlled by the Health and Social Care Information Centre (HSCIC). The HES Data Dictionary details all the data collected, which include not only patients’ diagnoses and treatment, but also how long they stayed in hospital and how long they waited to get into hospital.

While the HES database does not contain patient names, it does hold other identifying information such as gender, age, patients’ full postcode, referring GP practice, and patients’ NHS number. However, before being released to researchers like me, HES replaces the date of birth with year of birth, removes the second part of the postcode, and replaces the NHS number with a unique, meaningless pseudonym so that I can still link episodes of care within and between years without seeing any of the patient’s “real world” identifiers.
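For readers who like to see such things spelled out, the three steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HSCIC's actual process: the field names, the example key, the use of a keyed hash (HMAC-SHA256) to generate the pseudonym, and the assumption that postcodes are stored with a space between the two parts are all mine.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret held only by the data controller (an assumption for
# this sketch; it is not a real key or a documented HSCIC mechanism).
SECRET_KEY = b"example-key-held-by-data-controller"


def pseudonymise(record: dict) -> dict:
    """Return a reduced copy of a record, following the three steps above."""
    # Keyed hash of the NHS number: the same patient always gets the same
    # pseudonym, so episodes can still be linked, but the pseudonym cannot
    # be recomputed from an NHS number without the key.
    pseudonym = hmac.new(
        SECRET_KEY, record["nhs_number"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    return {
        "patient_pseudonym": pseudonym,
        # Keep only the year of birth.
        "year_of_birth": record["date_of_birth"].year,
        # Keep only the first part of the postcode (assumes a space separator).
        "postcode_district": record["postcode"].split()[0],
        "diagnosis": record["diagnosis"],
    }


# Two made-up episodes for the same made-up patient.
episode_2012 = {
    "nhs_number": "0000000001",
    "date_of_birth": date(1950, 3, 14),
    "postcode": "SW1A 2AA",
    "diagnosis": "hip replacement",
}
episode_2013 = dict(episode_2012, diagnosis="readmission")

first = pseudonymise(episode_2012)
second = pseudonymise(episode_2013)
```

Here `first` and `second` carry the same `patient_pseudonym`, so the two episodes can be linked across years, while neither contains the NHS number, the full date of birth, or the full postcode.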

At this point, those who have never heard of HES and are worried about care.data may be thinking: hold on, where’s my opt out for HES? Well, the reason for highlighting HES (which is just the secondary care equivalent of the primary care data that will now be linked through care.data) is to offer some reassurance about the worries people have about care.data. Moreover, as the Privacy Impact Assessment for care.data makes clear, the opt out applies not just to data from GP practices, but now also to any other information flows from the Health and Social Care Information Centre.

In terms of its uses, the first set of data from HES was made available in 1989 and has since been used for various administrative purposes—such as providing essential data on the use of hospital services in different areas to allow the NHS to allocate money to reflect the different needs of different populations. This has helped ensure the NHS strives to live up to one of its founding principles to provide equal opportunity of access to those in equal need. Latterly, it has also been used to ensure hospitals are paid correctly (via payment by results) for the work they do. HES has also enabled the NHS and researchers to investigate important policy issues such as the (variable) performance of hospitals based on deaths in hospitals, length of stay, or the rates of readmission of patients. Without the comprehensive coverage HES provides, research to evaluate the impact of competition on patient outcomes would not have been possible. And further, the detail contained in HES has enabled researchers and others to investigate whether and to what extent there are inequalities in the use of hospital services. Do populations in economically deprived areas receive or use services more or less than other areas for instance?

Some of these uses of HES data do not require the full census of patient data contained in HES; they could have relied on a sample. On the other hand, other examples (such as the payment of hospitals) really do require as comprehensive a data set as possible. And for some research examples, such as the impact of competition between hospitals, some extra details (eg the first part of patients’ postcodes, the referring practice etc) are essential to the analysis.

HES data also has direct clinical uses. An analysis using HES data was able to identify the rogue obstetrician Rodney Ledward as an outlier relative to his peers. This research raised the possibility of using routine HES data as an early warning system not just for exceptional cases such as Ledward, but to identify poorly performing medics. HES data also provided the basis for studying and identifying which patients were at risk of readmission to hospital and then to tailor services to reduce such inappropriate admissions. Both these examples of the clinical use of HES required access to pseudonymised patient information.

But do these or any other examples of the use of HES justify the risks of collecting and collating such large volumes of patient data? There is no straightforward or simple technical answer to this question as it depends in part on factors almost totally immune to facts or evidence—such as individuals’ attitudes and perceptions of risk, their views about, and trust in, government in general or particular governments. Others may hold ideological objections to the use of data like HES by commercial organisations either on the basis that they think no one should profit from “adding value” from the use of what they see as “public” data or because they feel such organisations may be more motivated to in some way misuse the data than others (such as academic researchers) with “purer” motives.

Some fear that medical insurers would be able to access confidential data, identify individuals, and then use this information to, say, refuse to insure or demand high premiums. However, it is unclear why insurers would want to do this when full disclosure of pre-existing medical conditions is in any case requested by insurers as a condition of offering insurance.

It may be that for some, no benefit is worth even the tiniest risk of, for example, the identification of an individual patient or the accidental leakage of confidential data. And ultimately, no absolute guarantee can be given that confidential data will not escape, whether through accident, incompetence, or human error, or maliciously as a result of some criminal action. This is the rock and the hard place between which NHS England finds itself with care.data.

However, the experience of HES should, I think, give those who want to opt out of care.data pause for thought. HES has provided significant benefits in terms of the running of the NHS and analysis of its activities and performance which ultimately and directly benefit patients. Moreover, as far as I am aware—and I stand to be corrected—there have been no recorded examples in its history of data loss leading to patient or public harm, or the actual (as opposed to theoretical) identification of individual patients (through linking to other non-HES data for example). This must in part be due to the rules and processes governing access to, and use of, the identifiers contained in HES—rules which will not only apply with care.data, but which have been strengthened.

So, should you opt out of care.data? I strongly think not. And while some have argued for an “opt in” approach, through inertia alone this would almost certainly jeopardise the whole point of the exercise. Just like the practice of medicine, the collection, collation, and use of medical data will not be 100% risk free. But as the operation of HES has shown over the last quarter of a century, the risks can be minimised (to zero it would seem) to allow us to enjoy the benefits.

John Appleby is the chief economist at The King’s Fund.
