Emiliano De Cristofaro: Genome data can never be fully anonymised

Security issues that are specific to genome data need to be considered

Recently Sally Davies, England’s Chief Medical Officer, called for the NHS to deliver her “genomic dream” within five years, and make whole genome sequencing “as standard as blood tests and biopsies.” A large number of patients in the UK already undergo genetic testing at least once in their life, and for a wide range of reasons—for example, screening for cancer predisposition due to high family incidence, or determining the best course of action in cancer treatment.

Whole genome sequencing (WGS) is used to determine an organism’s complete DNA sequence. But it is not the only way to analyze our DNA. In fact, genetic testing has been used in clinical settings for decades, e.g. to diagnose patients with known genetic conditions. However, the cost of WGS is dropping fast and the availability of affordable whole genome sequencing not only prompts new hopes toward the discovery and diagnosis of rare/unknown genetic conditions, but also enables researchers to better understand the relationship between the genome and predisposition to diseases, and responses to treatment.

Overall, progress makes it increasingly likely that in the not-so-distant future individuals will undergo sequencing once and make their digitized genome easily available to doctors, clinicians, and third parties. This would also allow us to use computational algorithms to analyze the genome as a whole, as opposed to expensive, slower, targeted in-vitro tests. However, there are a number of important caveats that need to be considered.

Firstly, there are security concerns prompted by the need to store sensitive data such as genome data. The genome contains information about ethnic heritage and predisposition to certain diseases and conditions. Data breaches of sensitive information, including health and medical data, sadly happen on a daily basis. But certain security issues are specific to genome data and are much more worrying. For instance, due to its hereditary nature, access to a genome essentially implies access to the genome data of close relatives as well, including offspring. One’s decision to publish genome data is therefore also being made for one’s siblings, children, grandchildren, and so on. Nor does the sensitivity of the data degrade over time: it persists long after a patient’s death. In fact, it might even increase, as new aspects of the genome are studied and discovered. As a consequence, Sally Davies’ dream could easily turn into a nightmare without adequate investment in sound security measures. These involve both technical tools (such as upgrading obsolete hardware) and education, awareness, and practices that do not simply shift the burden onto clinicians and practitioners, but incorporate security in their design.

Secondly, there are concerns with allowing researchers to use the genome data collected by the NHS, along with medical history, for research purposes, e.g. to discover genetic mutations that are responsible for certain traits or diseases. This requires building a meaningful trust relationship between the NHS, government, and patients, which cannot happen without healing the wounds from recent incidents like the care.data debacle or Google DeepMind’s use of personal NHS records. Instead, the Chief Medical Officer’s annual report seems to include promises around security and anonymity that we cannot realistically keep while, worse yet, promoting a rhetoric of greater good trumping privacy concerns and seemingly pushing a choice between donating data and access to the best care. It is misleading to present “de-identification” of genome data as an effective protection tool. Proper anonymization is inherently impossible because a genome combines unique and hereditary features. Rather, we should make it clear that data can never be fully anonymized, or protected with 100% guarantees.

Overall, I believe that patients should not be automatically enrolled in sequencing programmes. Even if they are given an option to withdraw later, once the data are out there, it is impossible to delete all copies of them. Rather, patients should voluntarily decide to join through an effective informed consent mechanism. This may prove challenging against a background in which the information that can be extracted or inferred from genomes may rapidly change.

Encouraging results with respect to education and informed consent, however, do exist. For instance, the Personal Genome Project offers a good example of effective strategies to help volunteers understand the risks, and it could be used to inform future NHS-run sequencing programmes.

Emiliano De Cristofaro is a Senior Lecturer in the Computer Science Department at University College London and the Program Director of UCL’s MSc in Information Security. His research interests include privacy technologies and cybersecurity. Dr De Cristofaro received his PhD from the University of California, Irvine in 2011 and a BSc from the University of Salerno (Italy) in 2005.

Competing interests statement: Emiliano De Cristofaro currently receives research funding from the EU, EPSRC, the Royal Society, and Google.