What do data privacy and its commercialisation mean for global health?

In a world where digital health is becoming the norm, what do data privacy and commercialisation mean for global health? writes Michael Johnson

Imagine you are an activist working to fight corruption within your own government. For ten years you have been receiving regular HIV treatment, but your HIV status is known only to your closest friends and family. Recently your local health facility began collecting fingerprints at the registration desk. Your fingerprints are not yet on file, but you are frightened about what will happen at your next clinic visit. You ask a close friend who works in the hospital archives, and she informs you that the fingerprinting requirement came from the United States government, for data verification purposes.

Would your fingerprint be shared only with the health ministry or also with law enforcement agencies? Would your life be in more danger by stopping treatment, or by reporting to the health facility to be fingerprinted? This scenario may already be a reality for people living with HIV in Kenya, where the US Government has reportedly made PEPFAR HIV treatment funding contingent upon the implementation of a biometric program, despite opposition by civil society groups.

The risks posed by sharing health and biometric data must be addressed directly by the global health community. And the time is now. In this article, I will highlight three cross-cutting issues that affect health systems in rich and impoverished countries alike: untested digital health interventions, expansive data sharing, and data that can never be de-identified.

Untested digital health interventions

Every two years, the International Medical Informatics Association hosts the World Congress on Medical and Health Informatics, known as MedInfo. At the 2019 conference, hosted in Lyon, France, researchers showcased new partnerships that will enable academics and policy makers to manage data ownership, privacy, and informed consent at a global scale. The EU Innovative Medicines Initiative will invest billions of euros to establish public-private partnerships to advance personalized medicine, big data analytics, machine learning and artificial intelligence.[i]

By leveraging an expansive research infrastructure, academic institutes and private sector partners will take advantage of data in electronic health record systems (EHRs). Some predicted that ground-breaking scientific achievements would be unlocked by sharing data more efficiently, analysing it more rapidly, and influencing health outcomes with predictive (AI) algorithms. This utopian vision of the future sounded appealing, but it was soon dismantled when I began to unearth the role of private corporations in this space.

Many researchers know Elsevier as a global powerhouse of academic publication, but it is part of a larger information analytics enterprise generating over £7 billion in revenue from a multitude of business lines.[ii] The company provides research support for pharmaceutical and life science companies during all stages of drug discovery and development, clinical guidelines and decision support for oncology care (via ClinicalPath), and a deeply-embedded EHR order entry solution (Order Sets). Elsevier was the only multinational corporation with a booth at the MedInfo conference promoting the benefits of these tools; however, it is just one player in a global landscape of private corporations capitalizing on our private health data.[iii]

Elsevier’s Order Sets is one of many software products providing Clinical Decision Support (CDS) to physicians. The value proposition is persuasive – already overwhelmed and burdened by administrative duties, clinicians do not have time to stay up to date on the latest academic research. If hospital committees establish a set of “recommended orders,” based upon the national guidelines and research publications, these recommendations could be shown to physicians as a single, cohesive list of “orders” (e.g. drug prescriptions, diagnostic imaging, dietary restrictions, and exercise recommendations). A meticulous clinician may dig deeper to discover links to academic citations and guidelines, while others will simply submit the recommended orders and move on.

EHR integrations and support tools are often designed to create a two-way information flow – guiding clinician behaviour in one direction and cataloguing patient outcomes in the other. These new software tools are “digital health interventions” that could have the potential to harm patients or save lives. If new medicines must be tested in rigorous clinical trials, we should also demand that new digital health interventions and AI algorithms are never tested on patients or clinicians without their knowledge or consent. Interventions should be assessed for impacts such as racial bias, interruption of clinician workflow, impact upon clinicians’ mental workload, and measurable changes in health outcomes.


Expansive data sharing (and profit)

If EHR tools are designed and implemented by private corporations far removed from actual patient care, these interventions can lead to the evisceration of digital health privacy. The Wall Street Journal recently reported that Google’s “Project Nightingale” has amassed health records from a healthcare network of millions of patients across 2,600 facilities in 21 states in the United States. Google engineers have not yet built a new tool to integrate with the EHR; instead, they were given direct access to many patient-level EHR datasets. Although the deal was not done “in secret”, it made for sensational headlines because it was reported that patients and physicians were not fully informed about their data being shared with Google.

Unlike clinical research, where investigators have a duty to protect patients under their care, private companies may not have a duty to inform clinicians or their patients of dangers or warning signs that are discovered in their personal clinical data. Could it be worth millions of dollars to a pharmaceutical company to observe an increase in side-effects from a medication that is at risk of being recalled by the European Medicines Agency (EMA) or the US FDA? If some ethnicities or genotypes respond poorly to a newly patented drug regimen, will patients be informed? Will providers be discouraged from prescribing it or will the hospital algorithms be tuned to maximize profit margins? Health spending already accounts for 17% of GDP in the US – we must not allow this valuable resource to become wholly owned and managed by private research corporations in a position of knowledge asymmetry.

Data that can never be de-identified

Patients share deeply personal and private information with their healthcare providers with the expectation that it will not be shared outside the examination room.[iv] If third parties or government agencies gain access to this data, the results could be disastrous. Personally-identifiable health information collected in centralized national (or international) databases is a prime target for identity theft, extortion, or persecution of individuals. Imagine the consequences of a data breach that exposes the identities of all sex workers living with HIV, together with their biometric identifiers, the geolocation of their home address, and the name of their emergency contact. This is just one potential data privacy consequence of massive data sharing; there are many more.

Worse still, re-identification of individuals in “de-identified” data sets is now commonplace, such that “anonymizing” health data may already be impossible. Even when identifiers such as date of birth or home address are removed, individuals can be re-identified by linking the remaining fields to easily available third-party datasets. Even more concerning, individuals who have their genomic profiles recorded are irreversibly identified in biobanks, ancestry registries, or government databases. With datasets being bought and sold multiple times, those data are duplicated over and over again, making the likelihood of ever being “forgotten” incredibly low or non-existent.
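
To make the linkage risk concrete, here is a minimal sketch in Python of how a “de-identified” extract might be matched against a second dataset that still carries names. Everything in it is hypothetical and invented for illustration: the records, the field names, the choice of postcode, birth year and sex as linking fields, and the `reidentify` helper. It is a toy under those assumptions, not a description of any real dataset or attack.

```python
from collections import defaultdict

# Hypothetical "de-identified" clinic extract: names removed,
# but a few indirect identifiers were left in place.
deidentified_records = [
    {"postcode": "64105", "birth_year": 1984, "sex": "F", "diagnosis": "HIV"},
    {"postcode": "64105", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
    {"postcode": "64110", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical third-party dataset that still includes names.
public_roster = [
    {"name": "Person A", "postcode": "64105", "birth_year": 1984, "sex": "F"},
    {"name": "Person B", "postcode": "64105", "birth_year": 1990, "sex": "M"},
    {"name": "Person C", "postcode": "64105", "birth_year": 1990, "sex": "M"},
    {"name": "Person D", "postcode": "64110", "birth_year": 1975, "sex": "M"},
]

# Assumed linking fields shared by both datasets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def linking_key(row):
    """Build the tuple of indirect identifiers used to match rows."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def reidentify(records, roster):
    """Return (name, diagnosis) pairs where exactly one roster entry matches."""
    names_by_key = defaultdict(list)
    for person in roster:
        names_by_key[linking_key(person)].append(person["name"])

    matches = []
    for record in records:
        candidates = names_by_key.get(linking_key(record), [])
        if len(candidates) == 1:  # a unique match pins the record to one person
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = reidentify(deidentified_records, public_roster)
for name, diagnosis in reidentified:
    print(name, diagnosis)
```

In this toy example two of the three records match exactly one named person, while the third is shielded only because two people happen to share the same postcode, birth year and sex. Each additional field that survives “de-identification” tends to shrink those groups and make unique matches more likely.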

Concerns about health data privacy and commercialisation are not intellectual or academic exercises; these are matters of life and death. Given the dangers posed by integrating disparate data sources and enabling private health data marketplaces, it is imperative we do no harm. If we lead health innovation with “blind tech-evangelism”, we will be putting the most vulnerable individuals in harm’s way: the “in” group will continue to reap the benefits from health advances and the “out” group will continue to be deprived of the care they deserve.

Patients must give their explicit informed consent before their personal health data can be shared or sold, especially when their data is biometric or genomic. Our health systems already rely on vast and expansive data sharing, but the right to health and the right to privacy are ethical and moral issues that cannot be ignored because we lack the courage to fight the good fight.

If the health sector earnestly seeks to pursue ethical digital innovation, companies should develop policies that protect people, along with digital identity frameworks like Good ID. The digital identity principles of privacy, security, and user-control are a helpful lens to frame the concept of digital health identity.

If we believe health is a human right, then protecting health data is a human rights issue.

About the author:

Michael Johnson (MSc VU University, Amsterdam; MA Fuller Seminary, Seattle) lives in Kansas City, MO and works on the monitoring and evaluation team of an international NGO, Partners In Health. You can follow him on Twitter at @migueljohnson.

Competing interest

I have read and understood the BMJ Group policy on declaration of interests and declare I have no conflicts of interests.


[i] Examples include the European Health Data and Evidence Network (EHDEN), Switzerland’s BiomedIT and Personalized Health Networks, and the Netherlands’ Health RI.

[ii] Elsevier is part of a larger information analytics group, RELX, which also owns LexisNexis, a company specializing in legal research, business research, and a Socioeconomic Health Score that helps health insurance companies predict health risk of new enrollees without claims and pharmacy data. Their dataset is massive in scale: “LexisNexis receives behavioral updates on 279 million unique U.S. identities every year. The LexisNexis data repository includes 45 billion public and proprietary records as well as some alternative data. The repository is refreshed on a regular basis, with 77 million records processed daily.” Resource Accessed from: https://www.lexisnexis.com/risk/downloads/literature/health-care/socioeconomic-data-coverages-br.pdf. Resource Archived at: https://perma.cc/P3AF-AD6T

[iii] This list includes EHR vendors, supply chain and logistics vendors, pharmacies and pharmacy benefit managers, claims managers and payment processors, wearable device manufacturers, laboratory technology manufacturers and laboratory information system vendors, cloud computing vendors, population health management solution providers, credit reporting agencies, and more. For a diagram highlighting the complexity and a sample data sharing flow, visit https://medium.com/datavant/the-fragmentation-of-health-data-8fa708109e13. Resource Archived at: https://perma.cc/JKY6-V4DX and image diagram archived at: https://perma.cc/Y26M-ZSWQ

[iv] Examples may include: family history, sexual history, stigmatized conditions or activities, work history, criminal history, pre-existing health conditions that could nullify future health insurance coverage, et cetera.
