Reducing barriers to data access for research in the public interest—lessons from covid-19

The covid-19 outbreak has sparked increased awareness of the importance of timely, system-wide data for examining trends and modelling different scenarios to inform policy response. [1-5] The scale and speed of data access and use have been unprecedented in public health history. Preprint articles sharing results before peer review have proliferated (with implications for research quality), and over 500 vaccine and treatment clinical trials have been initiated in record time. [6-8] The entire economy of knowledge production related to covid-19 has been accelerated, with the understanding that, if we wait for perfect information before acting, we will be too late. Covid-19 is providing valuable lessons on improving data access and on the importance of using data for an efficient and effective service response.

This situation contrasts sharply with the cumbersome processes usually faced by researchers using administrative (or routinely collected) health data to inform policy making on other topics. These processes stem from systems that are not purpose-built for research, and can be summarised as four key obstacles.

First, the cost of using administrative data is prohibitive. For example, a non-commercial licence for GP data through the Clinical Practice Research Datalink costs £75,000, and roughly double that with linked socioeconomic and hospital data. [9,10] Second, approval processes are lengthy (up to one year), even for de-identified data posing little risk to confidentiality: researchers are required to demonstrate scientific quality and public benefit in applications to data providers and governance bodies, even when these important aspects have already been assessed by peer review and funders. While appropriate governance is important for protecting confidentiality and preserving public trust, approval processes are not streamlined and timelines do not reflect the public's expectations. [11,12] Access to UK-wide data is particularly problematic because approval processes differ between the four UK nations. Third, standard datasets are finalised several months after the time period they cover, and inefficiencies in releasing data to researchers mean that it can take many more months to receive them. [13] These delays hinder the rapid production of results to inform policy in a timely way. Lastly, data access is inefficient: most data providers mandate that data are hosted in specified secure settings, often involving travel away from usual research environments, with limited computing capacity, restricted opening hours and restricted software. [14] All these obstacles become considerably greater for cross-sectoral linkage of administrative datasets, for which clear legal pathways for access may not exist. [15-17]

Before covid-19, these problems caused substantial delays to analysing and reporting the results of research in the public interest – delays that have been exacerbated since the start of the pandemic by the diversion of resources away from non-covid-related areas. Important research simply is not done when access is refused or when timelines jeopardise grant funding. Considerable opportunity costs are associated with the non-use of health data and with delayed evaluations of public programmes: the resulting lack of evidence prevents services from becoming more effective and equitable, and means missed opportunities to save lives (as well as money). [18]

Covid-19 has highlighted the fundamental limitations of existing systems, and has sparked innovation in supporting data access. For example, the Secretary of State for Health and Social Care has suspended the need for approval under Regulation 3(4) of the Health Service (Control of Patient Information) Regulations 2002 for specific covid-19 related research projects, as the public benefit of this research is clear. [19] The Office for National Statistics has enabled temporary remote access to data during the covid-19 lockdown, exercising additional flexibility within the scope of regulations, albeit with logistical challenges. [20] Existing research studies such as UK Biobank have been granted access to new data sources. [21] However, these changes are too little and too late. When we reach the “new normal,” we should not return to business as usual, but instead take heed of the lessons learned during the pandemic and rebalance the public benefits of wider data use against the numerous existing barriers. We recommend the following measures:

  1. Reduce costs of administrative data access to researchers through core government funding for data processing, linkage and curation (avoiding cost-recovery models). This would enable more researchers to address questions in the public interest. This is already possible in some sectors, as demonstrated by the Department for Education for England and Wales, and in Sweden, where two thirds of MONA data system costs are centrally funded. [22]
  2. Simplify approval processes for de-identified data access through standardised guidance on necessary approvals proportionate to identification risk. Approval processes should be streamlined across organisations, including for demonstration of public benefit. [23] 
  3. Reduce data release delays through increased capacity and more specialised data providers. Independent, accredited data providers should be created, with expert processing and dissemination capacity, knowledge of how data are used in research, and understanding of how best to prepare and deliver datasets to researchers (emulating the successful Secure Anonymised Information Linkage (SAIL) Databank in Wales). [24,25] Innovations that have allowed more timely data release during covid-19, such as the OpenSAFELY collaborative or more frequent releases of GP and hospital data, should continue and be made available to researchers to allow timely research on many topics. [26] Timely data release should not compromise quality, and organisations providing data should adhere to transparent and efficient response times. [15]
  4. Enable more efficient data use through remote systems that comply with data protection requirements. [27] E-infrastructure must be improved to enable rapid data extraction and analysis. 

In addition, better data collection should be established for community services, social care and household-based cohorts, among others. [28,29] This would have facilitated the tracking of transmission patterns during covid-19, and is equally important for a range of other public health topics.

Underpinning all the above, public trust and understanding are essential if researchers are to continue to use administrative data, and we should harness the wider recognition of the value of data for decision-making that covid-19 has brought about. [30] Public engagement and involvement should be built into systems for data access “by design and default”, both within individual research projects and through high-profile national engagement campaigns. [15]

Covid-19 has demonstrated the value of timely data sharing, while highlighting flaws in UK data access systems that prevent agile and responsive research. Although these concerns have been communicated to the government previously, it was not until covid-19 that the potential impact was realised and actions taken. [29,31] However, the implications are no less critical for other public health topics. The potential risks involved in the use of administrative data will always need to be carefully considered, but covid-19 has shown that increased capacity and political will can successfully simplify approval processes, reduce delays and enable more efficient data access whilst respecting data protection principles. Building on the substantial interest in health data—and appreciation of its complexities—arising from the pandemic, we urge the government and data providers to learn the lessons of covid-19, and to work with the research community to build data access systems that are timely, resilient and responsive to changing local, national, and international contexts. Data providers need to fulfil their social licence with the public to use administrative data from the public, to benefit the public. [32,33] Covid-19 shows this can be done—it needs to continue.

Francesca Cavallaro is a research fellow at the Institute of Child Health, University College London

Fiona Lugg-Widger is a research fellow in routine data at the Centre for Trials Research, Cardiff University

Rebecca Cannings-John is a senior research fellow in statistics at the Centre for Trials Research, Cardiff University

Katie Harron is associate professor in statistics at the Institute of Child Health, University College London

Competing interests: None declared

On behalf of the 374 signatories

This piece was first published as an open letter, sent to the UK Information Commissioner, the Chief Medical Officers of the UK, and UK data providers, and signed by 374 signatories.


  1. Ferguson NM, Laydon D, Nedjati-Gilani G, et al. Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. 16 March 2020.
  2. Lai A, Pasea L, Banerjee A, et al. Estimating excess mortality in people with cancer and multimorbidity in the COVID-19 emergency, 2020.
  3. LSHTM Centre for Mathematical Modelling of Infectious Diseases. Covid-19. 2020.
  4. Financial Times. Coronavirus tracked: the latest figures as countries fight to contain the pandemic. 2020.
  5. UKRI. Research questions for covid-19 [priority research questions for the UKRI COVID-19 funding call]. 2020.
  6. Kwon D. How swamped preprint servers are blocking bad coronavirus research. Nature 2020;581:130-31.
  7. Glasziou PP, Sanders S, Hoffmann T. Waste in covid-19 research. BMJ 2020;369:m1847.
  8. TranspariMED. All Covid-19 clinical trials at a glance. 2020.
  9. CPRD. Pricing.
  10. CPRD. CPRD linked data.
  11. Ford E, Boyd A, Bowles JKF, et al. Our data, our society, our health: A vision for inclusive and transparent health data science in the United Kingdom and beyond. Learning Health Systems 2019;3(3):e10191.
  12. Patient data – balancing access and protection. All talk and no access? Workshop summary. 2019.
  13. Dattani N, Hardelid P, Davey J, et al. Accessing electronic administrative health data for research takes time. Arch Dis Child 2013;98(5):391-2.
  14. Harper G. Linkage of Maternity Hospital Episode Statistics data to birth registration and notification records for births in England 2005–2014: Quality assurance of linkage of routine data for singleton and multiple births. BMJ Open 2018;8(3):e017898.
  15. Mourby M, Doidge J, Jones KH, et al. Health Data Linkage for UK Public Interest Research: Key Obstacles and Solutions. IJPDS 2019;4(1):09.
  16. Morris H, Lanati S, Gilbert R. Challenges of administrative data linkages: experiences of Administrative Data Research Centre for England (ADRC-E). IJPDS 2018;3(2).
  17. Downs JM, Ford T, Stewart R, et al. An approach to linking education, social care and electronic health records for children and young people in South London: a linkage study of child and adolescent mental health service data. BMJ Open 2019;9(1):e024355.
  18. Jones KH, Laurie G, Stevens L, et al. The other side of the coin: Harm due to the non-use of health-related data. International Journal of Medical Informatics 2017;97:43-51.
  19. UK Government. Coronavirus (COVID-19): notification to organisations to share information. 2020.
  20. Office for National Statistics. Accessing secure research data as an accredited researcher – 1. COVID-19 update. 2020.
  21. UK Biobank. UK Biobank makes health data available to tackle COVID-19. 2020.
  22. Swedish Research Council (VR). Evaluation of the MONA system (Microdata Online Access). 2014.
  23. Lugg-Widger F, Angel L, Cannings-John R, et al. Challenges in accessing routinely collected data from multiple providers in the UK for primary studies: Managing the morass. International Journal of Population Data Science 2018;3(3).
  24. Golan EH. Sustainability and migration: experiments from the Senegalese peanut basin. The Annals of Regional Science 1994;28(1):91-106.
  25. Buchan I. National Grid of Civic Data Cooperatives for Health in Health of the Nation, All Party Parliamentary Advisory Group on Longevity. 2020.
  26. OpenSAFELY. Home.
  27. Information Commissioner’s Office. Guide to the General Data Protection Regulation (GDPR).
  28. Rodgers SE, Lyons RA, Dsilva R, et al. Residential Anonymous Linking Fields (RALFs): a novel information infrastructure to study the interaction between the environment and individuals’ health. Journal of Public Health 2009;31(4):582-88.
  29. Lugg-Widger F, Hood K, Robling M. Written evidence submitted by Centre for Trials Research, Cardiff University (DIG0006). 2018.
  30. Understanding Patient Data.
  31. Strategic Coordination of the Health of the Public Research (SCHOPR). Letter from the Independent Chair, on behalf of SCHOPR members, to the UK Chief Medical Officers; July 2019.
  32. Carter P, Laurie GT, Dixon-Woods M. The social licence for research: why ‘’ ran into trouble. Journal of Medical Ethics 2015;41(5):404-09.
  33. Paprica PA, de Melo MN, Schull MJ. Social licence and the general public’s attitudes toward research based on linked administrative health data: a qualitative study. CMAJ Open 2019;7(1):E40-E46.