Magdalena Zwierzyna: Lessons on trial design and transparency from ClinicalTrials.gov

Over time, ClinicalTrials.gov has accumulated a powerful dataset that offers a remarkable opportunity to study trends in the current clinical research landscape. The world’s largest primary clinical trials registry enables researchers and physicians to identify studies in their specialty and collect clinical evidence. It allows patients to search for trials that are actively recruiting participants. It is even used by the FDAAA trials tracker to identify and challenge trial sponsors who do not comply with transparency legislation.

The increasingly data-driven approaches to research and development in the drug discovery industry also benefit from the resource. At BenevolentAI, a British drug discovery company, we used ClinicalTrials.gov to ensure that the name of our first trial was not similar to the acronym of any ongoing or past study. More importantly, the data help our drug discovery researchers identify unmet needs, interesting assets, and potential collaborators. Available summary protocol data for hundreds of past studies are also extremely useful when planning study design. How big are trials in a given disease? How long do they typically take? How many centres are involved in recruitment? What are commonly used outcome measures, eligibility criteria, and comparators? Modern text mining and data analytics approaches help us answer such questions, and learn from past clinical trials to improve the design of studies in the future.

It was when attempting to tackle some of these typical study design questions that we became interested in the apparent heterogeneity of registered trials. Even after accounting for obvious outliers (such as interventional trials with reported recruitment exceeding 1 000 000 participants), the differences in sample size or trial duration can be huge within the same disease category and study phase. In other words, the so-called typical trial simply does not exist.
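As a rough illustration of how such heterogeneity might be summarised, the sketch below groups trials by disease category and phase, drops implausible enrolment figures, and reports the spread of sample sizes. The records, the field layout, and the function name `summarise_enrolment` are invented for this example; it is not the analysis pipeline used in our study, and the only detail taken from the text above is the outlier threshold of 1 000 000 participants.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical records: (disease category, study phase, reported enrolment).
# These values are made up for illustration and are not registry data.
trials = [
    ("oncology", "Phase 2", 24),
    ("oncology", "Phase 2", 40),
    ("oncology", "Phase 2", 65),
    ("oncology", "Phase 2", 310),
    ("oncology", "Phase 2", 2_500_000),  # implausible entry, treated as an outlier
    ("cardiology", "Phase 3", 120),
    ("cardiology", "Phase 3", 900),
    ("cardiology", "Phase 3", 15_000),
]


def summarise_enrolment(records, max_enrolment=1_000_000):
    """Summarise enrolment per (category, phase), skipping obvious outliers."""
    groups = defaultdict(list)
    for category, phase, enrolment in records:
        if enrolment > max_enrolment:
            continue  # drop trials reporting more than max_enrolment participants
        groups[(category, phase)].append(enrolment)

    summary = {}
    for key, sizes in groups.items():
        if len(sizes) >= 2:
            q1, _, q3 = quantiles(sizes, n=4)  # quartile cut points
        else:
            q1 = q3 = sizes[0]  # a single trial has no spread to report
        summary[key] = {
            "n": len(sizes),
            "median": median(sizes),
            "q1": q1,
            "q3": q3,
            "min": min(sizes),
            "max": max(sizes),
        }
    return summary


summary = summarise_enrolment(trials)
for (category, phase), stats in sorted(summary.items()):
    print(category, phase, stats)
```

Reporting the median alongside the quartiles and the range, rather than a single average, is one way to make the point that a "typical" trial size can hide a very wide distribution.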

We took up the challenge to investigate the differences between trials systematically and analysed a dataset covering all completed interventional studies registered with ClinicalTrials.gov since 2005. We also linked the registry records to a database of published scientific literature to investigate associations between trial design and publication in medical journals, helping to address one of the most important questions in medical research: the transparency and reporting of clinical studies.
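To give a sense of what linking registry records to the literature can look like in practice, here is a minimal sketch that computes a publication rate per medical specialty. It assumes, purely for illustration, that each trial carries a registry identifier and that the set of identifiers cited in journal articles has already been extracted. The identifiers, specialties, and counts are invented, and the function name `publication_rate` is hypothetical; the matching method used in our paper is not described here.

```python
from collections import Counter

# Hypothetical registry records: (trial identifier, medical specialty).
registered = [
    ("T001", "oncology"),
    ("T002", "oncology"),
    ("T003", "oncology"),
    ("T004", "oncology"),
    ("T005", "oncology"),
    ("T006", "cardiology"),
    ("T007", "cardiology"),
    ("T008", "cardiology"),
    ("T009", "cardiology"),
]

# Hypothetical set of trial identifiers found in published journal articles.
published_ids = {"T002", "T006", "T009"}


def publication_rate(records, published):
    """Return the fraction of registered trials per specialty with a linked article."""
    total = Counter()
    linked = Counter()
    for trial_id, specialty in records:
        total[specialty] += 1
        if trial_id in published:
            linked[specialty] += 1
    return {specialty: linked[specialty] / total[specialty] for specialty in total}


rates = publication_rate(registered, published_ids)
for specialty, rate in sorted(rates.items()):
    print(f"{specialty}: {rate:.0%} of registered trials have a linked publication")
```

A simple identifier match of this kind will miss articles that do not cite the registry number, so any rate computed this way is better read as a lower bound.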

Our research paper (BMJ 2018;361:k2130) found that trial phase, size, design, and even medical specialty are all associated with the likelihood of trial publication in medical journals. For instance, oncology had the lowest rate of results publication across all disease areas, with only a fifth of all cancer studies published in a journal. Is this low rate because of the high prevalence of small sample sizes and non-randomised designs in this field? Or perhaps because so many cancer trials fail to show positive outcomes? One thing is for certain: the evidence available in scientific literature is largely incomplete, and could be distorted as well.

Here’s one encouraging finding: the results database of ClinicalTrials.gov seems much less biased than the medical literature. Not only does it store the results of many small non-randomised studies that are not published elsewhere, but the time to results dissemination is also faster and the differences across medical specialties are much less pronounced. Not all organisations, however, are equally likely to submit their results to the portal. Trial reporting rates are highest for big pharma and the National Institutes of Health, followed by smaller industrial organisations, but universities and other non-profit funders still lag behind. As more academic institutions enter the clinical trial arena, it is important to better understand how this sector compares to industry, and our analysis identified several differences in both trial design and results dissemination. Whether we explain these trends by differences in regulations, objectives, available resources, or awareness of the FDAAA legislation, the situation needs to be addressed.

The struggle for transparency must continue. With over 270 000 registered studies and almost 30 000 result summaries, ClinicalTrials.gov already has an important role in collecting evidence from clinical research. Still, we are a long way from where we need to be. To put things into scale, the entire content of the clinical trial registry can be downloaded in a single file, just over 1 gigabyte in size. There are a lot more clinical data out there that should be made available. We hope that the findings from our study and from others will help improve our understanding of the complexities of trial dissemination and, ultimately, lead to better access to data from all clinical trials.

Magdalena Zwierzyna is a biomedical data scientist at BenevolentAI, a British artificial intelligence drug discovery company. She is also pursuing a PhD in bioinformatics at University College London. You can follow her on Twitter @magda_zw

Competing interests: None declared.