The consistency of machine learning and statistical models in predicting clinical risks of individual patients

“…Now, imagine a machine learning system with an understanding of every detail of that person’s entire clinical history and the trajectory of their disease…. With the clinician’s push of a button, such a system would be able to provide patient-specific predictions of expected outcomes if no treatment is provided… to support the clinician and patient in making what may be life-or-death decisions…” [1] This would be a major achievement. The English NHS is currently investing £250 million in Artificial Intelligence (AI). Part of this AI work could help to “identify patients most at risk of diseases such as heart disease or dementia, allowing for earlier diagnosis and cheaper, more focused, personalised prevention.” [2] Multiple papers have suggested that machine learning outperforms statistical models in areas including cardiovascular disease risk prediction. [3-6] We tested whether this is true, using the prediction of cardiovascular disease as an exemplar.

Risk prediction models have been implemented in clinical practice worldwide to help clinicians make treatment decisions. As an example, guidelines by the UK National Institute for Health and Care Excellence recommend that statins are considered for patients with a predicted 10-year cardiovascular disease risk of 10% or more. [7] This risk is estimated with QRISK, which was derived using a statistical model. [8] Our research evaluated whether the predicted cardiovascular disease risk for an individual patient would be similar if another model, such as a machine learning model, were used, as different predictions could lead to different treatment decisions for that patient.
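As a minimal sketch (hypothetical numbers, not data or code from our study), the 10% threshold rule can be written as a function and applied to predictions from two models for the same patients. Any patient whose recommendation flips between the two models is one whose treatment depends on the choice of model. The function and variable names are illustrative only.

```python
# Hypothetical sketch: the NICE 10% threshold applied to two models' predictions.
STATIN_THRESHOLD = 0.10  # predicted 10-year CVD risk at which statins are considered


def consider_statin(risk_10y: float) -> bool:
    """True if the predicted 10-year CVD risk is 10% or more."""
    return risk_10y >= STATIN_THRESHOLD


def discordant_patients(preds_a, preds_b):
    """Indices of patients whose statin recommendation differs between two models."""
    return [i for i, (a, b) in enumerate(zip(preds_a, preds_b))
            if consider_statin(a) != consider_statin(b)]


# Made-up predicted 10-year risks for the same five patients (not study data).
model_a = [0.04, 0.09, 0.12, 0.18, 0.25]
model_b = [0.05, 0.11, 0.08, 0.21, 0.30]

print(discordant_patients(model_a, model_b))  # [1, 2]
```

Patients 1 and 2 sit close to the threshold, so modest differences between the two models are enough to change whether statins would be considered.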

An electronic health record dataset was used for this study, with similar risk factor information used across all models. Nineteen different prediction techniques were applied, including 12 families of machine learning models (such as neural networks) and seven statistical models (such as Cox proportional hazards models). The models had similar population-level performance (C-statistics of about 0.87 and similar calibration). However, predictions of individual cardiovascular disease (CVD) risk varied widely between and within the different types of machine learning and statistical models, especially in patients at higher CVD risk. Most of the machine learning models tested in this study do not take censoring (i.e., loss to follow-up over the 10 years) into account by default. As a result, these models substantially underestimated cardiovascular disease risk.
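To illustrate why ignoring censoring pushes risk estimates down, here is a minimal sketch on made-up follow-up data (not our study's data or code; the function names are hypothetical). The naive estimate counts events among all patients, which is roughly what a classifier trained on an "event within 10 years" label does when patients lost to follow-up are treated as event-free. The Kaplan-Meier estimate instead removes censored patients from the risk set once they leave follow-up.

```python
def naive_risk(times, events, horizon=10.0):
    """Share of all patients with an event by the horizon.
    Censored patients are implicitly treated as event-free."""
    n_events = sum(1 for t, e in zip(times, events) if e == 1 and t <= horizon)
    return n_events / len(times)


def kaplan_meier_risk(times, events, horizon=10.0):
    """1 - Kaplan-Meier survival at the horizon.
    Censored patients simply leave the risk set at their censoring time."""
    survival = 1.0
    at_risk = len(times)
    # Sort by time; at tied times, events are processed before censorings.
    for t, e in sorted(zip(times, events), key=lambda pair: (pair[0], -pair[1])):
        if t > horizon:
            break
        if e == 1:
            survival *= 1 - 1 / at_risk
        at_risk -= 1
    return 1 - survival


# Hypothetical cohort of ten patients: years of follow-up and event indicator
# (1 = CVD event, 0 = censored, e.g. moved practice or end of data).
times = [1, 2, 3, 4, 5, 6, 7, 8, 10, 10]
events = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

print(round(naive_risk(times, events), 3))         # 0.3
print(round(kaplan_meier_risk(times, events), 3))  # 0.506
```

In this toy cohort the naive proportion is 0.30 while the Kaplan-Meier 10-year risk is 41/81, about 0.51: the five patients censored before year 10 are counted as non-events even though some of them might later have had a CVD event.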

The level of consistency within and between models should be assessed before they are used for treatment decision making, as an arbitrary choice of technique and model could lead to a different treatment decision.
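One way such a consistency check could look is sketched below, under simplifying assumptions: a binary outcome instead of censored survival data, hypothetical predictions, and illustrative function names. It contrasts a population-level metric (a rank-based C-statistic) with a per-patient summary of how far the models' predicted risks spread and whether they agree on the 10% threshold decision.

```python
from itertools import combinations


def c_statistic(risks, outcomes):
    """Concordance for a binary outcome: share of (case, control) pairs in which
    the case has the higher predicted risk (ties count as half).
    Simplification: ignores censoring, unlike Harrell's C for survival data."""
    pairs = concordant = 0.0
    for i, j in combinations(range(len(risks)), 2):
        if outcomes[i] == outcomes[j]:
            continue  # only pairs with one case and one control are informative
        case, control = (i, j) if outcomes[i] == 1 else (j, i)
        pairs += 1
        if risks[case] > risks[control]:
            concordant += 1
        elif risks[case] == risks[control]:
            concordant += 0.5
    return concordant / pairs


def decision_consistency(predictions_by_model, threshold=0.10):
    """Per patient: (lowest predicted risk, highest predicted risk,
    True if every model gives the same decision at the threshold)."""
    summary = []
    for patient_risks in zip(*predictions_by_model.values()):
        decisions = {risk >= threshold for risk in patient_risks}
        summary.append((min(patient_risks), max(patient_risks), len(decisions) == 1))
    return summary


# Hypothetical data: 1 = CVD event within 10 years, 0 = no event.
outcomes = [0, 0, 1, 0, 1, 1]
model_a = [0.02, 0.05, 0.08, 0.09, 0.15, 0.30]
model_b = [0.03, 0.07, 0.12, 0.14, 0.22, 0.45]  # same ranking, higher risks

print(c_statistic(model_a, outcomes), c_statistic(model_b, outcomes))
for patient, (low, high, agree) in enumerate(
        decision_consistency({"a": model_a, "b": model_b})):
    print(patient, low, high, "agree" if agree else "different decision")
```

Because model_b ranks patients exactly as model_a does, both have the same C-statistic (8/9), yet patients 2 and 3 fall below the 10% threshold under one model and above it under the other. A population-level metric alone cannot reveal this.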

So, can a push of a button provide patient-specific risk estimates from machine learning? Yes, it can. But should we use such estimates for patient-specific treatment decision making if the predictions are model-dependent? Machine learning may be helpful in some areas of healthcare, such as image recognition, and could be as useful as statistical models for population-level prediction tasks. But for predicting risk to inform individual decision making, we think a lot more work is needed. Perhaps the claim that machine learning will revolutionise healthcare is a little premature.

Yan Li, doctoral student of statistical epidemiology, Health e-Research Centre, Health Data Research UK North, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester.

Matthew Sperrin, senior lecturer in health data science, Health e-Research Centre, Health Data Research UK North, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester.

Darren M Ashcroft, professor of pharmacoepidemiology, Centre for Pharmacoepidemiology and Drug Safety, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester.

Tjeerd Pieter van Staa, professor in health e-research, Health e-Research Centre, Health Data Research UK North, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester.

Competing interests: None declared.

References:

1. The Alan Turing Institute. Turing Lecture: Transforming medicine through AI-enabled healthcare pathways. YouTube. https://www.youtube.com/watch?v=TWI-WIoWvfk. Accessed September 24, 2019.
2. GOV.UK. Health Secretary announces £250 million investment in artificial intelligence. https://www.gov.uk/government/news/health-secretary-announces-250-million-investment-in-artificial-intelligence. Accessed August 28, 2019.
3. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One. 2017;12(4):e0174944. doi:10.1371/journal.pone.0174944
4. Steele AJ, Denaxas SC, Shah AD, Hemingway H, Luscombe NM. Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS One. 2018;13(8):e0202344. doi:10.1371/journal.pone.0202344
5. Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M. Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants. PLoS One. 2019;14(5):e0213653. doi:10.1371/journal.pone.0213653
6. Al’Aref SJ, Anchouche K, Singh G, et al. Clinical applications of machine learning in cardiovascular disease and its relevance to cardiac imaging. Eur Heart J. 2019;40(24):1975-1986. doi:10.1093/eurheartj/ehy404
7. National Clinical Guideline Centre. Lipid modification: cardiovascular risk assessment and the modification of blood lipids for the primary and secondary prevention of cardiovascular disease. Clinical guideline CG181: methods, evidence and recommendations. 2014. https://www.nice.org.uk/guidance/cg181. Accessed August 28, 2019.
8. Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ. 2017;357:j2099. doi:10.1136/bmj.j2099
