Richard Smith: How might artificial intelligence improve healthcare?

Artificial intelligence, which few of us understand, might apocalyptically enslave humanity or release it from death. Some prominent scientists believe that robots blessed with artificial intelligence will soon be more intelligent than humans and conclude that they have little use for us. Other transhumanists think that it will be possible to “download” human minds into computers, so gifting us immortality. But in a house in a village outside Tel Aviv with a garden filled with orange trees, I encountered a more plausible and much gentler story of how artificial intelligence together with machine learning can improve healthcare and so help humanity.

Early days of development

Medial Early Sign, for which I occasionally consult, is an Israeli start-up that is using artificial intelligence and machine learning to improve healthcare. The company is funded by an individual who has made a fortune from using artificial intelligence and machine learning to predict movements in stocks and shares. The company started in 2009 with the funding and simply the concept that artificial intelligence and machine learning could be used to improve healthcare.

In the early days the company tested the concept by working with doctors in the intensive care units in two Tel Aviv based hospitals. The company used data, artificial intelligence, and machine learning to predict which patients would die and which would develop kidney failure much more accurately than the doctors. But these experiments were simply for proof of concept; they were not of business interest. They allowed the company to better understand the complexity of medical data and adjust its algorithmic tools and methodologies.

The four stages of developing algomarkers, predictors of health events

At the heart of the enterprise is developing algomarkers that will predict the risk of future health events from routinely collected data—for example, the development of colon cancer or the onset of complications in patients with diabetes.

The four stage process of developing algomarkers begins with establishing a pipeline of possible algomarkers that would be clinically useful, actionable, capable of being produced from routinely available data, and lead to a return on investment, meaning that health authorities would be willing to pay for the algomarker and that it would eventually lead to profits for the company. This pipeline is developed mainly from continuing conversations with clinicians. Possibilities that seem clinically exciting might have to be discarded at this stage because they don’t meet the criteria.

The next stage is research. Data scientists use artificial intelligence and machine learning to search for signals that would allow prediction of the event from a large dataset. Medial Early Sign has data from electronic medical records on some 20 million people, giving 100 million person-years. The larger the dataset the more likely that it will be possible to find useful signals, and this applies not only to the number of people and years but also the number of different datapoints. Age, sex, and laboratory results are the data that are most easily identified and used, but the company is already exploring other sources of data, including natural language processing of doctors’ summaries of patients’ diagnoses. There is clearly a trade-off between having more datapoints, making it more likely that a useful signal can be detected, and needing to use data that are simply not available in many routine datasets.

Algomarkers predicting complications of hypertension and the onset of lung cancer have made it through this stage, but there is a point where potential algomarkers may be abandoned. For example, an attempt to predict who would need computed tomography of the head in the emergency room, specifically among children, failed to find a significant signal and was therefore abandoned. The research phase is fast, and it will usually be apparent within two months that a signal cannot be detected. Sometimes a newly developed algomarker is given a lower priority until additional data are available. Such was the case when the company developed an algomarker that would predict which men with prostate cancer would progress to lethal disease. It might be (in fact, probably will be) that in the future it will be possible to identify a signal with bigger datasets and more datapoints.

The third stage is to turn the algomarker into a product that can be sold and used. This is happening with algomarkers predicting which patients will progress from prediabetes to diabetes; which patients with diabetes will develop nephropathy or a heart attack or heart failure; and which patients would be most likely to develop serious complications from flu, making them the highest priority for vaccination.

The final stage is to arrange for these algomarkers to be tested by independent researchers in different populations. The first algomarker developed by the company, ColonFlag, uses simply a full blood count, age, and sex to measure a person’s risk of developing colon cancer. The hope is that artificial intelligence and machine learning can detect changes that will be meaningful before a clinician can do so. ColonFlag has been tested in populations in Israel, California, China, and the UK, and the results are essentially the same with each population.

Experience in the UK

An Oxford group has tested ColonFlag retrospectively using medical records from some 2.5 million patients in a study funded by the National Institute for Health Research. ColonFlag was developed using an Israeli dataset. One dataset was used to “train” ColonFlag, and then the predictive value was tested on a second dataset. ColonFlag was thereafter validated on the UK population: the company provided the Oxford research group with a ColonFlag score for each of the patients in the study without knowing which patients went on to develop colon cancer.

The study identified 5141 patients who had a diagnosis of colon cancer and at least one full blood count and related algomarker 18-24 months before diagnosis; just over 2.2 million patients had a full blood count in the same period and no diagnosis of colon cancer. The primary endpoint was the area under the receiver operating characteristic curve, but the most easily understood result was to show that patients with a risk score associated with 99.5% specificity had an 8.8% chance of being diagnosed with colon cancer within the next two years. Unsurprisingly, the test has greater predictive value as it comes closer to the time of diagnosis. This result is comparable with results achieved with a clinically based predictive score developed in Britain.
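For readers who want to see how a specificity, a score cut-off, and the chance of cancer among flagged patients relate to one another, here is a minimal Python sketch. The function names, the rule for choosing the cut-off, and the toy scores are all assumptions made for illustration; this is not the Oxford group's or the company's code, and the numbers have nothing to do with the study's results.

```python
import math
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Chance that a randomly chosen case scores higher than a randomly
    chosen control (ties count as half). This equals the area under the
    receiver operating characteristic curve."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def cutoff_for_specificity(control_scores: Sequence[float], specificity: float) -> float:
    """Lowest observed control score such that at least `specificity` of the
    controls score at or below it. Scores strictly above it are flagged."""
    ordered = sorted(control_scores)
    # The small tolerance guards against floating point error in the product.
    needed = max(1, math.ceil(specificity * len(ordered) - 1e-9))
    return ordered[needed - 1]


def positive_predictive_value(
    case_scores: Sequence[float], control_scores: Sequence[float], cutoff: float
) -> float:
    """Share of flagged patients (score above the cut-off) who are cases."""
    true_pos = sum(1 for s in case_scores if s > cutoff)
    false_pos = sum(1 for s in control_scores if s > cutoff)
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else float("nan")


# Toy scores, invented for illustration only.
controls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cases = [7.5, 9.5, 11, 12]

cutoff = cutoff_for_specificity(controls, 0.8)
print(cutoff)                                              # 8
print(positive_predictive_value(cases, controls, cutoff))  # 0.6
print(auc(cases, controls))                                # 0.9
```

The predictive value depends heavily on how common the disease is. In the toy data four in 14 patients are cases, which is far higher than in any real screening population, so the figure of 0.6 says nothing about what a real test would achieve.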

Using algomarkers clinically

ColonFlag is used routinely in Maccabee Healthcare, a health maintenance organisation covering two million people, where 80% of patients have an annual full blood count and patients over 50 are advised to have a screening test for colorectal cancer. Nearly 70% respond to an annual reminder to do a faecal occult blood test or a colonoscopy once every 10 years. Among those who fail to adhere to this screening programme, ColonFlag is used to identify patients at risk of currently harbouring colorectal cancer and expedite their referral to colonoscopy. Maccabee Healthcare calculates that using the signal has both reduced deaths and saved it money through avoiding expensive, late stage treatments. Maccabee Healthcare has also found that “false positives” often have another condition, perhaps an upper gastrointestinal cancer, an adenoma, or another cancer.

In Britain patients over 60 are offered a faecal occult blood test every two years, and a meta-analysis has shown a 16% reduction in deaths from colon cancer as a result of such programmes. But only about three out of five people take the test, and so there is scope for another system of prediction. The 8.8% risk of cancer in the next two years at which ColonFlag can identify people is higher than the 3% risk at which the National Institute for Health and Care Excellence (NICE) recommends fast track referral for investigation of cancer.

Working out a possible role for ColonFlag in Britain is not straightforward, and the Oxford study has several problems, as the authors concede. All general practices in Britain have electronic medical records, and it could be that clinicians would appreciate a signal that could be delivered routinely and could help decide on the best path forward for a patient. But many patients will not have regular full blood counts in a two year period, and the average general practitioner encounters a case of colon cancer rarely—so it’s not a condition that is a high priority for GPs: they may think that screening for colon cancer is a national not a local priority.

Multiple algomarkers

The value of algomarkers is likely to be cumulative. It will come when perhaps 20 or eventually a thousand signals can be extracted from routine medical data. Medial Early Sign is building a system called AlgoAnalyser that can host any number of algomarkers, as well as many other currently used risk scores. The AlgoAnalyser extracts data from electronic medical records, integrates them with data from other sources, and produces scores for a range of algomarkers. Depending on local epidemiology and health systems, algomarkers may vary in their value: for example, in Britain, an algomarker that can predict better than existing systems which patients with prediabetes are at highest risk of developing diabetes, and so allow them to be entered preferentially into prevention programmes with limited capacity, may be more valuable than an algomarker for colon cancer.

Biggest benefits in low income countries?

The biggest benefits from artificial intelligence and machine learning may eventually come not in high income countries but in low and middle income countries where doctors are in short supply. Nurses and community health workers aided by artificial intelligence and machine learning may be able to provide services as effective as those provided by doctors. Many innovations in healthcare—for example, genomics—seem to offer the unwelcome possibility of dividing even further the already wide gap between those getting the best services and those getting little or nothing. Artificial intelligence and machine learning, in contrast, have the potential to narrow the gap.

A new role for laboratories

The company has identified laboratories as a particular place where algomarkers may be useful. As healthcare funders begin to pay for outcomes rather than activity, laboratories move from being revenue centres charging a fee for a test to being cost centres. This puts them under much greater budgetary pressure, and one response is to move from simply reporting results to reporting proposed actions. This is where artificial intelligence and machine learning could be useful: they could use the results of laboratory tests plus other data to recommend actions. Consider, for example, a patient presenting in an emergency room with chest pain. The laboratory might report not just troponin levels and the results of laboratory and other tests, but also a recommendation on whether to admit the patient or discharge them. Whether it will be possible to make recommendations with great confidence and whether the clinicians caring directly for the patients will accept the recommendations remain open questions.

The future

One vision for the future is that artificial intelligence and machine learning might be used for individual patients. Modern healthcare is concerned mostly with patients with multiple conditions, limiting the usefulness of guidelines developed usually from randomised trials that have excluded patients with multiple conditions. With a large database it should be possible to identify many patients with a similar pattern of disease to any particular patient and then use the data from those patients and their outcomes to identify the best path forward for the particular patient.
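The idea of finding patients with a similar pattern of disease can be pictured as a nearest neighbour lookup. The Python sketch below is purely illustrative: the `Patient` record, the overlap (Jaccard) measure of similarity, the treatment labels, and the toy database are all assumptions made for the example, and nothing in it reflects how Medial Early Sign would actually approach the problem.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Patient:
    """Hypothetical record: a set of conditions, a treatment, and an outcome."""
    conditions: FrozenSet[str]
    treatment: str = ""
    good_outcome: bool = False


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared conditions divided by all conditions seen in either patient."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(target: Patient, database: List[Patient], k: int) -> List[Patient]:
    """The k patients whose pattern of conditions best matches the target's."""
    ranked = sorted(
        database,
        key=lambda p: jaccard(target.conditions, p.conditions),
        reverse=True,
    )
    return ranked[:k]


def outcome_rate_by_treatment(neighbours: List[Patient]) -> Dict[str, float]:
    """Share of good outcomes for each treatment among the similar patients."""
    totals: Dict[str, int] = defaultdict(int)
    good: Dict[str, int] = defaultdict(int)
    for p in neighbours:
        totals[p.treatment] += 1
        good[p.treatment] += p.good_outcome
    return {t: good[t] / totals[t] for t in totals}


# Toy database, invented for illustration only.
database = [
    Patient(frozenset({"diabetes", "hypertension"}), "A", True),
    Patient(frozenset({"diabetes", "hypertension", "copd"}), "A", True),
    Patient(frozenset({"diabetes", "hypertension"}), "B", False),
    Patient(frozenset({"asthma"}), "B", True),
    Patient(frozenset({"diabetes", "heart failure"}), "B", True),
]
target = Patient(frozenset({"diabetes", "hypertension"}))

neighbours = most_similar(target, database, k=3)
print(outcome_rate_by_treatment(neighbours))  # {'A': 1.0, 'B': 0.0}
```

A comparison of this kind is observational, so a difference between treatments could reflect which patients were given each treatment rather than the effect of the treatment itself.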

The vision of many for the future of healthcare is that it moves from sickness care to the promotion and maintenance of health, and Medial Early Sign is responding to this challenge. One way to do this might be to divide health into a few domains (cardiological, respiratory, gastrointestinal, musculoskeletal, neurological, metabolic, etc) and use artificial intelligence and machine learning to give every individual a score, probably expressed as an age so that individuals could, for example, relate their heart age to their actual age. Such a score, which is already used by some organisations, should be easy to understand and motivational.

Medial Early Sign is a start-up, albeit one with a secure financial base, with all the implications of uncertainty of both products and business models. But it seems highly likely that it will, together with other organisations and companies like DeepMind that use artificial intelligence and machine learning, bring about considerable improvements in healthcare, avoiding the apocalyptic visions imagined by some.

Richard Smith was the editor of The BMJ until 2004.

Competing interest: RS is a paid consultant to Medial Early Sign and had his expenses paid to travel to Israel. The company also paid for him and his wife to spend time in Tel Aviv and Jerusalem and for two guided tours.