Richard Smith: Improving the evaluation and regulation of medical devices

Earlier this month the Scottish government ordered an immediate halt to the use of vaginal mesh after mesh had contributed to a woman’s death in Edinburgh in August. The death followed years of accumulating evidence of the dangers of vaginal mesh.

Women undergoing hysteroscopic sterilisation with the “Essure” device have 10 times the risk of needing reoperation compared with women undergoing laparoscopic sterilisation, and last year the device was removed from the European market. These are just two examples among multiple failures that raise questions about how medical devices reach the market, and a meeting in Bristol earlier this month discussed how systems for allowing medical devices onto the market might be improved. In the same week the Secretary of State for Health said that GP at Hand, an app that uses artificial intelligence to assess symptoms and offers video and telephone consultations with GPs, should be available across England before the NHS has completed its evaluation of the device.

Regulatory approval of medical devices is generally much less onerous than approval of drugs. In the US devices judged as low risk can be approved on the grounds of being as safe and effective as existing devices, some of which may have been approved before clinical evidence was required. In Europe some 50 “notified bodies” can approve devices on sometimes little more than a narrative review of published reports. Although the Medicines and Healthcare Products Regulatory Agency (MHRA) oversees the UK notified bodies, the evidence required for a device to be marketed for clinical use is much less rigorous than in the drug industry. Randomised controlled trials are not required. The problem can be summarised by saying that manufacturers choose the methods of evaluation and the outcomes they will use, allowing lots of scope for potentially unsafe products to reach the market.

Camilla Fleetcroft from the MHRA explained how the European Union is trying to tighten up approval systems, but Carl Heneghan, director of the Centre for Evidence Based Medicine, who has studied processes for regulating medical devices, described the proposals as a hundred pages of smokescreen.

There is acceptance that the system used for drugs does not work for medical devices, where there is often continuing development of the device, and where surgery may well be required to install the device, which means some learning by surgeons. There is even less regulation of surgical procedures themselves. But there are ideas on what a better system for evaluating devices could look like. Peter McCulloch, professor of surgical science in Oxford, described how the IDEAL (Idea, Development, Exploration, Assessment, Long term study) system for evaluating surgical innovations can be adapted for devices. IDEAL answers a series of questions: what is it (the surgical innovation)? (stage 1); have we perfected it? (stage 2a); can we agree on what it is and who should get it? (stage 2b); is it better than current practice? (stage 3); and are there any surprises? (stage 4).

Medical devices would require, said McCulloch, a stage 0, which would be prehuman studies. Stage 2a would probably be redundant, and stage 2b only partially relevant. Stage 3 is important and would preferably be answered with randomised trials with specified and agreed outcomes or large scale studies. Everybody at the meeting agreed that stage 4 requires that for all implantable devices, like vaginal mesh, registers should be established with every patient being registered to allow long term surveillance. Preferably the registers should be independent of manufacturers, but they are expensive to maintain. Yet, as with vaginal mesh and other implanted devices, it may take many years for serious problems to emerge.

Many devices are produced by small companies, and, as Edward Draper, chief executive officer of Ortheia Ltd, explained, conducting trials is expensive and sometimes beyond the means of small companies. There has to be a trade-off among the needs of patients, regulators, companies, purchasing authorities, and health professionals. Getting the trade-off right may not be easy. Ted Lystig, director of corporate biostatistics for Medtronic (one of the largest medical device companies), predicted that in coming years “real world evidence” derived from administrative and claims data and electronic health records would become steadily more important, complementing, not replacing, evidence from clinical trials. Scott Gottlieb, the commissioner of the Food and Drug Administration, has predicted that in a world of personalised medicine and patients with multiple conditions real world evidence may overtake clinical trials as the most important evidence in making decisions on new drugs and devices. This is scary to some when the underlying data are of questionable quality and the methods used to try to minimise bias are complex and unfamiliar.

The main thrust of the meeting, which was funded by the MRC Hubs for Trials Methodology Network, was to encourage the use of generic core outcome sets in making decisions on the regulation of devices. Kerry Avery, a senior lecturer on health services research in Bristol, reported how a review of 42 studies of leadless pacemakers (pacemakers that are implanted directly into the patient’s heart, avoiding the need for leads between the pacemaker and the heart, which are prone to infection) found some 2500 different individual outcomes reported. Heneghan told the meeting how the COMPARE study of 67 randomised trials in the top five medical journals found 354 outcomes that were specified in protocols but not reported and 357 that were not specified but were reported. This switching of outcomes opens up the possibility of very misleading results.

The COMET (Core Outcome Measures in Effectiveness Trials) Initiative aims to encourage the development and uptake of core outcome sets. A core outcome set is a scientifically agreed minimum set of outcomes, selected by relevant stakeholders, to be used in all trials of a specific condition. Paula Williamson, professor of medical statistics at the MRC North West Hub for Trials Methodological Research, described how agreeing on core outcome sets increases consistency across trials, makes systematic reviews more meaningful, increases the likelihood that important outcomes are measured, and reduces selective reporting.

Who should be part of agreeing outcomes? At the moment it’s health professionals, patients, and those who do the research, but users of the research, particularly regulators, should probably be included. As trials have the ultimate aim of improving patient care, patients, argued Williamson, are the most important voice. Yet trials, she reported, measure only about half of the outcomes that matter to patients. Currently, core outcome sets are designed for phase 3 randomised controlled trials and are not used in early phase studies or in the studies designed to bring new devices onto the market. The meeting considered the use of surrogate outcomes in earlier phase studies, as these could link to the core outcome sets used in phase 3 trials and improve understanding of the “life cycle” of a device.

All those at the conference discussed a study from Annals of Surgery, the world’s leading general surgical journal, we were told, that evaluated the Magnetic Surgical System, in which surgeons performing a laparoscopic cholecystectomy place a metal clip on the gallbladder, and an external magnet then allows external retraction and mobilisation of the gallbladder. This avoids the need for an extra entry point for an assistant to retract the gallbladder, so potentially reducing “incisional pain, scarring, infection, and bowel and vascular injuries.”

The study reported a “clinical trial,” but it was actually a series of 50 patients, most of whom were young and relatively thin and did not have scarred gallbladders. Outcomes included things like “device ease of use” (“yes” in all 50 operations) and “device malfunction” (“no” in all 50 operations), and some things that would matter to patients like “pain scores” and “external abdominal wall evaluation” six hours after the operation (“normal” in all 50).

Despite having no comparison group, not being big enough to measure “incisional pain, scarring, infection, and bowel and vascular injuries,” being conducted in a low risk group, and having results that felt “too good to be true,” the study is said to show that the device is “safe and effective.” The device is “already cleared for commercialization by the Food and Drug Administration,” so there was a feeling at the meeting that the study was primarily intended to encourage uptake among surgeons, “who love new toys.”

The Bristol Biomedical Research Centre Surgical Innovation theme, funded by the National Institute for Health Research, is working to address the issues of selection, measurement, and reporting of outcomes for devices and surgical procedures. Let us hope that it can make the progress that is badly needed.

I came away from the meeting worried that evaluation and regulation of medical devices are inadequate; convinced that there are better ways than are currently used to evaluate and regulate medical devices, and that core outcome sets, with the lead being taken by patients, would be a step forward; a supporter of COMET; and depressed at the standards of the world’s leading general surgical journal and the gullibility of surgeons.

Richard Smith was the editor of The BMJ until 2004.

Competing interest: RS spoke at the meeting, on the failures associated with current ways of publishing reports of surgical innovations, and paid his own expenses but was given a free lunch.
