James Raftery: A more fundamental review of QALYs is needed

Quality Adjusted Life Years (QALYs) have become common in research, partly because they are used by the National Institute for Health and Care Excellence (NICE). A long-running effort to update the methods used to estimate QALYs has produced results that differ from those of the earlier methods. NICE has indicated that it is not prepared, at least for now, to use the updated version, but recommends further research. The issue seems to stem from problems of consistency with previous judgements. A closer look reveals some of the problems with QALYs, as well as alternatives that have been developed.

First, some basics: Quality Adjusted Life Years, as the name implies, are based on years spent in particular health states of different quality (“utility” in economics jargon). Estimation of QALYs is based on two elements: the allocation of each patient to a health state and a value for each state. Critically, the value for each health state is not that of the patient but that of society. How best to elicit the values of society is complex, with different methods contending. Trade-offs apply between precision and simplicity. Asking members of the public to express preferences between different health states, each with varying duration, is cognitively demanding, especially when consistency is required (or applied retrospectively).
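The arithmetic behind this can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the patient pathways, and the utility values are all hypothetical, and real analyses typically add refinements such as discounting.

```python
def qalys(periods):
    """Sum years spent in each health state, weighted by that state's utility.

    `periods` is a list of (years, utility) pairs, where utility is the
    societal value attached to the state (1.0 = full health, 0.0 = dead).
    """
    return sum(years * utility for years, utility in periods)


# Hypothetical pathways for one patient (illustrative numbers only)
with_treatment = [(2.0, 0.8), (3.0, 0.6)]     # 5 years in two states
without_treatment = [(2.0, 0.5), (1.0, 0.3)]  # 3 years in two worse states

qaly_gain = qalys(with_treatment) - qalys(without_treatment)
print(f"QALYs gained: {qaly_gain:.1f}")
```

The same patient allocation gives a different QALY total whenever the utility attached to a state changes, which is why the choice of value set matters so much.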

As QALYs aim to apply across diseases, the health dimensions must be generic. EuroQol is one of several instruments which can be used to estimate QALYs. The mainstay tool has been EuroQol 5 Dimensions (EQ-5D). The five dimensions are anxiety/depression, mobility, self-care, usual activities, and pain/discomfort. EQ-5D-3L was introduced in 1990, with the 3L referring to 3 levels for each dimension: no/some/extreme problems. [1]

The update, EQ-5D-5L, changed only the levels, increasing the number from 3 to 5 (no/slight/moderate/severe/extreme problems). Again, population values needed to be attached to each of the new states. This survey work was done for the UK in 2017. The results indicated considerable changes, with an increase in the average level of population health, leaving less scope for improvements in quality of life due to interventions. This could be linked to fewer people ranking particular health states as worse than death: over 30% under the old valuation versus 5% under the new. [2]
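Why a higher valuation of existing health leaves less scope for improvement can be shown with a small worked example. The sketch below uses invented numbers: the two utility values and the ten-year horizon are hypothetical and are not taken from either value set.

```python
def max_utility_gain(baseline_utility, full_health=1.0):
    """Largest possible improvement from a baseline utility up to full health."""
    return full_health - baseline_utility


# Hypothetical values for the same health state under the two valuations
old_value = 0.50  # assumed utility under the older value set
new_value = 0.70  # assumed utility under the newer value set
years = 10        # assumed time horizon

old_max_qalys = max_utility_gain(old_value) * years
new_max_qalys = max_utility_gain(new_value) * years

print(f"Maximum QALY gain, old values: {old_max_qalys:.1f}")
print(f"Maximum QALY gain, new values: {new_max_qalys:.1f}")
```

With these assumed figures, an intervention that restores full health could gain at most 5.0 QALYs under the old values but only 3.0 under the new ones, so the same treatment looks less cost effective.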

While five dimensions with 3 levels have been criticised as crude, they yield 243 health states (3^5). The extension to 5 levels increased the number of health states to 3,125 (5^5). Not all of these can be valued in surveys. Instead, a selection is valued and the rest are interpolated.

The 1993 population study had people value 43 (18%) of the health states with interpolation of the rest. The updated population study had people value 85 (3%) of the health states.
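These counts and proportions can be checked by enumerating the health states directly. The sketch below assumes, for illustration, that levels are coded as integers starting at 1; the function and variable names are hypothetical.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)


def health_states(levels):
    """Enumerate every combination of levels across the five dimensions."""
    return list(product(range(1, levels + 1), repeat=len(DIMENSIONS)))


states_3l = health_states(3)
states_5l = health_states(5)

# Share of states valued directly in each population study (43 and 85)
share_3l = 43 / len(states_3l)
share_5l = 85 / len(states_5l)

print(len(states_3l), f"{share_3l:.1%}")  # 243 17.7%
print(len(states_5l), f"{share_5l:.1%}")  # 3125 2.7%
```

Although the newer study valued roughly twice as many states, the share valued directly fell sharply, so a far larger proportion of values rests on interpolation.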

Concerns with the update were complicated by the use of additional methods: lead time was introduced to adjust for worse-than-dead states, and a parallel discrete choice element was included. The results proved difficult to interpret, with spikes at suspicious values (+/-1, 0.5, and 0). The different results could be due to changes in the description of the health states and/or to the methods used to value them. As both changed, disentangling them is difficult.

If QALYs based on EQ-5D were purely for academic purposes, then the changes might not matter. However, the implications for NICE were huge. Use of the new system would lead to a discontinuity with past appraisals. It would also change any NICE cost per QALY thresholds. The reliance of NICE processes on precedent would be challenged, perhaps in NICE appeals or in the courts. In August 2017, NICE took an interim position in favour of EQ-5D-3L.

Under this position, submissions to NICE based on EQ-5D-5L should be converted to EQ-5D-3L. Ongoing clinical studies, however, should continue to use EQ-5D-5L.

NICE, stating that further research was needed, commissioned a review of the impact of the change as well as collection of new data. An independent quality assurance of the methods used for EQ-5D-5L must also be carried out. NICE is to review its position in August 2018.

So what are we to make of this?

Firstly, with hindsight, it was always likely that updating the system would raise problems for NICE unless the values were similar. Disentangling the methodological issues will be difficult. The mix of methods used for the English data set, unlike that used in any of the other countries, poses challenges, as does interpretation of the results with spikes, which may or may not be artefactual.

Secondly, while minimal change may have been promised by sticking to the same five dimensions, in retrospect reviewing the dimensions may have been more important. Surveys have indicated that the most relevant other aspects of health include sight, hearing, and mental health. Any attempt to include these in the future will cause similar problems of consistency with past judgements.

A more fundamental review may also have been in order. QALYs have been subject to various critiques, the most substantial of which was by Daniel Kahneman, the 2002 Nobel laureate. These included: assuming that people’s responses to questionnaires obey the axioms of utility theory, assuming that utility theory is correct, and that people’s values depend on the state they are in (no adaptation). Kahneman pointed to the danger of implicit collusion between researchers and decision makers, in which the researchers provide a number that is simple but wrong. His alternative approach would be based on citizens’ juries.

Amartya Sen, another Nobel laureate, has argued for a different alternative to QALYs, based on the capability approach. This has been developed by the ICEpop CAPability measure for Adults (ICECAP) which uses five attributes: “Attachment” (an ability to have love, friendship and support), “Stability” (an ability to feel settled and secure), “Achievement” (an ability to achieve and progress in life), “Enjoyment” (an ability to experience enjoyment and pleasure), and “Autonomy” (an ability to be independent). As with QALYs, population surveys are used to weight these very different attributes into a single measure.  

In a comparison of ICECAP with 6 QALY health utility tools, EQ-5D (either version) explained less than half of the variation in capability wellbeing. ICECAP seems to be measuring something different. How best to define, measure, and value health utility seems an open but key question.

Overall, after twenty years of use, QALYs have been subject to substantial criticism, not least by two Nobel laureates. It will be unfortunate if NICE’s review is confined to reconciling the technical differences between the old and newer versions of EQ-5D.

References:

[1] Devlin, N., Shah, K., Feng, Y., Mulhern, B. and van Hout, B., 2017. Valuing health-related quality of life: An EQ-5D-5L value set for England. Health Economics. DOI: 10.1002/hec.3564

[2] Devlin, N., Brazier, J., Pickard, A.S. and Stolk, E., 2018. 3L, 5L, What the L? A NICE Conundrum. PharmacoEconomics. First online: 26 February 2018.