Robert Kemp and Vinay Prasad: Should we accept higher p values than 0.05 for new cancer drugs?

The current regulatory system already tolerates a tremendous amount of uncertainty regarding cancer drugs in the real world and does not need more

Cancer patients deserve, and many prefer, anti-cancer drugs with proven benefits on meaningful endpoints such as overall survival, quality of life, or both. However, some patients with terminal conditions and no good options may wish to tolerate more risk and expose themselves to therapies whose benefits on survival or quality of life are still unproven. Such sentiment has fuelled enthusiasm for so-called right-to-try laws, and for a recent health economics paper making the case that the regulatory standards for approving drugs are too stringent. Montazerhodjat et al take issue with the p-value, arguing that regulators’ insistence on a significance level of p<0.05 is often too conservative. [1] In the case of metastatic pancreatic cancer, for instance, the authors advocate lowering the hurdle to p<0.20. This is a bad idea, not only because it increases the chances of “false positives” flooding the market, but because it misses the point that the current system, even with the more stringent cutoff of p<0.05, is already frequently rigged in favour of sponsors in a way that ultimately leaves patients with approved drugs whose actual clinical benefit remains substantially uncertain. Here, we highlight five reasons why an approved drug may not work—i.e. improve meaningful endpoints—when used as intended in the real world.

Firstly, consider that one third of cancer drugs are approved on the basis of uncontrolled, single arm studies. Allowing approvals without randomized trials introduces two types of uncertainty. The response rate, a measure of tumor shrinkage, is notoriously inflated in uncontrolled trials compared with subsequent randomized studies. [3] Thus, the true effect of these agents is exaggerated. Also, in the absence of controlled trials, the oncology profession remains unsure whether new drugs are superior to older standards or, in some cases, to best supportive care. Belinostat, for example, was approved for relapsed peripheral T cell lymphoma on the basis of response rate in a single arm study. While this agent has activity, we do not know whether it is superior, equivalent, or inferior to its predecessors, pralatrexate and romidepsin, with respect to meaningful endpoints.

Secondly, what is the effect size we are chasing? Over time, trials in oncology have increased their sample sizes, permitting the detection of smaller effects. These gains may be statistically significant but clinically meaningless, and some may even be lost when applied to heterogeneous, unselected patients in the real world. For instance, the marginal benefit of sorafenib in hepatocellular carcinoma was lost in a real world Medicare analysis, where patients were older and frailer. [4] In fact, an examination of 71 consecutive approvals in solid tumors over the last decade shows predominantly marginal effect sizes, with a median improvement in progression free survival and overall survival of 2.3 and 2.1 months, respectively. [5] Davis and colleagues recently confirmed this finding in The BMJ, showing that many cancer drugs fail to meet the European Society for Medical Oncology’s standard of meaningful benefit. Small effects leave little reserve: even modest dilution of the effect in the real world can erase the benefit entirely.

Thirdly, when randomized trials are conducted in oncology, hardwired bias is prevalent. “Hardwired” refers to bias introduced through design choices. Bias can take the form of inclusion criteria, implicit or explicit, that result in unrepresentative patient populations. Pivotal trials continue to enroll patients who are younger and have fewer comorbidities than patients in the real world. Straw man comparisons test novel agents against drugs already known to be inferior to alternatives. For instance, a comparison of ibrutinib to chlorambucil led to US FDA approval of ibrutinib as frontline therapy in chronic lymphocytic leukemia, but used a control arm that had been beaten by other agents and was seldom used in real world clinical practice. As clinical trial populations look less like real world populations, and as novel drugs are tested against comparators that have already been replaced by superior alternatives, we tolerate greater uncertainty that these agents will improve outcomes when applied in clinical practice.

Fourth, the use of validated and unvalidated surrogate endpoints inflates uncertainty. Two thirds of cancer drugs are approved on the basis of a surrogate—typically progression free survival or response rate. At the time of approval, 45% of surrogates used by the FDA have no study documenting their correlation with survival. [2] When we do know the correlation between a surrogate and survival, it is often poor, leaving considerable uncertainty as to whether these drugs improve survival or quality of life. [2,6]

Fifth, excessive reliance on subgroup analyses can lead to spurious conclusions. As cancer becomes a set of rare diseases, we increasingly draw conclusions about the efficacy of treatments that appear to work only for some patients. Pemetrexed (Alimta, Eli Lilly) is widely believed to be superior in the non-squamous histology of non-small cell lung cancer, and yet this conclusion is based upon a subgroup analysis that was not adjusted for the multiple hypotheses tested, and it has not been examined in a prospective, independent study. [7]

Finally, we come to the “alpha level” of a study, also known as the Type I error rate, or the rate of “false positives”. Ironically, the p-value cutoff that denotes statistical significance, the target of the recent call for loosening, is the only source of uncertainty that has remained constant. While a p-value theoretically measures the uncertainty that a drug is beneficial, this holds true only for perfectly powered, unbiased studies. A widely popular paper by John Ioannidis, entitled “Why Most Published Research Findings Are False”, uses mathematical modelling to show that the factors we have described here contribute far more to the uncertainty of a specific scientific result than the alpha level does. [8]
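A simple worked example makes the stakes of the alpha level concrete. The sketch below is our illustration, not a calculation from any of the cited papers, and the prior probability that a drug truly works (10%) and the trial’s power (80%) are assumed values chosen purely for demonstration:

```python
# Illustrative sketch: how often is a "significant" trial result a false positive?
# The prior probability that a drug works (10%) and the trial power (80%) are
# assumed for illustration only; real values vary by disease and drug class.

def false_positive_risk(alpha: float, power: float, prior: float) -> float:
    """P(drug is ineffective | trial crossed the significance threshold)."""
    true_positives = power * prior         # effective drugs correctly detected
    false_positives = alpha * (1 - prior)  # ineffective drugs "detected" by chance
    return false_positives / (true_positives + false_positives)

for alpha in (0.05, 0.20):
    risk = false_positive_risk(alpha=alpha, power=0.80, prior=0.10)
    print(f"alpha = {alpha:.2f}: false positive risk = {risk:.0%}")
# alpha = 0.05: false positive risk = 36%
# alpha = 0.20: false positive risk = 69%
```

Under these assumptions, relaxing the cutoff from 0.05 to 0.20 roughly doubles the chance that a “significant” drug is in fact ineffective, and biased or underpowered trials only push this figure higher.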

We understand the desire of patients with life threatening cancers and few therapeutic options to try drugs that only possibly provide meaningful benefits. However, recent calls to lower the threshold of statistical significance will further inflate the probability that a cancer patient is advised to take a treatment that does not work. [1] That probability is already quite high, and it is driven by many disparate factors.

In fact, the current regulatory system tolerates a high degree of uncertainty that our interventions benefit the patients in whom we employ them. We favour efforts to understand and quantify this uncertainty, but, for the time being, we are reluctant to embrace proposals that would increase it. If anything, we believe the uncertainty facing patients should be reduced: the use of surrogates and of uncontrolled studies should remain the exception in cancer drug approval, not the rule. Cancer patients vary in their willingness to tolerate risk, and the current approval structure already makes available many costly drugs whose benefit is anything but certain. Loosening standards is the wrong direction for reform.

Robert Kemp, Academic FY1 Doctor, Southampton General Hospital, University of Southampton. 


Vinay Prasad, Assistant Professor of Medicine,
Division of Hematology Oncology in the Knight Cancer Institute
Department of Public Health and Preventive Medicine
Senior Scholar in the Center for Health Care Ethics
Oregon Health & Science University, Portland, Oregon.


Competing Interests: Vinay Prasad is funded by the Laura and John Arnold Foundation.  

See also:

Research: Availability of evidence of benefits on overall survival and quality of life of cancer drugs approved by European Medicines Agency
Feature: Cancer drugs: high price, uncertain value
Editorial: Do cancer drugs improve survival or quality of life?
Patient commentary: The current model has failed

References:

  1. Montazerhodjat V, Chaudhuri SE, Sargent DJ, et al. Use of Bayesian decision analysis to minimize harm in patient-centered randomized clinical trials in oncology. JAMA Oncology 2017.
  2. Kim C, Prasad V. Strength of Validation for Surrogate End Points Used in the US Food and Drug Administration’s Approval of Oncology Drugs. Mayo Clinic Proceedings 2016;91(6):713-25.
  3. Zia MI, Siu LL, Pond GR, et al. Comparison of Outcomes of Phase II Studies and Subsequent Randomized Control Studies Using Identical Chemotherapeutic Regimens. Journal of Clinical Oncology 2005;23(28):6982-91.
  4. Sanoff HK, Chang Y, Lund JL, et al. Sorafenib Effectiveness in Advanced Hepatocellular Carcinoma. Oncologist 2016;21(9):1113-20.
  5. Fojo T, Mailankody S, Lo A. Unintended consequences of expensive cancer therapeutics-the pursuit of marginal indications and a me-too mentality that stifles innovation and creativity: the John Conley Lecture. JAMA Otolaryngol Head Neck Surg 2014;140(12):1225-36.
  6. Kim C, Prasad V. Strength of Validation for Surrogate End Points Used in the US Food and Drug Administration’s Approval of Oncology Drugs. Mayo Clinic Proceedings 2016;91(6):713-25.
  7. Gyawali B, Prasad V. The billion dollar subgroup analysis. JAMA Oncology 2017;Forthcoming.
  8. Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2(8):e124.

  • Given that P values close to 0.05 already imply a false positive risk of *at least* 26%, it would be foolish to interpret such P values as meaning more than ‘worth another look’. Put another way, if you observe a P close to 0.05, you’d have to be 87% sure that there was a real effect before the experiment was done in order to achieve a 5% risk of the result being a false positive. With P = 0.2, you’d have to be 98% certain.

    These numbers can be calculated easily with our web calculator, at http://fpr-calc.ucl.ac.uk/ The assumptions that underlie the calculations are given in detail at https://www.biorxiv.org/content/early/2017/10/25/144337