

Primary Care Corner with Geoffrey Modest MD: Steroid knee injections: do they help??

23 May, 17 | by gmodest

by Dr Geoffrey Modest

A recent article in JAMA found that regular injections of intra-articular steroids were associated with decreased knee cartilage volume and no real improvement in pain in patients with knee osteoarthritis (see doi:10.1001/jama.2017.5283).


–140 patients with symptomatic knee osteoarthritis as well as synovitis by ultrasound (evidence of effusion synovitis, with suprapatellar pouch depth >2mm) were randomized to receive intra-articular 1cc triamcinolone 40mg vs 1cc saline every 3 months for 2 years, both without local anesthetic

–mean age 58, 54% women, BMI 31, 65% white, mean hemoglobin A1c=6%, CRP 0.5

–all patients had radiographic evidence of Kellgren-Lawrence knee OA grade 2 or 3 (grade 2= definite osteophytes and possible joint space narrowing on anteroposterior weight-bearing radiograph; grade 3= multiple osteophytes, definite joint space narrowing, sclerosis, possible bony deformity)

–knee MRI was done at baseline and then annually


–there was greater cartilage loss with injected steroids (mean cartilage thickness loss of 0.21 mm vs 0.10 mm with normal saline), though superficial fibrillations (fraying of the articular surface) were more common in the saline group (34% vs 13%)

–no significant difference between the groups in pain scores, or functional activities such as the 20 meter-walk time or the chair-to-stand time (these were all measured after asking patient to not take pain meds for 2 days prior to their evaluations)

–adverse events: overall more common in the saline group (63 vs 52, p=0.02), though no difference in those considered treatment-related. There was cellulitis in one patient in the saline group. Also, hemoglobin A1c actually decreased significantly in the steroid group (-0.1% vs an increase of 0.2% in the saline group, controlling for BMI, radiographic DJD classification, and sex). No difference in hypertension


–As noted in a recent blog on the lack of benefit of arthroscopy in patients with degenerative knee disease, knee DJD is remarkably common and a leading cause of disability (and medical costs, largely for procedures)

–the physiologic rationale for intra-articular steroid injections is that DJD is typically associated with synovitis, and the associated elaboration of biochemical mediators (collagenases, aggrecanases, cytokines) has the potential to cause further joint destruction. Local steroids might decrease the inflammation and interrupt this destructive cycle. Animal studies have supported this hypothesis. This study, utilizing MRI to assess the steroid effects on cartilage, seems better designed than prior studies which have used x-rays, given how insensitive x-rays are for assessing the radiolucent cartilage.

–so, how can one reconcile the conclusions of this study (negative impact on cartilage and no effect on pain) with the other studies finding pain improvement in the 4 weeks after the injection, an older but smaller study of 68 patients with the same basic protocol as in this study finding some benefit for pain, and with the huge anecdotal experience of benefit (steroid injections are done increasingly commonly)???

–these patients had pretty mild DJD, especially in terms of baseline symptoms, with a WOMAC pain score of 8.3. This score is based on 5 items (pain during walking, using stairs, in bed, sitting or lying, and standing upright), each scored from 0 to 4: None (0), Mild (1), Moderate (2), Severe (3), and Extreme (4); ie the total maximum score is 20, with an average score of 8, as in the above study, being between mild and moderate.

–one issue with knee OA is how to define it or its progression objectively. The Framingham Study found a poor correlation between radiographic knee OA and symptoms (see Hannan MT. J Rheumatol 2000; 27: 1513, for example). And there is no accepted minimal clinically important difference for MRI cartilage measurement, as duly noted by the authors of this study. Also, I am concerned about the increase in cartilage fibrillation found in the non-steroid group, since this early splitting of the tangential cartilage surface might herald more severe and clinically important cartilage changes over the somewhat longer term

–they only assessed pain relief at the 3-monthly evaluation, with no data on how patients fared in the first 1 or 2 months [and, in my pretty extensive experience with knee injections, probably around 1000 over the years, the vast majority of patients get relief for the first few months, some much longer, and that relief translates into dramatic improvement in function and pain relief; ie they can walk and actually do things they couldn’t do before]

–there is even some literature (a meta-analysis of 38 studies) supporting saline injections as helping with pain relief [ie, their control injections were not necessarily sham injections; saline itself may have some benefit, which is an important difference. There seems to be a more profound placebo effect with injections than pills, so perhaps the real control for this injection study should be a needle in the joint with no meds injected???]

–and, in terms of generalizability of these results, it is important to stress that these patients had clinically mild knee OA at baseline, but still received injections every 3 months [not necessarily common clinical practice for those with mild-to-moderate symptoms], so their results might not apply to many patients who are actually getting knee injections for more severe, functionally limiting pain despite exercise/physical therapy/etc

–what about the decrease in cartilage thickness?? This is certainly concerning, though perhaps there are non-measured countervailing processes going on: are patients getting a lot of early pain relief [a good thing], but then using their knees more [walking, etc] which leads to more cartilage destruction through wear-and-tear???  And, though small, does the relative improvement in A1c in the steroid group reflect the patients’ ability to do more exercise?
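
As a rough illustration of the WOMAC pain arithmetic described above, here is a minimal Python sketch. Only the 5 items, the 0-4 scale, and the study's mean total of 8.3 come from the text; the function names and the example patient are hypothetical.

```python
# Sketch of the WOMAC pain subscale arithmetic described above.
# Function names and the example patient are hypothetical illustrations.

WOMAC_PAIN_ITEMS = ("walking", "stairs", "in bed", "sitting or lying", "standing")
LABELS = {0: "None", 1: "Mild", 2: "Moderate", 3: "Severe", 4: "Extreme"}


def womac_pain_total(item_scores):
    """Sum the 5 item scores (each 0-4) into a 0-20 total."""
    if len(item_scores) != len(WOMAC_PAIN_ITEMS):
        raise ValueError("expected one score per pain item")
    if any(s not in LABELS for s in item_scores):
        raise ValueError("each item score must be 0, 1, 2, 3 or 4")
    return sum(item_scores)


def mean_item_score(total):
    """Average per-item score implied by a subscale total."""
    return total / len(WOMAC_PAIN_ITEMS)


# Hypothetical patient: mild pain on three items, moderate on two.
example_total = womac_pain_total([1, 2, 1, 2, 1])

# The study's mean total of 8.3 works out to about 1.66 per item,
# i.e. between Mild (1) and Moderate (2).
study_mean_per_item = mean_item_score(8.3)
```

Dividing the total by the number of items is just a convenient way to read a total against the None-to-Extreme labels; it is not part of the instrument as described in the blog.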


So, how should this study affect clinical practice??

–my non-rigorously-tested finding, through loads of knee injections, is that 90+% of patients have much less pain and are able to function much better after injections, and I will continue to do injections

–that being said, injections should be accompanied by aggressive patient education around the importance of quadriceps strengthening exercises, which often help a lot [there were older studies suggesting this may not be true in patients with misaligned knees, perhaps from more severe DJD, where the patella does not track correctly and quad strengthening might exacerbate knee symptoms, but my sense is that this is relatively uncommon].  And some patients need a knee injection in order to do more exercise or physical therapy…

–other therapeutic options are sparse. Arthroscopic meniscal repair or joint lavage seems to do nothing. Physical therapy is important, but does not help many patients much (especially those who are frail, have advanced DJD, are unable to do the necessary home-based exercises,…). NSAIDs have a wide array of undesirable adverse effects, especially in the population with symptomatic knee OA, since they are typically older and have lots of comorbidities (and in this study, unlike NSAIDs, steroids were not associated with hypertension, for example)

–I also use the equivalent of 40 mg triamcinolone with 2cc of 1% lidocaine. This might have better efficacy (unknown to me) than just 1 cc of triamcinolone alone, since the added volume of the anesthetic may help the steroid reach more areas of the inflamed knee joint, and perhaps the anesthetic improves the pain relief beyond the steroid itself.

–after I have done a few knee injections, especially if there are diminishing returns (the first injections working for much longer than subsequent ones), I do discuss and recommend consideration of surgical management (usually knee replacement surgery)

–but I am certain that I will continue having patients, especially older ones, who often have serious medical comorbidities, who adamantly refuse surgery and really want repeated knee injections (even every 2-3 months) in order to function.  This study will change my practice in that I will discuss the issue of potential cartilage harm more forcefully than previously.

–one important general issue is my concern about the quick summaries of potentially clinically very important articles:  the one-line synthesis of this study was “Intra-Articular Corticosteroids Show No Benefit in Knee Osteoarthritis” in Physician’s First Watch/NEJM Journal Watch, and there was little more added in the few summary lines. I am very concerned that this type of analysis may undercut an important therapeutic modality for many patients, perhaps leading to fewer injections even though the patient may achieve very important pain relief and improved functioning/quality of life.  This brings up one of the reasons I do these blogs: we in primary care clinical practice are inundated with new articles (mostly drug-company sponsored) and new guidelines (often done by specialty societies whose members directly or indirectly are involved with drug companies, etc) on a daily basis. It is essentially impossible to keep up with the information onslaught. The summary services such as Journal Watch are really helpful in scanning the literature and alerting us to new articles/guidelines that might affect our clinical practice. But they may well have the very negative effect of dumbing down the literature to quick quips (sound bites?) that really make it impossible to figure out if a certain article or guideline really should apply to the patient sitting in front of us. My hope with these blogs is to look at a few of these articles that might well affect practice, give sufficient (and accurate) summaries of the methodology, types of patients involved, procedures done, and their results; then briefly put in my sense of how this article fits in with older literature and our model of disease physiology; and provide some specific concerns, if any, which might affect its clinical utility. 
This way, the reader can decide what they think about the article (or guideline), be able to review the specifics of the study, even use my link to see the study itself for more details, and then figure out how or if they will integrate it into their practice.

Primary Care with Geoffrey Modest MD: Lessons I’ve Learned From Looking at the Medical Literature

21 Nov, 16 | by EBM

By Dr. Geoffrey Modest

There have been several concerning issues and lessons that I have learned in the process of doing these blogs over the past several years (I am sending out this email/blog as a follow-up to some of the methodological issues and perhaps incorrect assumptions inherent in many clinical studies and their application to actual patients, as noted in the recent blog on placebos):

  • Meta-analyses:
    • There is huge variability in the actual utility of meta-analyses in making clinical decisions. These analyses are mathematical concoctions which try to combine different studies with usually very different people (different inclusion/exclusion criteria, people with different levels/types of comorbidities, different ages, different ethnicities, often different doses of the med being assessed, even somewhat different outcomes measured). And the meta-analyses themselves have different inclusion criteria (minimum number of people in a study that they include, the authors’ assessment of the quality of the study). And they use different statistical analyses (e.g. some do propensity score matching as a means to control mathematically for different patient baseline characteristics; or they may use different basic statistical analyses). Also, in some cases the meta-analysis is overwhelmed by a single very large study (e.g., in a meta-analysis with 10 studies, the one with many more patients gets much more statistical weight, even if the smaller studies were actually methodologically better). As a result I have seen almost simultaneous meta-analyses on the same subject in different journals coming to different conclusions.
    • There was a really good article looking at the pyramid of the value of different types of clinical evidence (see Evid Based Med 2016;21:125-127, doi:10.1136/ebmed-2016-110401) which, unlike other “evidence pyramids” in the literature over the past 20 years, dismissed meta-analyses/systematic reviews, and highlighted, for example, that study design itself (i.e. an RCT) does not necessarily mean that it is a “better” study and should be the one influencing clinical practice just because of its design, over a good cohort study (they demonstrate this by their schematic pyramid of evidence-based medicine having wavy lines separating the types of studies, instead of straight-line clear-cut separations of the value of studies by their design, and they do not include meta-analyses/systematic reviews in the pyramid). To me, RCTs are clearly limited by their exclusion and inclusion criteria, and suffer from reductionism (see prior blogs, but basically reducing “n” patients into some mathematical average of, e.g., “a 53 year-old patient, 35% female, 78% white, 37% diabetic, with no renal failure and 56% on aspirin……”, and trying to apply the results to a totally different individual patient you are treating with different ethnicity, comorbidities, meds, etc.).
  • Guidelines (also not included in the pyramid of the value of evidence-based medicine, above):
    • There has been an unfortunate evolution of clinical guidelines, with a few dramatic shifts over the decades:
      • The older guidelines were written by the NIH or similar governmental organization, with an emphasis on bringing in different experts both within the field and, at least to my experience, some outside of the field (e.g. clinical people), and providing a more consistent, less biased, and independent validation mechanism for the recommendations
      • Perhaps related to ideological or financial imperatives, newer guidelines are more often being channeled back from the governmental agencies to professional societies, creating a few problems:
        • Guidelines may not reach the same conclusions: e.g. the early versions of the American Diabetes Assn and American Heart Assn guidelines on blood pressure goal. Then, what is a clinician to do??
        • The professional societies’ guideline-writing groups often do not include practicing clinicians (at least from what I’ve seen), but mostly the higher-ups (i.e., mostly researchers) in the professional societies. There is often a significant financial conflict-of-interest with many guideline-committee members, though this is being watched and reported more now than before, more with some professional societies than others. But, beyond those direct financial/other interests of some of the specialty society leaders, I would guess that it is not easy/comfortable for others within the societies to be critical of them (they are the “leaders”, with disproportionate influence within the writing committee and within the specialty society)
        • And there is a huge profusion of guidelines, from all of these societies, to the point that it is pretty impossible to keep up with them
        • However, I think the real reason that guidelines are not considered part of the “evidence pyramid” noted above is that there is no external validation metric used for these guidelines: there is a group of specialists sitting around a table and making recommendations about how we should treat patients, and with an inherent conflict-of-interest above and beyond those of specific leaders promoting a technique or drug which they may personally benefit from. Is it surprising that the American Urological Association has historically been much more aggressive in pushing for PSA screening? Or the American Cancer Society historically pushing for more cancer screening? Or the American College of Radiology promoting more mammograms?
        • So, the best model to me is reverting to the way guidelines used to be created, as currently done in other countries having a single uniform approach to guidelines (e.g. the NICE guidelines in the UK are pretty exemplary to me: very thoroughly researched, with, I think, pretty unbiased and thoughtful recommendations), using the best external validation metric to promote the best, least-biased recommendations based on known data and relatively unbiased expert opinion and informed by practicing clinicians. Probably the best we have now in the US is USPSTF, though they also have an important-to-know filter of usually needing strong support from RCTs to really endorse an approach (e.g., they do not recommend lipid screening in adolescents, despite what I think is pretty compelling though circumstantial evidence, basically because there are no good 30-40 year studies following 12 year-olds, randomized to diet/exercise/perhaps meds at some point, and looking at clinical outcomes).
  • Using on-line sources for quick guidance (e.g. Up-To-Date, etc.)
    • These are also not on the “evidence pyramid”, for reasons similar to the guidelines issue: the entries are the non-validated opinions of a few individuals about how to evaluate, diagnose and treat patients. There are no upfront disclosures of commercial interest (if you click on an author’s name, then on disclosures in Up-To-Date, you can get the info, but it is a few clicks away, and, I would guess that a busy clinician looking for a quick answer probably does not do this a lot. And then the information is that the author gets money from perhaps a specific drug company. And, I would also guess, most of us primary care clinicians have no idea which meds that drug company makes and therefore which suggested med in the Up-To-Date review might be promoted more…).
    • That being said, I do not know a clinician (including myself) who does not use one or more of these sources pretty often, to get quick guidance about what to do with the patient in front of them….  it is so easy, typically has a review of the relevant studies, and gives very clear guidance. The only issue is bias and reliability…..
  • Misquoting references
    • As mentioned in a few blogs, sometimes the articles misquote references, claiming incorrectly that a previous study came to a certain conclusion. So, it is useful to check the original article when an article makes a statement about another article that seems out-of-line. This is a lot of extra work, though way easier than it used to be (often you can click on a hyperlink of the reference, or do a quick online search. Easier than going to the library…)
    • Even more commonly (still not very common), articles sometimes make reference to a citation which is incorrectly cited (i.e., you look at the article cited and it has nothing to do with the author’s point?? An error by the author/journal editor in making sure that the citation matches??)
  • Supplemental materials
    • Oftentimes, some of the most important material is relegated to the supplemental material (including important subgroup analyses, methodologic issues, data backing up some of the article’s conclusions, conflicts-of-interest, etc.), which really gives lots of insight into the real value of an intervention. These materials are only accessible online (an issue if you do not subscribe to that journal), which is, I think, a significant impediment to access for many clinicians. In cases where I cannot get a specific article and have emailed the author for a copy, I only get the PDF, and unless I want to pay $30-50 to get the article through the journal (which I do not), I cannot see the supplementary materials.
  • Using not-so-relevant clinical endpoints
    • There has been a trend to using composite endpoints (perhaps to make the likelihood of an intervention’s benefit higher and more likely to be statistically significant) which just don’t make sense, such as combining a really important outcome with much less important ones. For example, a recent blog looked at CPAP for OSA, assessing CPAP utility for the composite endpoint of hard cardiovascular events plus the development of hypertension. If there were benefit for significant hard cardiovascular events, I would be quite inclined to suggest CPAP for my patients. But if CPAP only decreased hypertension a little (though statistically significantly), I would treat that by reinforcing lifestyle changes, or using a med if needed, and would not prescribe CPAP. Or, another example: the ADVANCE study, which looked at the effects of tight blood sugar control on hard CVD outcomes plus diabetic nephropathy. This seems pretty silly. We know from many studies that tight control helps prevent diabetic nephropathy. The more important clinical issue is cardiovascular benefit or harm. And adding a known quantity of decreasing nephropathy into the “composite” endpoint just dilutes/distorts the results. This study really highlights the general issue of lumping together non-equivalent outcomes (it is hard to argue that developing early nephropathy is somehow equivalent to, and should be numerically added to, CV deaths or nonfatal strokes; or in many other studies, lumping together all-cause mortality with need for an additional clinical procedure). I raise these issues as examples, but this is really a very common finding. And this approach of combining endpoints may be worse now, since a large percent of the studies done are designed by drug companies, etc., which have a vested interest in the most positive outcome. And sometimes one cannot disaggregate the individual outcomes without access to the supplementary material….
    • As I have railed about in many blogs, I am really concerned that the FDA accepts surrogate endpoints for some clinical diseases. The most evident one is using A1C as the end-all for new diabetes meds. Personally, I don’t really care so much about the A1C, just what really happens to patients. Many of the new drugs approved do decrease the A1C (though only a little, in most cases), yet have significant and serious adverse reactions (see many prior blogs) which undercut their utility significantly (e.g., as cited in many prior blogs: rosiglitazone does well in lowering A1C, just unfortunately increases cardiac events…)
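
The point in the meta-analysis section above, that one very large study can swamp the rest, can be sketched with fixed-effect inverse-variance pooling, one common weighting scheme. This is an illustration only: the ten studies, their effects, and the function names are made up, and the blog does not say which pooling method any particular meta-analysis used.

```python
# Fixed-effect, inverse-variance pooling: an illustration with made-up studies.
from math import sqrt


def se_mean_difference(sd, n_per_arm):
    """Standard error of a difference in means, two equal arms, common SD."""
    return sqrt(2 * sd**2 / n_per_arm)


def pool_fixed_effect(studies):
    """Return (pooled_effect, weights) using weights = 1 / SE^2."""
    weights = [1 / se_mean_difference(s["sd"], s["n_per_arm"]) ** 2 for s in studies]
    pooled = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
    return pooled, weights


# Hypothetical data: nine small trials showing a larger effect,
# one very large trial showing a small effect.
small_trials = [{"effect": -0.5, "sd": 1.0, "n_per_arm": 50} for _ in range(9)]
large_trial = {"effect": -0.1, "sd": 1.0, "n_per_arm": 10_000}
studies = small_trials + [large_trial]

pooled, weights = pool_fixed_effect(studies)
large_share = weights[-1] / sum(weights)

if __name__ == "__main__":
    print(f"pooled effect: {pooled:.3f}")
    print(f"weight carried by the large trial: {large_share:.1%}")
```

With these invented inputs the single large trial carries roughly 96% of the weight, so the pooled estimate lands close to that trial's result no matter how well the nine small trials were run. A random-effects model would spread the weight somewhat more evenly, which is one reason two meta-analyses of the same literature can disagree.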

So, I am writing this blog mostly because I have been doing these blogs for several years now, have been reading lots of articles, have the (perhaps) benefit of seeing the evolution over decades of clinical research and the medical-political-social-economic structure of both the research being done and how it is reported, and am pretty frequently struck by some of the not-often-acknowledged gaps and concerns of that literature and its effect on clinical practice. I would recommend reading the “evidence pyramid” article in the BMJ Evidence-Based Medicine journal referenced above, since it does comment a bit on some of these (and did stimulate me to write this). But, of course, I should also comment that all of the above are my observations (i.e., not validated by an independent group), but at least I have no (i.e., zero) conflicts of interest, other than the bias to a real skepticism in reading articles and guidelines, or of being an early adopter of new meds/procedures…..

Primary Care Corner with Geoffrey Modest MD: Glucometers Lower A1c’s in Non-Insulin Using Diabetics, a Little

31 Oct, 16 | by EBM

By Dr. Geoffrey Modest

BMJ Open just published a meta-analysis of randomized controlled trials (RCTs), finding that non-insulin using diabetic patients who self-monitored their blood sugars had improved glycemic control (see doi:10.1136/bmjopen-2015-010524 ). This analysis included several new studies, not available in prior reviews.


  • 15 RCTs were identified with 3383 patients
  • Results:
    • Those using SMBG (self-monitoring of blood glucose) had:
      • Lower HbA1c by −0.33 (−0.45 to −0.22); p<0.001 [the quality of evidence was rated as moderate]
      • Lower BMI by −0.65 (−1.18 to −0.12); p=0.02 [the quality of evidence was rated as low]
      • Lower total cholesterol (TC) by −0.12 (−0.20 to −0.04); p=0.003 [the quality of evidence was rated as high]
      • Lower waist circumference by −2.22 (−4.40 to −0.03); p=0.047 [no comment, but I assume that is in centimeters; the quality of evidence was rated as moderate]
      • No significant difference in fasting plasma glucose, systolic or diastolic BP, HDL, LDL, triglycerides, or weight
      • Subgroup analyses: no difference between Asian countries and US/Europe; A1C was improved in both short-term (<6 month, by -0.36%) and long-term studies (>12 month, by -0.28%). BMI and TC changes were only significant in the <6 month group. And though waist circumference was improved overall, it was not significantly improved in the subgroups, and was near-significant (p=0.06) only in the >12 month group (by -3.15); also similar A1C reductions were found in patients with newly diagnosed type 2 diabetes (T2DM) vs duration >12 months [no further analysis for really long-termers]; SMBG was significantly more effective in patients with lower A1C (<8%) vs higher
    • Adverse events: most common was the incidence of hypoglycemia (higher in SMBG group), though their rate (episodes/patient) was lower


  • Prior concerns about SMBG reflect its high cost (21% of diabetic prescription costs in the US) and several studies suggesting its lack of efficacy in non-insulin using T2DM patients (e.g. Farmer AJ BMJ 2012;344:e486), even though currently 63.4% of T2DM patients use SMBG daily
  • The analysis, as with pretty much all meta-analyses, is limited by the quality of the studies included, their size, differences in methodology in general, degree of patient education, frequency of testing, and inherent biases associated with the more intensive medicalization in those doing SMBG
  • The decrease in A1C of -0.33% is often not considered to be clinically significant (typically defined as a change of 0.5%)
  • So, this study does suggest efficacy of SMBG monitoring, albeit perhaps of marginal clinical significance. As an intervention, it does medicalize patients much more than just taking a pill. And this has the potential for both positive and negative effects: the positive side is that it may empower patients in involving them more in taking ownership and treating their condition, and for some patients, this involvement might be important in helping them deal psychologically with a potentially devastating disease; the negative side is that for some patients it might create lots of anxiety and perhaps over-focus/dwelling on their medical problems and perhaps reinforce a more passive, “sickness” mentality which could decrease their ability to function.
  • This last difference exposes one of the contradictions of RCTs: they look at a large group of individuals, with some exclusions, but cannot really replicate the actual patient one is treating. It may well be that some patients who want to control their bodies and illnesses more, actually do much better with SMBG than decreasing their A1C by the 0.33% as above, perhaps using the daily blood sugar feedback as a motivation for more lifestyle changes (and, even if the A1C does not plummet, these lifestyle changes might have much broader healthful consequences). Others who may become more anxious or are not interested in this level of involvement, may get no benefit, or the experience might actually be negative. And the sum of these patients in the larger RCTs may then reveal only a mediocre outcome, obscuring the potential benefit for perhaps a lot of people. The real trick might be to figure out who is motivated by the SMBG and use this tool to help them with their diabetes management. And perhaps not using or stopping SMBG in those who do not really benefit. So, yet again, one size just does not fit all….
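
As a back-of-envelope check on the pooled results listed above, a 95% confidence interval can be converted back into an approximate p-value. The sketch below assumes a normal approximation and symmetric intervals (my assumption, not something stated in the paper as summarized here); the function and variable names are hypothetical, while the estimates and intervals are the ones quoted in this blog.

```python
# Back-of-envelope check: recover a two-sided p-value from a point
# estimate and its 95% confidence interval (normal approximation).
from math import erfc, sqrt

Z_95 = 1.96  # normal quantile for a 95% interval


def p_from_ci(estimate, lower, upper):
    """Approximate two-sided p-value implied by a symmetric 95% CI."""
    se = (upper - lower) / (2 * Z_95)
    z = abs(estimate) / se
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))


# Pooled results as quoted in the blog: (estimate, lower, upper)
reported = {
    "HbA1c": (-0.33, -0.45, -0.22),
    "BMI": (-0.65, -1.18, -0.12),
    "total cholesterol": (-0.12, -0.20, -0.04),
    "waist circumference": (-2.22, -4.40, -0.03),
}
p_values = {name: p_from_ci(*vals) for name, vals in reported.items()}

# Compare the HbA1c reduction with the 0.5% change the blog cites as the
# typical definition of clinical significance.
CLINICAL_THRESHOLD = 0.5
hba1c_est, hba1c_lo, hba1c_hi = reported["HbA1c"]
best_case_reaches_threshold = abs(hba1c_lo) >= CLINICAL_THRESHOLD
```

The recovered p-values line up closely with the reported ones, and even the most favorable end of the HbA1c interval (−0.45) falls short of the 0.5% threshold, which is the statistical version of "significant, but a little".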

Primary Care Corner with Geoffrey Modest MD: Evidence Based Medicine — What are its limitations?

2 May, 16 | by EBM

By Dr. Geoffrey Modest

The journal Evidence-Based Medicine (that’s the one that posts my blogs, part of BMJ) just came out with an interesting article challenging the biases inherent in evidence based medicine (EBM) which ultimately can distort the conclusions (see Seshia SS, et al. Evid Based Med 2016; 21: 41). They reference a 2014 BMJ analysis of EBM, noting its pluses and minuses (see Greenhalgh, T. BMJ 2014;348:g3725). The pluses are that EBM has been around for 20 years, has led to the development of more evidence-based reviews such as Cochrane Collaborations, as well as a slew of guidelines more based on specific methodological scientific criteria, and in many ways has elevated the basis for conducting more rigorous studies. But there are several minuses which are important to understand in order to interpret the results. Per the 2014 article (with some of my comments embedded):

  • Distortion of the evidence based brand: by this they mean that drug and medical companies have played such a pivotal role in designing research studies that they are able to push using surrogate markers as the important outcome (e.g. A1C in diabetics), define inclusion and exclusion criteria to best show efficacy (though these may really undercut the applicability of the results to regular old patients), and selectively publish positive studies
  • Too much evidence/too many guidelines: citing a study from 2005 (when there were many fewer guidelines than now), in a 24 hour period they admitted 18 patients with 44 diagnoses; to read the national guidelines on these diagnoses included 3679 pages and an estimated reading-time of 122 hours. And, I would add further that these guidelines may well be inconsistent with each other (e.g. different blood pressure goals in the American Diabetes Assn vs JNC8 guidelines).
  • Marginal gains: most of the major therapies have been found (low-hanging fruits), e.g. HIV drugs, H. pylori treatment, statins. Newer trials are often overpowered, allowing them to find statistically significant findings which are not very clinically significant. And, I would add: these studies are often pretty short or stopped early, showing small absolute benefit but too short to pick up longer term harms of therapy
  • Overemphasis on following algorithmic approaches: by overemphasizing specific targets (e.g. A1C in diabetics), clinicians may not pay enough attention to the really important patient issues (the depression, domestic violence, important social or other medical issues in the patients’ lives). And incentivizing these mechanical issues (ordering A1C’s) or dealing with pop-ups or care prompts in electronic medical records, may undercut our ability or time spent to really help the main problems of patients
  • Poor fit in those with multimorbidity: many of these EBM studies were done in patients with predominantly one condition (e.g., by excluding those with renal failure, cancer, etc.). Taking care of patients with multiple ongoing diseases leads to several issues not addressed in the studies: e.g., drug interactions, or polypharmacy (especially an issue as our patient populations are getting older and getting more chronic diseases)
  • The 2016 EBM journal article expands this and develops more of a framework to understand the cognitive biases in the medical literature, noting that there may well be combinations of biases in any article. They group biases as follows:
    • Conflicts of interest:
      • Financial, nonfinancial (e.g. desire for promotion, prestige), and intellectual (driven by strong personal belief that could distort the study)
    • Individual or group cognitive biases:
      • Self-serving bias (affected by group/organizational motives), confirmation bias (favoring evidence that supports one’s preconceptions), in-group conformity (increased confidence in a decision if in agreement with others, similar to groupthink, where opposing views are discouraged), reductionism (reducing complex or uncertain scenarios into simple ideas and concepts; see further comments below), automation bias (uncritical use of statistical software, decision support systems)
    • Group or organizational cognitive biases: scientific inbreeding (being trained in the same school of thought or by the same experts), herd effect (unquestioned acceptance of experts; reinforced by social media)
    • Fallacies/logical errors in reasoning: planning fallacy (incorrectly estimating benefits or costs/consequences), sunk cost fallacy (inability to change course of study despite problems, after so much has been invested)
    • Ethical violations: ranging from subtle statistical manipulations and selective publication to outright fraud/fabrication. There is typically an associated rationalization and self-deception.

So, a few issues:

  • These articles do bring up many of the concerns about EBM, despite the rather large positive of its push to make both the literature and its interpretation more rigorous. Most of the negatives are about inherent biases in designing and conducting studies, but also about being able to apply the results to the individual patient in front of you.
  • One additional point is that, as the rising tide lifting all the boats, EBM-based guidelines also elevate “expert opinion”. By this I mean that since we do not have rigorous studies looking at most of the things we do in primary care (or, clinical medicine, for that matter), the guidelines have a lot of expert opinion. It is certainly true that there is a very clear and repeatedly articulated grading system in the reviews/guidelines reflecting the quality of the studies, but often the take-home message is muddled, combining more definitive and not-so-definitive conclusions all together (i.e., many of the subtleties are lost. We remember the target points highlighted in their conclusions or a take-away-message box, which are typically of highly varying quality). And, to make matters worse, a large % of the “experts” are under the drug/medical supply company wings, much more so in the past 20 years of EBM, so there is increased concern about their “expertise”.
  • One interesting sideline here is the general approach of medical studies vs anthropologic studies (this comes from a long-lost article I read in the 1980s), which noted that medical studies were fundamentally reductionist: looking at lots of people and averaging their individual characteristics, so that, for example in the ASCOT-LLA lipid study (Sever P, Lancet 2003; 361: 1149), a 63.1 yo person, 94.6% white, 18.9% female, having a 24.5% incidence of diabetes, blood pressure of 164.2/95.0, with a median LDL of 212.7, but excluding those with “clinically important hematological or biochemical abnormalities”, has a 36.0% lower relative risk of developing heart disease after 3.3 years on atorvastatin 10mg (and, of course, we will never see that person, and it is in fact a long and tortuous ideological and practical leap to apply these results to the individual in front of us) vs the anthropologic approach of studying a few families intensively over 1-2 years and, by really getting to know and understand them, generalizing these findings to develop larger conclusions about culture. If you ever get a chance to read some of the really old journals from the bowels of large medical school libraries, many of these medical articles were much closer to the anthropologic approach (detailed case studies of a few patients with a particular clinical presentation). Clearly there are advantages and disadvantages of both of these approaches in terms of understanding disease and treatments, especially given our rather limited understanding of the complexity of the human biological/psychosocial systems and their interactions, though EBM aggressively promotes the “reductionist” method. [The clinical case presentations in several of the major medical journals do promote the concept of applying what we have learned in the big studies to individual patients. Perhaps this approach should be fostered more, though with experienced clinicians with zero ties to drug companies, etc….]
  • But the concept here is: one should be critical of the medical literature, looking carefully at the study design, inclusion/exclusion criteria, funding sources, and, to the extent we can, assess the likelihood of these underlying biases in distorting the conclusions.

Primary Care Corner with Geoffrey Modest MD: Adult Depression Guidelines

26 Feb, 16 | by EBM

By Dr. Geoffrey Modest

The American College of Physicians released a clinical practice guideline on nonpharmacologic vs pharmacologic treatment of adults with major depressive disorder–MDD (see file:///C:/Users/geoff/Downloads/AIME201603010-M152570.pdf ).



  • MDD, defined as in DSM-V: depressed mood or loss of pleasure or interest, along with other symptoms (changes in weight or appetite, insomnia or hypersomnia, psychomotor agitation or retardation nearly every day, fatigue or loss of energy, feelings of worthlessness or excessive or inappropriate guilt, indecisiveness or decreased ability to concentrate, and recurrent thoughts of death or suicide), lasting at least 2 weeks and affecting normal functioning
  • MDD has estimated lifetime prevalence of 16% in the US
  • 8 million ambulatory visits/year
  • Estimated economic cost to society and health care in 2000 was $83.1 billion (and probably higher today)
  • Definition of treatment response: a decrease of at least 50% in one of the tools [Patient Health Questionnaire-9 (PHQ-9) or Hamilton Depression Rating Scale (HAM-D)]; for meds, they only looked at second-generation antidepressants (i.e., not tricyclics, MAO inhibitors, which, by the way, are probably as effective in a few trials, but have more adverse effects and discontinuations); they also assessed nonpharmacologic approaches: psychological (acceptance and commitment therapy, cognitive behavioral therapy CBT, interpersonal therapy, psychodynamic therapy), complementary and alternative medicine (CAM) approaches (acupuncture, meditation, omega-3 fatty acids, S-adenosyl-L-methionine SAMe, St John’s wort, and yoga) and exercise


  • ​Meds vs CBT: no difference in outcome comparing the two after 8-52 weeks of treatment, for remission rates or functional capacity (mod quality evidence, 5 trials); 2 trials did not find benefit from combo therapy for remission after 12-52 weeks of therapy (low-quality evidence​)​, though perhaps some benefit in functional capacity.
  • Meds vs interpersonal therapy: no difference in response (3 trials, low-quality evidence​); no real evidence of combo therapy (only 1 low-quality trial which used nefazodone as the med).
  • Meds vs psychodynamic therapies: no difference for remission or functional capacity (3 trials, low-quality evidence).
  • Meds vs acupuncture: no diff comparing fluoxetine vs acupuncture after 6 weeks (1 trial, low-quality evidence​); combo fluoxetine or paroxetine with acupuncture found improved response after 6 weeks (2 trials, low-quality evidence​)
  • Meds vs omega-3 fatty acids: meds better than omega-3 fatty acids (1 trial, low-quality evidence)
  • Meds vs SAMe: no diff in study with escitalopram (1 trial, low-quality evidence​)
  • Meds vs St John’s wort: no diff from 9 trials (low-quality evidence ​because meds not used at usual therapeutic dosage range. Other issues include non-regulation of St John’s wort by FDA and variable potency, important drug-drug interactions by inducing cytochrome P450 isoenzyme 3A4)
  • Meds vs yoga: no trials done
  • Meds vs exercise: no diff in response, including 2 trials with moderate quality of evidence for sertraline vs exercise after 16 weeks
  • ​Switching meds: no difference in response rate by switching from one med to another. Mod quality evidence [but very few studies evaluated: switching from bupropion vs sertraline or venlafaxine and sertraline vs venlafaxine], no diff in adverse events/discontinuation rates
  • Switching from med to different med vs switching to CBT: no difference, but 1 study with low-quality evidence
  • Augmenting one med with another: no difference in augmenting citalopram with bupropion vs buspirone, though adding bupropion decreased depression severity more than buspirone (1 trial with low-quality evidence)
  • Augmenting med with another vs augmenting with CBT: no difference in response, remission, depression severity if augment citalopram with buspirone or bupropion vs augmenting with CBT
  • ​In terms of harms overall: pretty mixed. Some trials with more discontinuation with meds vs psych therapies. Not much difference with (though acupuncture and St John’s wort did have fewer adverse events than meds)

Their conclusion: offer either CBT or med for patients with MDD after discussing treatment and adverse effects (strong recommendation; moderate quality evidence)

So, I’m not sure what to make of this. It is pretty clear that the studies evaluated did not create a basis for strong recommendations: in general only very few studies were included (reflecting the paucity of strict RCTs) and the majority had low-quality evidence. And several common management strategies were dismissed because of no formal studies being done (e.g., using the same med to retreat a person with depression who had previously responded to that med).

A few points:

  • There are several other common treatment strategies that were dismissed for lack of higher quality evidence, where in fact there are some supportive data:
    • Switching from one med to another when the first one does not work. I have seen a few studies (though do not have the reference handy) finding that switching SSRIs from one to another in nonresponders increases the response rate from a baseline group response rate to about 15-20% higher with a different SSRI. And, my clinical practice of trying one and, if no response, switching to another has been reasonably effective. If there is a partial response to the initial SSRI, I typically try either increasing the dose or augmentation (as in next point).
    • Adding an augmenting med to an antidepressant when there is suboptimal initial response. Again, the data are not great, and a systematic review overall did not find benefit for augmentation (though 2 of the 5 RCTs did). On the other hand, an impressive and pretty large trial (not a clean RCT) of patients with suboptimal response to citalopram did find benefit for augmentation with either bupropion or buspirone (somewhat better with bupropion) – see NEJM 2006; 354: 1243. And my personal experience pretty strongly supports augmentation with bupropion in those with partial responses to an SSRI.
    • Combining a med with psychotherapy. Several studies have confirmed an augmented effect of using this combined approach: e.g. World Psychiatry 2014; 13: 56 — a large meta-analysis of patients with MDD finding a clinically meaningful effect of combination therapy over meds alone.
  • Perhaps the biggest issue with these guidelines is the limitation of randomized controlled trials (RCTs) themselves, in terms of their generalizability to the patient sitting in front of you:
    • Structural issues: the RCT patients may be predominantly of a different gender or ethnicity, or have different comorbidities, than your patient (and even in the best large RCTs with representation of many different types of people, subgroup analyses looking at the factors most reflective of the patient you are treating are typically post-hoc, which limits their statistical validity by introducing potential biases)
    • ​Exclusion criteria: RCTs have upfront exclusions which make the study data cleaner and easier to analyze; e.g. patients with cancer or renal failure, etc. are excluded (because of limited life expectancy, confounders with meds taken, other medical issues, etc.). But we still need to treat patients with these conditions…. Does the RCT really apply to them?
    • Inclusion criteria: a study may well find that meds vs psychotherapy are equivalent. But in order to be part of the study, those patients recruited must agree to be randomly assigned to either wing of the study, prior to randomization. But many patients (at least many of mine) are not good candidates for psychotherapy (not willing to go, too little insight for therapy to be useful, etc.), so there is a selection bias in terms of who participates in the RCT, and there may be real differences between those patients who would participate in a study and those who would not (i.e., their depression in the setting of who they are may respond differently to meds, for example).
  • Real-world applicability of RCT results: primary care providers do not have accessible study nurses who call the patients regularly, see if there are any problems, make sure they are taking their meds and perhaps do pill counts, make sure they make it to their psychotherapy appointments/etc., and have the drug company sponsors pay for all of the copays, transportation costs, and give the patient financial incentives to adhere to the protocol. Our real world patients may have little of these benefits, and may respond to treatments differently than the study patients as a result.
  • Placebo effect: an assumption in RCTs is that they are trying to prove that there is an incremental value of a new med, for example, over placebo. But maybe the placebo effect is clinically important???  There may be no trial showing that choosing a med based on either the patient’s prior success or that of a family member leads to a higher likelihood of success in the patient in front of you. BUT, first of all, there may be lack of recommendations about this just because the study was never done (i.e., there actually may be a benefit in choosing an SSRI based on this, we just don’t know). AND, even if there is a placebo effect, such that there is an increased response if the med worked before or in a family member, that’s clinically important and in the patient’s interest…..
  • So, this is not to say that there is no real use or even real importance of RCTs, just that there are limitations to their generalizability. And in the above case of depression, both the lack of studies to answer important questions and the assumption that we need to minimize the placebo effect should not necessarily undercut the applicability of treatments. Perhaps the main points of the guidelines are that there is reasonable overall equivalence between meds and psychotherapy (esp. CBT), but that for such a really common problem as MDD, there are embarrassingly few high quality studies addressing the pressing clinical issues we see day-in and day-out…

Primary care corner with Dr. Geoff Modest: scientific “objectivity”

3 Feb, 14 | by EBM

sobering article in NY Times yesterday exploring the subjectivity of science.  we all know intuitively that behind the mystique of the rigors of science and the dispassionate quest for knowledge, there is a large human incentive to create the results we hope to find. (this is beyond the greed motive, wherein a drug company withholds negative results, as with gabapentin, for their own profit).

the authors point out, for example, a report in Nature that only 6 out of 53 “landmark” studies in cancer research were able to be replicated.  and, it certainly seems that more-often-than-not, initial reports on a new drug’s efficacy are either diluted or totally negated on subsequent studies.

this lack of “objectivity”, of course, does not just apply to science. they comment on it in literary criticism, note that Jane Austen took on this issue explicitly in Pride and Prejudice, and that Austen was perhaps consciously putting her characters in challenging situations to analyze how they would act strategically (predating the formal current discipline of game theory).


Thousands of American Men to Die this year because of Evidence-Based Medicine (EBM)?

22 Oct, 11 | by EBM

Does Belief-based medicine trump EBM? breast and prostate cancer screening recommendations redux

It has been an exasperating couple of weeks and there will be more to come. I got my coffee and sat down, thinking I could rest a bit and gather strength for the day while a grand rounds speaker droned on. But then it happened—the radiology professor began to build her case for screening all women 40-49 years of age for breast cancer.  She spoke as if her case was airtight. She was about to be surprised by audience reaction. The US Preventive Services Task Force (USPSTF) had gotten it wrong she implied, and we are still reeling from it two years later. They shouldn’t have changed their recommendation.

In 2009, the USPSTF revised its recommendation.  Here it is:

“The USPSTF recommends against routine screening mammography in women aged 40 to 49 years. The decision to start regular, biennial screening mammography before the age of 50 years should be an individual one and take into account patient context, including the patient’s values regarding specific benefits and harms. (Grade C recommendation)”

“C” means: “There may be considerations that support providing the service in an individual patient. There is moderate or high certainty that the net benefit is small.”  It does say “recommends against” but also says to consider values about benefits and harms.  Seems reasonable doesn’t it?

The speaker went on: Since screening was implemented breast cancer mortality has declined. Screening detects early stage tumors, those we can treat before they get worse. And young women get breast cancer, the really bad aggressive kind. At our hospital in the past 3 years, some 79 women in that age group were diagnosed with cancer. And the USPSTF doesn’t think we should screen them all. These are young women with families and jobs. They don’t even want us to recommend women do self-exams. Unbelievable.

A few colleagues rumbled. They asked “Was the decline in mortality due to screening?” “Were the 79 detected by screening?” Fair questions but peripheral, I thought. I felt it coming. I boiled over with “What proportion of women screened versus not screened die from breast cancer? Shouldn’t that be the starting point for the discussion?” The conference ended without an answer though impressive sounding relative risks were presented.

A particularly insightful colleague pointed out that while we might want to know the numbers I asked for, that patients might have trouble understanding them, and more importantly, even when they did understand, they had little context for decision-making. They might be anxious and just want to know what they should do.  “What would you do?” or “What do you think I should do?” they would ask their doctor.  I had gathered strength for the day, not by snoozing through a lecture but by being energized about the battles between evidence-based medicine and whatever the alternative is.

This was the week after the USPSTF had downgraded its prostate cancer screening recommendation to recommending against it (a Grade D recommendation: means they discourage use of the test because there is moderate to high certainty that it has no net benefit or that harms outweigh the benefits).

A US presidential candidate condemned the recommendation.  And a CEO of a group devoted to ending prostate cancer said that the recommendation “condemns tens of thousands of men to die this year and every year going forward…”

Seriously? People who spent months to years reviewing the best evidence did that?  Here is the evidence: One trial found a 0.07% decrease, another a 0.03% increase, and a meta-analysis, no difference in prostate cancer deaths attributable to screening. And yes, those zeroes and decimals are in the correct places. And harms abound.  There was a voice of reason in a physician-written editorial to the New York Times that addressed both breast and prostate cancer screening.

So what is going on?
Belief is powerful. We believe in pathophysiology, and well we should.  Early stage tumors become late stage tumors.  Sometimes early stage tumors can be removed and the later stage ones prevented.  Though inconvenient, it also turns out that in some cancers (eg breast) early stage tumors are manifestations of an already systemic disease, and in others (eg prostate), tumors may not cause harm during the patient’s lifetime. Screening might not help those.

Memory is powerful (the availability heuristic). We remember the young man or woman who died a horrible cancer death. It was a cancer we had a test for. And we hadn’t tested for it. Their family might complain that they hadn’t been tested. A colleague pointed out to me that no patient had ever complained about getting screened.  Even when they suffered all of the downstream consequences (eg surgical complications, erectile dysfunction, incontinence) they were grateful for having been saved (of course one doesn’t know whether he would have been saved without intervention or complications).

The ecology of health care first described in the 1960s by Kerr White continues to be ignored. Most people do not end up in the care of hospitals or specialists. Yet those who view healthcare through a tertiary care lens are often the most respected, influential and vocal promoters of what should be done for populations and patients in primary care settings, where they have no expertise.  (others with no discernable expertise in this arena include politicians, yet they speak loudly). The USPSTF starts its recommendation process with a systematic review of the evidence. Experts such as those on the USPSTF do us a great favor by examining the evidence underlying preventive care practices from the perspective of populations and patients without disease.

Finally, there is the belief that doing something is always better than doing nothing.  Suffice it to say that isn’t true. But doctors, patients and people have a hard time with it. It is tiresome to keep hearing that a common bad disease is sufficient information to justify doing something, even when evidence that it works is absent, or even worse, when evidence finds harm. “How can you just sit there doing nothing?”

So what should we do?
Despite complaints about evidence-based medicine, I have yet to hear good arguments about why we should ignore high quality evidence or not be explicit when there is an absence of evidence. And like democracy, evidence-based medicine may not be perfect but it is the best we have. Basing clinical practice primarily on pathophysiology, belief and the most recent bad outcome is not a viable alternative.

But while evidence can be very informative, we have a long way to go towards getting it understood and easily used.  We need to communicate the evidence accurately and dispassionately to patients in ways that it is accessible to them (which may vary by patient) and at times when they can be receptive.

At age 40, a woman’s chance of dying from breast cancer is 3% (5-year risk if no risk factors is 0.4%, and up to 6% or more if risk factors are present). Screening during ages 40-49 reduces that chance by 15% (95% credible interval 1%-27%), to 2.55% (a 0.45% decrease), and up to 56% of women will experience a false positive. Because not everyone adheres to screening, the number needed to screen to prevent a breast cancer death even in clinical trials is 1904 (see USPSTF evidence report; google it). A model for communicating this type of information visually with patients can be found here

For prostate cancer, the lifetime risk of dying from it is 2.8%, most after age 75. The benefit of screening remains uncertain but one trial puts it at a 0.7% absolute risk reduction for prostate cancer specific mortality (eg to 2.1%); 80% of PSA tests are false positives (and that doesn’t consider the fact that most actual cancers won’t lead to death), 20% will get a biopsy (in 10 years of screening), and 20-30% of men treated for prostate cancer with surgery or radiation will have incontinence or erectile dysfunction.
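For readers who want to check the arithmetic behind the breast and prostate figures above, here is a minimal sketch in Python. The function and variable names are my own, and the inputs are simply the percentages quoted in the text. Note that the naive "1 / absolute risk reduction" figure it prints for breast cancer assumes the quoted lifetime risk and perfect adherence, so it is not expected to reproduce the trial-based number needed to screen of 1904, which (as noted above) reflects imperfect adherence in the trials.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """Absolute risk reduction = baseline risk x relative risk reduction."""
    return baseline_risk * relative_risk_reduction


def naive_number_needed_to_screen(arr):
    """Reciprocal of the absolute risk reduction (assumes full adherence)."""
    return 1.0 / arr


# Breast cancer, figures as quoted in the text (illustrative inputs only)
breast_baseline = 0.03   # 3% chance of dying from breast cancer
breast_rrr = 0.15        # 15% relative reduction with screening at 40-49
breast_arr = absolute_risk_reduction(breast_baseline, breast_rrr)
breast_risk_screened = breast_baseline - breast_arr
breast_nns = naive_number_needed_to_screen(breast_arr)

# Prostate cancer, figures as quoted in the text (illustrative inputs only)
prostate_baseline = 0.028  # 2.8% lifetime risk of dying from prostate cancer
prostate_arr = 0.007       # 0.7% absolute risk reduction quoted from one trial
prostate_risk_screened = prostate_baseline - prostate_arr
prostate_rrr = prostate_arr / prostate_baseline  # implied relative reduction

print(f"Breast: ARR {breast_arr:.2%}, risk if screened {breast_risk_screened:.2%}, "
      f"naive NNS {breast_nns:.0f}")
print(f"Prostate: risk if screened {prostate_risk_screened:.2%}, "
      f"implied relative reduction {prostate_rrr:.0%}")
```

The point of the sketch is only that a relative reduction (15%) and an absolute reduction (0.45 percentage points) describe the same effect, and that the absolute figure is usually the more useful one to discuss with a patient.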

We also need to offer patients our best medical judgment based on that evidence, taking into account what we know about them—best done by a clinician who has comprehensive knowledge about the patient and is in a longitudinal trusting relationship with them. Easier said than done, I know.

“OK, OK. Whatever,” you might say, “but what will you do?” Hopefully you don’t really care what I would do, at least not as a guide for your practice. But, I’ll tell you anyway.

Some of my patients want me to prove they don’t have disease. I tell them I can’t do that but they know there are tests that can detect diseases. These patients often think that when I send a blood test, I am checking them for “everything.” I will continue to explain to them that it doesn’t work that way, but in the end, so long as they understand the consequences of testing, I will order the mammogram for the 40-year-old. Prostate specific antigen testing I will reserve for those who are otherwise un-persuaded after I inform them the test likely causes more harm than good and is not recommended, and they request and consent to testing; but I can still order it in those rare cases with a good conscience and consider it good medical practice because I know some cancers will be detected and cured only as a result of testing, even if the odds are strongly against it being the case for this one individual.

I have other patients who say “que sera, sera” or “if it ain’t broke, don’t fix it.” They are not convinced that screening is worth it (after being presented with data in an understandable way). Respecting their values and autonomy, I won’t test them.

The patient in the middle ground is the most difficult and perhaps most common.  For prostate cancer I will follow the USPSTF guidance against screening. But I will provide numerical information to patients when they are interested. For breast cancer, I will have a low threshold for offering screening to women age 40-49 years as long as they understand the 50% or so chance of false positives (and attendant anxiety, repeat imaging, biopsy and other procedures unnecessary only in retrospect) and still deem it worthwhile to them.

This past week the USPSTF released its recommendation that cervical cancer screening be started later (age 21) and be done every three years (or not at all after smears are negative 3 times in a row). There is another presidential debate today in the US; probably this will come up. It’s going to be another energizing week…

EBM (the practice and the journal) takes a ribbing

11 Jul, 11 | by EBM

A writer at the Boston Globe is annoyed by the terms “evidence-based medicine” (and “reality-based community” and “fact-based presidency,” among others that he calls verbal tics).  Surely these terms have become overused. But they have become overused because people want to base decisions in evidence.  But the reporter scoffs at the BMJ (and the journal EBM) by quoting the long-accepted (since around 1992) definition of EBM and mocking it.

About “evidence-based medicine” he asks, “As opposed to what?”, making the same mistake many learners make when they first hear about EBM. He believes the practice of medicine must all be evidence-based and is unaware that anything else could go on, or that it might be complicated to identify and apply evidence. Clearly EBM (the practice and the journal) is about using the best evidence.

Anyway, the reporter’s piece and my response can be seen here…as per my tweet earlier today “EBM and BMJ taunted by Boston Globe writer last week. EBM responds…See it in the Globe today” and follow me @EvidBaseMed_BMJ
