But the question arose when I started to look at this paper published in the Archives, addressing the question of observer variation in clinical assessment of wheezy kids. Mostly, I think wheeze = mediastinal mass (fast onset -> T-cell lymphoma, slow onset -> Hodgkin’s lymphoma) or wheeze = aspergillus infection, if hot & leukaemic, but I do recognise that asthma is, occasionally, the correct answer.
But how breathless are wheezers and do different clinicians agree?
The study took a group of 27 patients who turned up at the ED with wheeze and video recorded them (with a sats monitor attached), before and after bronchodilators. They then showed the videos twice, in random order, to experienced (>5 years) paediatric consultants and nurses and asked them to rate breathlessness things, like wheeze, accessory muscle use, flaring and retractions.
The way they looked at the agreement is interesting and well explained in the paper – they examined how much the scores each person gave to a video (say Fred-pre-bronchodilator) differed when that observer saw it twice. That gives a difference in scores for Fred-pre-bronchodilator[observer1], and you can then calculate the spread of those differences across all of that observer’s estimates. Using that spread, you can calculate the standard error of the assessment (and so, approximately, twice that standard error gives you the variation you could see by chance alone).
If you take those measurements across the group, you then get to the ‘chance’ variations between observers – not just within the one.
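If it helps to see the arithmetic, here is a minimal sketch in Python. The repeat scores are made up for illustration (not the paper’s data), and it uses the standard test–retest relationships SEM = SD(differences)/√2 and SDC = 1.96 × √2 × SEM (which collapses to 1.96 × SD of the differences):

```python
import math
import statistics

# Hypothetical repeat scores: one observer rated the same set of videos twice.
# (Illustrative numbers only - not taken from the paper.)
first_viewing = [4, 6, 3, 5, 7, 2, 5, 6]
second_viewing = [5, 4, 3, 7, 6, 3, 4, 8]

# Test-retest difference for each video
diffs = [a - b for a, b in zip(first_viewing, second_viewing)]

sd_diff = statistics.stdev(diffs)      # spread of the repeat differences
sem = sd_diff / math.sqrt(2)           # standard error of measurement
sdc = 1.96 * math.sqrt(2) * sem        # smallest detectable change
# (note: sdc works out to exactly 1.96 * sd_diff)

print(f"SD of differences: {sd_diff:.2f}")
print(f"SEM: {sem:.2f}")
print(f"SDC: {sdc:.2f}")
```

Any observed change smaller than that SDC could plausibly be nothing more than the observer disagreeing with themselves.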
Which leads on to two very important ideas – as the authors say
the smallest detectable change (SDC), the smallest within-person change which can be interpreted as real change above measurement error.
and alongside this
the minimal important change (MIC), the smallest change in the measurement which the clinician or patient perceives as important
Now if your minimum important change is WITHIN the size of the smallest detectable change you’ll not be able to be convinced that the change you see is real, or just measurement error.
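That decision rule is tiny enough to write down explicitly. Using the paper’s rough figures (SDC ≈ 3, MIC ≈ 1) as the inputs:

```python
SDC = 3.0  # smallest detectable change (paper's approximate value)
MIC = 1.0  # minimal important change (paper's approximate value)

def change_is_convincing(observed_change: float) -> bool:
    """A change smaller than the SDC could just be measurement error."""
    return abs(observed_change) >= SDC

# An important change (>= MIC) that is still too small to detect:
print(change_is_convincing(1.0))
# A change big enough to believe:
print(change_is_convincing(3.5))
```

Because MIC < SDC here, there is a whole band of clinically important changes that the measurement simply cannot distinguish from noise.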
To massively switch ideas, imagine taking a photo of a labrador from 6 metres using an iPhone camera. Does the dog have whiskers, or have they been lost in a BBQ/sausage/unavoidable-hunger incident?
Even if you could expand the image (and you might well have done this), the resolution – the ‘smallest detectable change’ – is too rubbish to see the whiskers. Or the absence of whiskers. The SDC exceeds the MIC.
What the paper (not the dog) cleverly showed was that the variation in breathlessness measurement was hugely greater than the minimally important change (SDC ≈ 3, MIC ≈ 1), implying that only really big changes (over 3, when a meaningful change is 1 or more) could be convincingly detected.
(I can’t see that the SDC for an individual clinician is presented, only the interobserver SDC, so – although agreement within clinicians was much greater than between them – I can’t be entirely sure this holds for a single person assessing and re-assessing.)
The implication is that we need to have the same person assessing, or at handover to do it in front of the patient and say “what you see now is the same/worse/better than before”.
But the stats suggest that the patient might be getting better and we won’t spot it, or we’ll make assessments that show improvements that are actually just the fluttery breaths of chance …
ps – the dog DOES have whiskers and the BBQ was cold when sausages were stolen