Propensity scores are used mainly in observational studies assessing treatments, as a way of balancing out measured differences between those who received a treatment and those who didn’t.
In most observational studies, there are things that will have pushed the doc into prescribing the medicine in question, or the surgeon to take the knife to that patient and not the other one. These factors (baseline characteristics, if you’d like to be fancy) might well be linked to the outcome, as well as to the decision to treat. For example, if you are looking at whether antihistamines are helpful for chickenpox itch, it may be that children with a greater number of spots were more likely to be prescribed chlorpheniramine, and that the spot count is also related to the amount of reported itch.
What a propensity score attempts to do is estimate each patient’s probability of receiving the treatment, given their baseline characteristics, and then use that estimate to ‘balance’ the data into groups – often five of them (quintiles) – which contain roughly equal distributions of those characteristics (like spottiness, gender, number of sibs …)
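If you like seeing these things concretely, here is a minimal sketch of how such a score is usually estimated: a logistic regression of treatment on the baseline characteristics, with the fitted probabilities then cut into quintiles. The cohort is simulated and every column name (spot_count, treated and friends) is invented purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Simulated chickenpox cohort: spottier children are more likely to be treated.
df = pd.DataFrame({
    "spot_count": rng.poisson(60, n),
    "age": rng.integers(1, 12, n),
    "n_siblings": rng.integers(0, 4, n),
})
p_treat = 1 / (1 + np.exp(-(df["spot_count"] - 60) / 15))
df["treated"] = rng.binomial(1, p_treat)

# Propensity score: modelled probability of treatment given baseline characteristics.
X = df[["spot_count", "age", "n_siblings"]]
model = LogisticRegression(max_iter=1000).fit(X, df["treated"])
df["propensity"] = model.predict_proba(X)[:, 1]

# Cut into quintiles; within each, the baseline characteristics of treated and
# untreated children should now look roughly similar.
df["quintile"] = pd.qcut(df["propensity"], 5, labels=False)
print(df.groupby(["quintile", "treated"])["spot_count"].mean().unstack())
```

That final print is the balance check: within each quintile, the mean spot counts of the treated and untreated should sit close together, even though they differ markedly in the cohort as a whole.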
This score can then be used to match treated with untreated patients, to produce five separate treatment effectiveness estimates (one per stratum), or to adjust a regression analysis – but in each case the aim is to estimate the real effectiveness of the treatment, ‘correcting’ for the measured biases.
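Here, continuing from the frame above, is a sketch of the stratification route: compare mean itch between treated and untreated within each quintile, then take a weighted average across the quintiles. The outcome column (itch_score) and its effect sizes are, again, made up.

```python
# Simulated outcome: itch tracks spot count; pretend treatment shaves off a little.
df["itch_score"] = df["spot_count"] * 0.1 - 2.0 * df["treated"] + rng.normal(0, 2, n)

# Naive comparison is confounded: the treated were spottier (itchier) to begin with.
naive = (df.loc[df["treated"] == 1, "itch_score"].mean()
         - df.loc[df["treated"] == 0, "itch_score"].mean())

# Stratified estimate: treated-minus-untreated difference within each quintile,
# averaged across quintiles and weighted by stratum size.
means = df.groupby(["quintile", "treated"])["itch_score"].mean().unstack()
per_stratum = means[1] - means[0]
weights = df["quintile"].value_counts(normalize=True).sort_index()
stratified = (per_stratum * weights).sum()

print(f"naive difference:      {naive:+.2f}")
print(f"stratified difference: {stratified:+.2f}")
```

In this toy data the naive difference is pulled upwards by the confounding (the treated children were itchier to start with), while the stratified estimate lands much closer to the built-in treatment effect.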
(You’ll note that this has no way of balancing unmeasured biases, and you’ve probably guessed that if the measurement tools are coarse – for example, recording severity of illness only as ‘ICU admission vs. not’ – then unaccounted-for biases may remain.)
While propensity scores can help in making fairer comparisons across some observational data sets, they are not a way of avoiding RCTs. So if you want to know whether you should give Piriton to the itchy pox-filled children in your life, you really should be enrolling them in an RCT.
– Archi