StatsMiniBlog: Bootstrapping

As mentioned in previous posts, part of the joy of playing with numbers is in making inferences about how the future will be. This is often why you’re looking for a confidence interval, showing (sort of) where the truth lies with 95% certainty.

How you build these CIs is interesting.

One way is to assume that the data come from a Normal distribution, and then the sample size, sample mean and sample standard deviation will give you the CI (using the standard error of the mean). If you’re not quite sure that it really is a Normal distribution, then you have a range of options, one of which is a bit 50 shades of stats: bootstrapping.
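For the curious, the Normal-assumption route can be sketched in a few lines of Python. This is only an illustration: the function name `normal_ci`, the multiplier of 1.96 and the length-of-stay numbers are all invented for the example, and with a sample this small a t-distribution multiplier would usually be preferred.

```python
import math
import statistics

def normal_ci(values, z=1.96):
    """Approximate 95% CI for the mean, assuming a Normal sampling distribution."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(sample size)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean - z * sem, mean + z * sem

# Hypothetical lengths of stay (days), made up purely for illustration
lengths_of_stay = [2, 3, 3, 4, 5, 5, 6, 8, 9, 15]
normal_low, normal_high = normal_ci(lengths_of_stay)
print(f"Normal-theory 95% CI: {normal_low:.1f} to {normal_high:.1f}")
```

Note that the interval is forced to be symmetrical around the sample mean, which is exactly the bit you might doubt when the data are skewed.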

The bootstrap procedure works by taking the data, then making new versions of it, and calculating the mean (or whatever) in each new sample. It then lines up all these ‘made up’ means to say where the middle 95% of them lie, creating a confidence interval which can more closely reflect the truth than one built by assuming a particular distribution.

The procedure creates a series of ‘new’ datasets, each the same size as the original, from rows re-sampled from the data at random. The sampling is done with replacement, allowing any patient-line-of-data to enter the new bootstrapped dataset multiple times (or not at all). This is based on the principle of random sampling reflecting the true values of the item within a population, and simulates the expected random variations that will appear when further studies, or real life, occur.
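The resampling described above can be sketched as follows. This is a minimal version of the simple "percentile" bootstrap, not necessarily what any given paper will have used; the function name `bootstrap_ci`, the 2000 resamples, the fixed seed and the data are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n rows WITH replacement, so a row can appear several times
        resample = rng.choices(values, k=n)
        means.append(statistics.mean(resample))
    means.sort()
    # Cut off the lowest and highest 2.5% of the 'made up' means
    tail = (1 - level) / 2
    lower = means[round(tail * n_resamples)]
    upper = means[round((1 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical lengths of stay (days), made up purely for illustration
lengths_of_stay = [2, 3, 3, 4, 5, 5, 6, 8, 9, 15]
boot_low, boot_high = bootstrap_ci(lengths_of_stay)
print(f"Bootstrap 95% CI: {boot_low:.1f} to {boot_high:.1f}")
```

Unlike the Normal-based interval, this one does not have to be symmetrical around the sample mean, so it can follow the skew in the data.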

If you spot this technique, you can be a bit more convinced by the confidence intervals displayed than by those from other methods.

– Archi

BMJ Blogs