
Data visualisations

p values misused

8 Mar, 16 | by Barry Pless

Don’t ask me why but I follow Retraction Watch faithfully. Recently there was a posting about p values I thought would be of interest to our readers and contributors. Here it is verbatim.

“We’re using a common statistical test all wrong. Statisticians want to fix that.

After reading too many papers that either are not reproducible or contain statistical errors (or both), the American Statistical Association (ASA) has been roused to action. Today the group released six principles for the use and interpretation of p values. P-values are used to search for differences between groups or treatments, to evaluate relationships between variables of interest, and for many other purposes. But the ASA says they are widely misused. Here are the six principles from the ASA statement:

P-values can indicate how incompatible the data are with a specified statistical model.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
Proper inference requires full reporting and transparency.
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
We spoke with Ron Wasserstein, ASA’s executive director, about the new principles.

Retraction Watch: Why release these “six principles” now? What about this moment in research history made this a particularly pertinent problem?

Ron Wasserstein: We were inspired to act because of the growing recognition of a reproducibility crisis in science (see, for example, the National Academy of Sciences recent report) and a tendency to blame statistical methods for the problem. The fact that editors of a scholarly journal – Basic and Applied Social Psychology — were so frustrated with research that misused and misinterpreted p-values that they decided to ban them in 2015 confirmed that a crisis of confidence was at hand, and we could no longer stand idly by.

Retraction Watch: Some of the principles seem straightforward, but I was curious about #2 – I often hear people describe the purpose of a p value as a way to estimate the probability the data were produced by random chance alone. Why is that a false belief?

Ron Wasserstein: Let’s think about what that statement would mean for a simplistic example. Suppose a new treatment for a serious disease is alleged to work better than the current treatment. We test the claim by matching 5 pairs of similarly ill patients and randomly assigning one to the current and one to the new treatment in each pair. The null hypothesis is that the new treatment and the old each have a 50-50 chance of producing the better outcome for any pair. If that’s true, the probability the new treatment will win for all five pairs is (½)⁵ = 1/32, or about 0.03. If the data show that the new treatment does produce a better outcome for all 5 pairs, the p-value is 0.03. It represents the probability of that result, under the assumption that the new and old treatments are equally likely to win. It is not the probability the new treatment and the old treatment are equally likely to win.
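The arithmetic in this example is easy to check; here is a minimal Python sketch of the sign-test calculation described above (an editorial illustration, not part of the original interview):

```python
from math import comb

def sign_test_p_value(wins, n, p_null=0.5):
    """One-sided p-value: the probability of at least `wins` successes
    in `n` matched pairs, computed under the null hypothesis that each
    pair is a 50-50 coin flip."""
    return sum(comb(n, k) * p_null**k * (1 - p_null)**(n - k)
               for k in range(wins, n + 1))

# The new treatment wins in all 5 of 5 matched pairs:
p = sign_test_p_value(5, 5)
print(p)  # 0.03125, i.e. (1/2)**5 = 1/32, about 0.03
```

Note that this number is computed *assuming* the null hypothesis is true, which is exactly why it cannot also be the probability that the null hypothesis is true.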

This is perhaps subtle, but it is not quibbling. It is a most basic logical fallacy to conclude something is true that you had to assume to be true in order to reach that conclusion. If you fall for that fallacy, then you will conclude there is only a 3% chance that the treatments are equally likely to produce the better outcome, and assign a 97% chance that the new treatment is better. You will have committed, as Vizzini says in “The Princess Bride,” a classic (and serious) blunder.

Retraction Watch: What are the biggest mistakes you see researchers make when using and interpreting p values?

Ron Wasserstein: There are several misinterpretations that are prevalent and problematic. The one I just mentioned is common. Another frequent misinterpretation is concluding that a null hypothesis is true because a computed p-value is large. There are other common misinterpretations as well. However, what concerns us even more are the misuses, particularly the misuse of statistical significance as an arbiter of scientific validity. Such misuse contributes to poor decision making and lack of reproducibility, and ultimately erodes not only the advance of science but also public confidence in science.

Retraction Watch: Do some fields publish more mistakes than others?

Ron Wasserstein: As far as I know, that question hasn’t been studied. My sense is that all scientific fields have glaring examples of mistakes, and all fields have beautiful examples of statistics done well. However, in general, the fields in which it is easiest to misuse p-values and statistical significance are those which have a lot of studies with multiple measurements on each participant or experimental unit. Such research presents the opportunity to p-hack your way to findings that likely have no scientific merit.

Retraction Watch: Can you elaborate on #4: “Proper inference requires full reporting and transparency”?

Ron Wasserstein: There is a lot to this, of course, but in short, from a statistical standpoint this means to keep track of and report all the decisions you made about your data, including the design and execution of the data collection and everything you did with that data during the data analysis process. Did you average across groups or combine groups in some way? Did you use the data to determine which variables to examine or control, or which data to include or exclude in the final analysis? How are missing observations handled? Did you add and drop variables until your regression models and coefficients passed a bright-line level of significance? Those decisions, and any other decisions you made about statistical analysis based on the data itself, need to be accounted for.

Retraction Watch: You note in a press release accompanying the ASA statement that you’re hoping research moves into a “post p<0.05” era – what do you mean by that? And if we don’t use p values, what do we use instead?

Ron Wasserstein: In the post p<0.05 era, scientific argumentation is not based on whether a p-value is small enough or not. Attention is paid to effect sizes and confidence intervals. Evidence is thought of as being continuous rather than some sort of dichotomy. (As a start to that thinking, if p-values are reported, we would see their numeric value rather than an inequality (p=.0168 rather than p<0.05)). All of the assumptions made that contribute information to inference should be examined, including the choices made regarding which data is analyzed and how. In the post p<0.05 era, sound statistical analysis will still be important, but no single numerical value, and certainly not the p-value, will substitute for thoughtful statistical and scientific reasoning.
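One way to picture the “post p<0.05” reporting style Wasserstein describes is to report an effect estimate with an interval rather than a bare threshold. A minimal Python sketch, as an editorial illustration using the five-pair example above (the Wilson score interval is just one of several interval choices, not something prescribed by the ASA statement):

```python
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion -- one way to report
    an effect estimate with its uncertainty instead of only 'p < 0.05'."""
    phat = successes / n
    denom = 1 + z**2 / n
    center = (phat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2))
    return center - half, center + half

# The 5-of-5 example: a perfect point estimate, but a very wide interval,
# which conveys far more than "significant at p < 0.05" alone.
lo, hi = wilson_ci(5, 5)
print(f"estimate 1.00, 95% CI ({lo:.2f}, {hi:.2f})")
```

With only five pairs, the interval stretches from roughly 0.57 to 1.0, making the thinness of the evidence visible in a way a dichotomous verdict does not.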

Retraction Watch: Anything else you’d like to add?

Ron Wasserstein: If the statement succeeds in its purpose, we will know it because journals will stop using statistical significance to determine whether to accept an article. Instead, journals will be accepting papers based on clear and detailed description of the study design, execution, and analysis, having conclusions that are based on valid statistical interpretations and scientific arguments, and reported transparently and thoroughly enough to be rigorously scrutinized by others. I think this is what journal editors want to do, and some already do, but others are captivated by the seeming simplicity of statistical significance.

Pless note: I would be interested if any readers disagree. Please outline your views in 20 words or less. (Just kidding)

Dying en route to safety – the mortality rates of refugees to Europe

15 Sep, 15 | by Klara Johansson

Refugees are often barred from conventional modes of transport, and thus reduced to using unsafe means of travel. But people who are fleeing horrific dangers are willing to take quite extreme risks. Or, as the Somali-British poet Warsan Shire puts it: “you have to understand that no one puts their children in a boat unless the water is safer than the land” (from her poem Home; you can read it in full text here or hear the author read it herself here).

We’ve seen this over the last few years, as ever-increasing numbers of desperate people attempt to reach Europe, pushed by a number of converging factors (the war in Syria, conflicts in Afghanistan and Nigeria, the repressive regime in Eritrea, overfull refugee camps, and instability in Libya, which has previously harboured many refugees). Europe is by no means the most common destination for refugees – millions are displaced within their own countries or harboured in neighbouring countries, often under very difficult conditions – but Europe is the most dangerous destination for clandestine migrants globally, according to the International Organization for Migration.

I’ve been looking for a comprehensive overview of the mortality of refugees entering Europe. There is a lot of data available online, but I couldn’t find any summary of mortality in relation to how many refugees are arriving. So I downloaded some of the available data and made some calculations and graphs for my own understanding, and am now sharing them with you. As always, please let me know if you find any factual errors or missing information (but there is complete zero tolerance for haters and demagogues!)

The graph below shows the numbers of arriving migrants side by side with the number of deaths (= dead and missing at sea), by year and split by route of arrival. (See extra information about the data at the bottom of this post.) Deaths so far in 2015 number a little over 3,000, of which about 2,800 died on the Mediterranean and about 200 died on European ground. The IOM states that 95% of deaths on the Mediterranean occur along the Central Mediterranean route (going from North Africa to Italy), which we also see here (the red fields). Though the number of migrants is highest in 2015, deaths are lower than in 2011, which is also a conclusion of the latest newsletter of the Migrant Files. This should mean that the overall mortality rate (per number of migrants) is going down. In the left graph, we also see that the safer Eastern route has increased its share in 2015 (as far as I understand, partly for geopolitical reasons). So, have the mortality rates declined per route, or has the overall rate declined because the routes have shifted?

[Figure: arriving migrants and deaths, by year and route]

I then computed mortality rates (graph below) based on the two different sources presented above. Combining different sources in this way is of course a risky business, in case they are based on different definitions or the like. The sources of error could also differ over time for the two datasets. For instance, it’s possible that more migrants passed undetected in the earlier years, when Frontex had fewer resources – but of course, for the same reasons, more deaths could also have gone undetected.
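The rate calculation itself is straightforward once deaths and arrivals are merged by route. A minimal Python sketch (the numbers below are placeholders for illustration only, not the actual Migrant Files or Frontex figures):

```python
# Deaths (e.g. from the Migrant Files) and registered arrivals
# (e.g. from Frontex) per route. Placeholder values for illustration only.
deaths = {"Central Mediterranean": 2800, "Eastern Mediterranean": 150}
arrivals = {"Central Mediterranean": 140000, "Eastern Mediterranean": 350000}

def mortality_per_1000(deaths, arrivals):
    """Dead-and-missing per 1,000 registered arrivals, by route."""
    return {route: 1000 * deaths[route] / arrivals[route]
            for route in deaths if route in arrivals}

rates = mortality_per_1000(deaths, arrivals)
for route, rate in sorted(rates.items()):
    print(f"{route}: {rate:.1f} per 1,000 arrivals")
```

The caveats above still apply: because both numerator and denominator come from different sources with different undercounts, any such rate is an approximation at best.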

Bearing in mind that there are several possible sources of error for the graph below, I still think it tells a relevant story. Mortality is indeed down hugely compared to 2011, for all routes and especially for the Central Mediterranean route. Mortality on the Western Mediterranean route (from Morocco to Spain) has kept decreasing. But from 2012 onwards, the mortality rate for the most dangerous route, the Central Mediterranean, seems to remain roughly the same, despite the large rescue operations. This graph only goes up to July 2015, and the Migrant Files state that the mortality rate during June–August has been the lowest since the start of data collection, so it’s possible that the graph will change when all of 2015 is included.

The available data is still a bit fuzzy regarding causes of death (many cases are unclear, so it’s hard to make an overview). For the deaths on the Mediterranean, drowning is of course one major cause, while others have suffocated below deck or died from dehydration or exhaustion; there were also some deaths due to fall injuries after being pushed (accidentally or intentionally), and at least two deaths during childbirth. For the deaths on land during 2015, suffocation seems to dominate (largely inside trucks during transport), followed by traffic-related causes – including people hiding under trucks or similar vehicles to cross borders, and being crushed after losing their grip – and exhaustion, dehydration and the like. In previous years, violence and suicide also played a significant role.

Data collection and research on vulnerable, hard-to-reach populations is extremely difficult. The data on deaths I used here have been painstakingly compiled from multiple sources by a group of obviously hardworking journalists, and the data on arrivals are based only on those who are registered. (See more details on the data at the bottom of this post.) Both deaths and numbers of migrants are likely to be underestimated – and the incidence rate of non-fatal injuries remains unknown, along with other information that is vital both for humanitarian efforts and for decision-making at the top political level. Maybe some organization could reach out to the refugees and crowdsource information about health, injuries and needs from those who know it best, using for instance a tool like Ushahidi? Refugees and aid workers along the routes have phones; all that would be needed is a central initiative to coordinate and validate the data. And the refugees crossing the Mediterranean could perhaps be tracked using cell phone data, as one research study did in Haiti and as is now done at the Flowminder foundation.

For added understanding of the circumstances, turn to professor Hans Rosling:

…and for added understanding of the human side, I share a video from #helpiscoming. But you should have some tissue paper close at hand if you watch it.

 

About the sources:

Numbers of deaths are available from at least two sources: the Missing Migrants Project of the International Organization for Migration, and the Migrant Files (the latter a project from a European consortium of journalists). The method of data compilation seems quite similar between the two sources (combining reports from rescuers, the rescued, and media). In many cases of boats rescued at mid-sea, only the number of missing is known, with no actual dead bodies, which means that the numbers presented here represent “dead and missing”. The IOM numbers are marginally more conservative, but the difference is small. Since the IOM only has data for 2014 and 2015, I chose to use the data from the Migrant Files. The data is available as a spreadsheet from their site; I downloaded it, cleaned up the categorizations of routes, and summed it up by year, so you won’t find these exact numbers on their site.

I took the data on arrivals from Frontex, the EU border authority. If you follow the link, data from 2015 are available in the map, and data and metadata for previous years are available per route if you click the arrows in the map. The arrivals along the Western Balkan route are a combination of people who had already arrived via the Eastern Mediterranean route and people arriving across land, so some of those who first came across the Eastern Mediterranean might be registered twice.

Data viz: adolescent injury and mental health

10 Jul, 15 | by Klara Johansson

I’m addicted to interactive visualisations of data, when they are well made, informative and easy to use. One that I’ve returned to repeatedly is the “GBD 2010 Heat Map“, which ranks causes of deaths and DALYs globally. The graph is based on the Global Burden of Diseases, Injuries, and Risk Factors Study, an impressive project that aspires to quantify mortality and morbidity globally. (Needless to say, the uncertainty intervals are wider for countries lacking comprehensive mortality registration… but it is especially for those settings that this project is invaluable!)
It’s quite a simple graph, but the beauty lies in how easy it is to shift between the measures, groups and countries/regions you are interested in. NB: the picture below is just a static image showing only the age groups I selected for this blog post; go to the live graph to explore other options than those shown here.

One thing that stands out very clearly in this graph is something we are already aware of: that injury prevention is an urgent issue among adolescents and young adults. Of the top ten causes of death worldwide in the ages 15-19 and ages 20-24, injuries rank as first (road injuries), second (self-inflicted), third (violence), fifth/ninth (drowning) and ninth/tenth (burns).

If we change* the measure shown to YLD – years lived with disability – the main cause of morbidity for those aged 15-24 is depression; other mental health problems such as anxiety, conduct disorder and substance abuse are also among the top ten (see the graphs for ages 15-19 and ages 20-24).
These two issues – injuries and mental health – are not unrelated. Of course, mental health problems are strongly related to suicide and self-harm. But a recent article in Injury Prevention by McDonald, Sommers & Fargo also highlights the complex interrelations between mental health problems and risky driving, a complexity that seems particularly prominent among adolescents and young adults compared to adults. The article is based on a sample of youth and adults who are high-risk drivers, and shows, for the younger group, several significant pathways from depression and conduct problems to various aspects of risky driving. (Similar results have been demonstrated earlier, for example by our own Bridie Scott-Parker: a short report here in Injury Prevention, and a path analysis in the British Journal of Psychology.)
Thus, to some extent, mental health promotion and injury prevention need to go hand in hand, maybe especially among adolescents.

 

* In the live graph, select the measure you want by using controls in the top panel. Here you can also select age groups, male/female/total and countries/regions. Selecting regions can be a bit tricky, which I find to be the main drawback of this graph.

Latest from Injury Prevention