Gustav Nilsonne: Towards an ecosystem for open data

Have you ever tried to get your hands on data from somebody else’s scientific paper? I have, for a meta-analysis. In my experience it is a discouraging task. If data still exist at all, they are too often kept under lock and key. Savage and Vickers tried to retrieve 10 datasets from papers published in PLoS journals. They received only one, even though the PLoS journal policies at the time required that data be shared on request. Vines et al tried to retrieve datasets from 516 papers in ecology that were between two and 22 years old. They received only 101 (19%).

This shows that, in some areas, our current practices for data archiving and sharing are too often no better than purging our records by fire from time to time. Since excellent technical solutions are now available for publishing open data of most kinds, there is no excuse for allowing this attrition to continue.

So far, we academic scientists have largely failed to change expectations and practices for data sharing for the better. Our failure has become so glaring that our paymasters, the politicians, are instead beginning to force change upon us. The EU Commission has recommended that all EU countries institute policies for open access to research data and publications. I suspect we will see gradual implementation over the coming years in the different member countries.

Such a development would be a step in the right direction. I am generally in favour of open access mandates for research data. But I do not think they are the whole solution. Unless scientists feel a genuine motivation to publish their data, there is a risk that publication will be incomplete, tardy, and of poor quality.

What we need is a shift in our culture to make scientists feel that someone will want to use their data, and that they will be acknowledged for providing it. In order for that to happen, we need to build a working ecosystem around open data.

We need to integrate a meta-analytical mode of working into our everyday practice. When we estimate an effect in a new dataset, the best answer to our research question will often come from pooling our data with those from earlier investigations of the same question to estimate the overall effect.
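To make the idea concrete, here is a minimal sketch of one common way to combine a new estimate with earlier ones: a fixed-effect, inverse-variance weighted average. This is only an illustration under stated assumptions. The function name `pool_fixed_effect` and all the numbers are hypothetical, and a real analysis might well call for a random-effects model instead.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Return the inverse-variance weighted (fixed-effect) summary estimate
    and its standard error for a list of study-level effect estimates."""
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    # Each study is weighted by the inverse of its variance (1 / SE^2).
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates from three earlier, openly shared datasets.
prior_estimates = [0.30, 0.10, 0.25]
prior_ses = [0.10, 0.20, 0.10]

# Hypothetical estimate from the newly collected dataset.
new_estimate, new_se = 0.40, 0.20

pooled, pooled_se = pool_fixed_effect(
    prior_estimates + [new_estimate], prior_ses + [new_se]
)
# Approximate 95% confidence interval, assuming normally distributed estimates.
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
print(f"summary effect {pooled:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The new study on its own is imprecise. Pooled with the earlier datasets, the summary estimate is considerably more precise, but only if those earlier data can actually be obtained.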

Imagine a conference on your favourite topic, for example, low back pain or the science of consciousness. A speaker takes the stage, and explains what her latest experiment was about. She then shows a table with all the prior studies on the same question. Fourteen different papers, of which three are yours. Now, she proceeds to estimate the summary effect. Would your contributions be counted in this scenario? Are the data available, or have you consigned your results to oblivion?

If this sort of thing gets going, I think researchers will quickly see the value of making their data available. Another advantage is, of course, that data tend to find unforeseen uses. Even the barest datasets have value. For example, I have found use for raw outcome measures, with no additional information, to identify the expected distribution of that measure in the general population. In this case it was interleukin-6 in plasma, and I concluded—based on four published datasets—that the variable is exponentially distributed. That is of course much safer than to draw the same conclusion from the newly collected dataset on which I happen to be working.

Sharing research data is always valuable. We need to recognise and reward open practices. More importantly, I believe much could be gained if we were to make a habit of using extant data as a foundation to stand on when we add our own next building block to the edifice of knowledge.

Gustav Nilsonne is a researcher at Stockholm University and Karolinska Institutet, Stockholm, Sweden. You can follow him on Twitter @GustavNilsonne

Competing interests: I have read and understood BMJ policy on declaration of interests and declare the following interests: none.
