London — Individual scientists cannot be trusted to preserve their research data reliably, claims a study, which found that around 80 per cent of scientific data is lost within two decades of their publication.
The paper, published yesterday in Current Biology, warns that this loss of data can waste research money and delay scientific research.
It also calls for urgent policy change to improve the storage of research data at the point of publication and to ensure the data are shared through public archives.
The inability to access old datasets is especially serious in richer countries, where the technology that scientists use becomes obsolete and is replaced at a quicker pace than in the developing world, according to Timothy Vines, visiting scholar at the University of British Columbia, Canada, and lead author of the study.
"People in America and Europe have used floppy discs for many years but they do not have access to these any more. In countries where resources are more limited, they may hold onto these technologies for longer and that could mean that they are more likely to be able to access their data," he tells SciDev.Net.
Similarly, the e-mail addresses of many paper authors, especially those from the developing world, were no longer functional after a few years, hampering data sharing requests.
Vines and his team collected a randomised set of 516 papers published between 1991 and 2011. They found that the chances of obtaining the underlying data sets from authors decreased by 17 per cent each year from the second year after publication, and that the odds of finding a working e-mail address for the papers' authors fell by seven per cent a year.
Although their study did not sort this loss of data by authors' country of origin, Vines says he thinks this is likely to be a worldwide issue and warns that it can cause a "significant loss of money" that would be better allocated to new research.
"Collecting that data the first time cost money. If you need those results now and they are not available, you will have to use research funds to collect them again. It is much more efficient to make sure you preserve the initial data set," he explains.
If data are no longer available, scientists cannot use them to confirm findings or answer new questions.
That loss can be especially damaging in some scientific fields, such as ecology.
"Ecological data, which was collected in a specific time and place, are irreplaceable, as you can't go back to that time and place and collect them again. Once it's gone, that data is lost forever," he adds.
Medicine is another field that may be particularly affected, Vines says, because researchers need access to original data sets to assess the efficacy of treatments and develop new ones.
He hopes scientists will make a bigger effort to better preserve data sets in the future.
"The results of research of many, many years have been lost to science and we need to stop that happening," he claims.
Scientific journals are "in a very good place" to help address the problem, he says, as they can identify the data that are associated with a particular paper and tell the authors that making data available is an unavoidable condition for publication.
Kevin Ashley, director of the Digital Curation Centre at the University of Edinburgh, United Kingdom, tells SciDev.Net: "This paper provides useful, quantified evidence to substantiate what many knew already - that data held only by its creators is inaccessible and fragile.
"That's why funders in the United Kingdom and elsewhere have mandates on data archiving and why universities are investing millions in research data management services. The reuse value of this data is too great to do otherwise."
But Ashley adds that data are "valuable whether or not its creators managed to get a publication out of it" and that all research data early in its life cycle should be preserved "in a way that lets others discover it".
Current Biology doi: 10.1016/j.cub.2013.11.014 (2013)