Big data's great promise for development will need both human and technical capacity building, reports Jan Piotrowski.
Every two days, more data is created than in the whole of human history up until 2003 - enough information to fill a stack of DVDs that would reach to the moon and back.
But a by-product of this data mountain - and testimony to our increasingly digitised lifestyles - is evidence of people's habits and preferences, which can be tapped to reveal patterns and provide new insights for development practice.
There has been much talk about using sources such as social media and public news, data from private companies, satellites and road-side sensors, and 'crowdsourced' reports to boost development.
The UN has even called for a 'data revolution' to underpin new development goals - so that sustainable development practitioners can better track advances, integrate evidence into decision-making and provide more transparency.
But drawing out the trends hidden within data takes skill - and the jury is out as to whether developing nations and development organisations have the capacity to interpret the so-called 'big data' by themselves.
"At the moment, the explosion of big data has far-outpaced our ability to make sense of it in all countries, but most of all in poorer nations that already lack human and technical capacity," says Claire Melamed, head of the growth, poverty and inequality programme at the Overseas Development Institute, United Kingdom.
"We are all running to catch up with the technology."
The question is how best to make up the ground. What are the capacity gaps that need filling, what are potential uses and limitations, and who are the key players that need support to make it happen?
Complementary, but problematic
A recent UN Economic and Social Council report looking at 107 national statistical offices showed that they see big data projects as a complement to, not a replacement for, traditional collection methods such as surveys. This echoes the views of leading big data organisations, such as Global Pulse, a UN initiative researching ways to use big data for development.
The report also showed that more than half of the world's states have plans to explore new uses for administrative data, such as tax, customs and social security records. Social media, internet searches and GPS (Global Positioning System) tracking are also high on the agenda.
The Philippines, for example, has plans to use transactional data to estimate the economic contribution of penetration rates of information and communication technologies (ICTs). And social media is already used by the Ghanaian and Mexican governments to track public perception and credibility of their administrations.
But the UN survey also found that governments have a number of problems with implementing big data projects. Concerns include legal questions about privacy and access to data, lack of human capacity, and scaling IT infrastructure to cope with the demands of large data sets.
For example, the statistical office in Kenya - one of Africa's leaders in information technology - lacks the expertise to train staff to use big data, as well as awareness of the technology needed for analysis, the UN report says.
Lack of money is another concern, particularly in developing countries: the proportion of overseas aid dedicated to statistical programmes was halved between 2011 and 2012, to 0.16 per cent, according to a 2013 report from the Partnership in Statistics for Development in the 21st Century (PARIS21).
Jon Gosier, chief executive of D8A Group, a company that offers tools for city planning based on tracking people's movements, confirms that the ballooning data landscape has left many governments out of their depth.
"The real problem is that governments are swimming in more data than they have ever had but they lack the capacity in their staff to do anything with it," he tells SciDev.Net.
Although this skills gap affects all countries, it is much harder in the developing world to find people with the statistical, programming and design capabilities necessary to make sense of the information.
Engage the private sector
Gosier tells SciDev.Net that if governments do wish to see big data used for development, their best bet could be to allow the private sector to do the groundwork. They should recognise, he says, that they will struggle to compete with companies set up to do the job.
Kenya, for example, through government grants and incubation programmes, has built a critical mass of information technology activity in the private sector, according to Gosier, and the country is a hub for developing big data systems.
Nearly half of businesses in the country that have implemented big data initiatives rely on in-house rather than imported or migrant talent, according to an IDG Connect survey of companies in Kenya and Nigeria. Yet capacity is far from sufficient, as only 23 per cent of these firms have staff trained to deal with the demands of big data, it finds.
John Quinn, a data scientist at Global Pulse, says this skills gap in Africa extends to the continent's academic circles.
Academic interest in big data is still relatively low, but since taking a lectureship at Uganda's Makerere University in 2007, he has seen the data analysis and computing skills of students and staff improve rapidly.
"For the moment there is a need to import skills but that is changing quickly," he tells SciDev.Net.
The digital divide
It is not just a lack of human capacity that is hampering big data. In Africa, as with other developing regions, there are patchy internet connections, intermittent power supplies and poor reach of high capacity cables, says Quinn. This means many people are simply not represented in digitally collected data.
The latest data from Internet World Stats, an online market research company, show that less than 16 per cent of Africans have access to the internet.
Furthermore, according to the World Bank's Enterprise Survey, Sub-Saharan African countries suffer power cuts on average every four days, each lasting around five hours. This is 25 per cent more frequent than, and almost twice as long as, the overall average of the 135 predominantly developing nations investigated.
The relatively low volumes of big data generated in African countries, compared with developed countries, mean overseas data centres can currently fulfil the continent's storage needs. But as the volume rises, Quinn thinks countries may also need to consider building their own storage sites.
Vanessa Frias-Martinez, at the University of Maryland's College of Information Studies, United States, sees the low penetration rate of the internet as a limitation to big data strategies in the developing world. It is not just a dearth of data, such as from social media, that is the problem, but the fact that it is unrepresentative.
Barring exceptions such as Indonesia, where social media use is high, the few people who use the internet regularly in developing nations are predominantly young and wealthy, she says. So any attempt to draw broad conclusions from the status of the 'internet population' will be skewed.
The dangers of drawing conclusions from unrepresentative data are illustrated by an attempt to apply Google Flu Trends - a mapping exercise in the United States based on flu-related internet searches - to Bolivia, she says.
Attempts to determine the prevalence of flu from digital activity in Bolivia failed because its population relies on doctors or traditional healers to diagnose flu, in contrast to a high number of Americans going online.
This could be an example of the "big data hubris" that David Lazer of Northeastern University, United States, and colleagues noted in a recent critique of Google Flu Trends in Science.
This is the "assumption that big data is a substitute for, rather than a supplement to, traditional data collection and analysis", they write. They found that Google Flu Trends also overestimated flu cases in the United States.
"There's a huge amount of potential there, but there's also a lot of potential to make mistakes," Lazer told the Washington Post.
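The sampling problem Frias-Martinez describes can be shown with a few lines of arithmetic. The sketch below is illustrative only: the two population groups, their sizes, their internet access rates and the rates of the trait being measured are all hypothetical numbers, not figures from any of the studies cited here. It compares the true prevalence of a trait with the figure an analyst would get by looking only at people who are online.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population_share: float  # fraction of the whole population (hypothetical)
    internet_access: float   # fraction of the group that is online (hypothetical)
    trait_rate: float        # fraction of the group with the trait (hypothetical)


# Invented values chosen only to make the skew visible.
GROUPS = [
    Group("young and wealthy", 0.10, 0.80, 0.05),
    Group("everyone else", 0.90, 0.10, 0.20),
]


def true_prevalence(groups):
    """Prevalence across the whole population."""
    return sum(g.population_share * g.trait_rate for g in groups)


def online_only_estimate(groups):
    """Prevalence seen by an analyst who can only observe internet users."""
    online = sum(g.population_share * g.internet_access for g in groups)
    online_with_trait = sum(
        g.population_share * g.internet_access * g.trait_rate for g in groups
    )
    return online_with_trait / online


actual = true_prevalence(GROUPS)
estimate = online_only_estimate(GROUPS)
print(f"whole population: {actual:.1%}")
print(f"internet users only: {estimate:.1%}")
```

With these invented inputs the online-only figure understates the true prevalence, because the small, well-connected group dominates the sample. The direction and size of the error depend entirely on the assumed numbers.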
Easier data to tap?
But the challenges of the digital divide may be circumvented by other sources of data. For instance, call data records (CDRs) - which give the location, target number, duration and spending information generated by mobile phone calls - are much more widely available in some countries, says Frias-Martinez.
Giving researchers greater access to CDRs from a wide variety of locations and companies, potentially through a digital repository, could help generate big data for development research, she says.
Population movement patterns that CDRs can illuminate have already helped track malaria outbreaks in Kenya and emergency migration following the 2010 Haitian earthquake, and attempts are under way to use them for infrastructure planning in Côte d'Ivoire.
"Until now we have been working as researchers in our own silos with our own small data sets," she says. "If we really want to move this forward there has to be infrastructure that we can access in real time so that we can all collaborate as a community."
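As a rough illustration of how CDRs can reveal population movement, the sketch below counts how often a subscriber's consecutive calls are handled in different regions. It is a minimal example under stated assumptions, not the method used in any of the studies mentioned: the record layout, field names, region labels and sample records are all hypothetical, and real CDR analysis would also need anonymisation and far larger data sets.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical layout mirroring the fields a CDR is said to carry.
    caller_id: str   # anonymised subscriber identifier
    timestamp: int   # seconds since an arbitrary reference time
    region: str      # region of the tower that handled the call
    callee_id: str   # target number, anonymised
    duration_s: int  # call length in seconds
    cost: float      # amount charged for the call


def movement_flows(records):
    """Count region-to-region moves between a subscriber's consecutive calls."""
    flows = Counter()
    ordered = sorted(records, key=attrgetter("caller_id", "timestamp"))
    for _, calls in groupby(ordered, key=attrgetter("caller_id")):
        calls = list(calls)
        for earlier, later in zip(calls, calls[1:]):
            if earlier.region != later.region:
                flows[(earlier.region, later.region)] += 1
    return flows


# Invented sample records for three subscribers.
SAMPLE = [
    CallRecord("a", 100, "coast", "x", 60, 0.10),
    CallRecord("a", 200, "capital", "y", 30, 0.05),
    CallRecord("b", 150, "coast", "z", 45, 0.08),
    CallRecord("b", 300, "capital", "x", 10, 0.02),
    CallRecord("b", 400, "capital", "y", 20, 0.04),
    CallRecord("c", 120, "capital", "z", 15, 0.03),
]

flows = movement_flows(SAMPLE)
for (origin, destination), count in flows.items():
    print(f"{origin} -> {destination}: {count}")
```

Aggregated counts like these, rather than individual records, are the kind of output a shared repository could expose while limiting what is revealed about any one subscriber.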
But Frias-Martinez, who previously conducted data mining for a telecoms multinational, doubts that companies will relax their concerns about privacy and commercial competition.
"What is preventing us [establishing a repository] is convincing telecoms companies to share their data," she says. "They see CDR data as giving them a competitive advantage, so it will be tough."
I-Sah Hsieh, global manager for international development at SAS, a company specialising in big data analytics, is more positive. He says there are examples from the insurance sector and the pharmaceutical industry - a 2013 agreement between US and European pharmaceutical trade groups has significantly freed up researchers' access to clinical trials data - where commercial information has been shared.
And Orange's experiment in Côte d'Ivoire showed how useful simple mobile-phone data can be for various development initiatives.
Even without industry collaboration, there can be "incredible insights" just by making better use of the vast stores of information already possessed by governments across unconnected databases, he says.
"The first step to high-value big data is to synchronise and link these databases and then think about combining this with the mountains of social media and web data," he tells SciDev.Net.
A lack of trained people to deal competently with big datasets is certainly an issue, he says, but technology that finds correlations across reams of data will bring "analytics to the masses". Helping laypeople to analyse data could free up specialists for more demanding tasks, he adds.
The barriers to big data becoming a useful tool for development are numerous and great. But they are not insurmountable. With the development community rallying around the UN's data revolution call, there is reason to believe that big data can fulfil its promise in the years to come.
This article is part of the Spotlight on Data for development.
Big data and modernization of statistical systems, ECOSOC (2013)
Partner Report on Support to Statistics, PARIS21 (2013)
Big Data Trends in Africa, IDG Connect (2013)
Enterprise Surveys (http://www.enterprisesurveys.org), The World Bank
Science doi: 10.1126/science.1248506 (2014)