Tech enthusiasts claim Big Data could revolutionize governments and transform society. But many nations lack basic information about their populations. How can sustainable development occur when core data are so patchy?
Pick a country — one far away whose name you can't spell — and pretend you're the president. Citizens in a poor, provincial region want better healthcare. Your officials have budgeted $12 million (€9.7 million) in foreign aid to build some hospitals. They have some idea which diseases stalk the area. But not how many people live there. Your sole wish is that all your citizens live long and prosperous lives. So how many hospitals should you build — and where?
Many governments confront such questions on a daily basis. But without accurate population data, few come up with convincing answers. Almost half the world's nations do not fully register births and deaths. They know little about their citizens — how many live in the country, where they were born, how old they are. Without a comprehensive death log, they do not know who dies of what.
In these regions, basic governance becomes a question of guesswork. Where do you build roads if you don't know which villages need connecting? How many teachers do you employ at a new school if you don't know the number of children within walking distance?
At the far end of the spectrum, administrative shortcomings become a question of life and death. If next year's harvest should fail, or a drought should occur, which villages will be hit hardest? How much extra grain will avert a famine? Should you start stockpiling today?
About half the citizens of Nairobi live in informal settlements. The Kenyan capital has doubled in population since 2000.
Making it count
Weak statistical systems are holding back human progress, the Organization for Economic Cooperation and Development (OECD) warned in a report published in October last year.
The club of mostly rich nations fears that a lack of basic data is hindering progress towards the Sustainable Development Goals (SDGs), the United Nations' 17-point action plan for global prosperity, because policy makers cannot allocate resources efficiently.
Data are missing or incomplete for most of the 232 unique indicators that measure progress towards the SDGs (244 in total, because some indicators are listed under more than one goal). They range from forest cover, which is easy to measure, to food waste, which is not. Many indicators lack even an agreed methodology.
Collecting reliable statistics is not the goal itself, but rather "a means to an end," said OECD policy analyst Ida McDonnell. "If you're really serious about lifting the level of economic and social welfare of your citizens — targeting your financial resources, budgeting and attracting international trade and investment, borrowing money on the international markets or going to the International Monetary Fund or the World Bank — you need to have basic statistics."
Health metrics such as mortality rates tend to be well supported by data. Indicators for challenges such as climate change are not.
Gaps in the data
Painting a nation's portrait in numbers is easier said than done, said Edith Rogenhofer, a mapping specialist at humanitarian relief charity Médecins Sans Frontières (MSF). "In a lot of countries the census data are just not very good. It's a huge effort of human and financial resources."
Countries most in need of vital statistics are often those in which the data are hardest to collect. Only 15 percent of sub-Saharan African nations have complete birth and death registration, according to the OECD report. In southern Asia, the figure rises to about a third of countries. Many lack the infrastructure and expertise to ensure accuracy.
"You can't just go count people in five villages if it takes half a day to reach the next village," said Rogenhofer. "You need the resources, you need people to go there, and you need somebody to combine all the data. It's not easy."
In politically unstable countries, the statistician's job becomes even harder. Corruption can lead governments to manipulate data in their favor. Wars and extreme weather events, meanwhile, force people to flee their homes, shortening the shelf life of counts. "Even if you do manage a census," Rogenhofer said, "those people might not be there later."
Progress towards the 17 SDGs is measured by 244 indicator listings, 12 of which repeat indicators that apply to more than one goal.
A 2016 report by NGO Open Data Watch estimated that $3 billion of overseas development aid must be spent on statistical systems each year — five times more than current levels — in order to meet SDG data demands in developing countries.
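The Open Data Watch figures imply a simple back-of-the-envelope calculation. Reading "five times more than current levels" as five times current spending (an interpretation, since the phrase is ambiguous), the sketch below backs out the implied current level and the annual shortfall. The constant and function names are illustrative, not taken from the report.

```python
# Back-of-the-envelope gap between current and required annual aid for
# statistical systems, using only the figures quoted in the article.
REQUIRED_USD = 3_000_000_000  # annual need cited in the 2016 report
MULTIPLE = 5                  # assumed reading: required = 5 x current


def implied_current_spending(required: float, multiple: float) -> float:
    """Back out current spending when required = multiple * current."""
    return required / multiple


current = implied_current_spending(REQUIRED_USD, MULTIPLE)
shortfall = REQUIRED_USD - current

print(f"Implied current spending: ${current / 1e6:,.0f} million")  # $600 million
print(f"Annual shortfall: ${shortfall / 1e9:.1f} billion")         # $2.4 billion
```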
"The issue of the SDGs is critical," said Zachary Mwangi, director general of Kenya's national statistics bureau. "The indicators require we leave no one behind. That requires greater disaggregation of data."
It is often impossible to break national figures down to the local level, even for the SDG indicators that do have data. In such cases, country-wide averages can obscure regional inequalities.
"In Kenya, areas in and around the capital Nairobi have relatively low poverty," said Homi Kharas, economic adviser at NGO World Data Lab. "But in the North-East and parts of Western Kenya, the absolute number [of people living in] extreme poverty remains very high. Progress is too slow to end poverty by 2030."
For Kenyan statisticians, this means going from house to house, village to village, and knocking on doors to ask questions. "We use local people on the ground, we train them," said Mwangi. But tight budgets mean such an expensive operation is not an option for many governments. "We need to be more creative in finding new data sources."
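Kharas's point about national averages can be made concrete with a population-weighted rate. The sketch below is a toy illustration using two hypothetical regions with made-up figures (not Kenyan statistics); the `Region` type and `national_rate` helper are illustrative names, not part of any statistical toolkit.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    population: int
    poverty_rate: float  # share of residents in extreme poverty, 0 to 1


def national_rate(regions: list[Region]) -> float:
    """Population-weighted average of the regional poverty rates."""
    total = sum(r.population for r in regions)
    poor = sum(r.population * r.poverty_rate for r in regions)
    return poor / total


# Hypothetical, made-up figures for illustration only.
regions = [
    Region("Capital area", 6_000_000, 0.05),
    Region("Remote region", 2_000_000, 0.50),
]

print(f"National average: {national_rate(regions):.1%}")
for r in regions:
    print(f"{r.name}: {r.poverty_rate:.1%}")
```

Because the capital area has three times as many people, the national figure (about 16 percent) sits much closer to its 5 percent rate than to the remote region's 50 percent. Only the disaggregated figures reveal how unevenly poverty is spread.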
Bangladesh is home to the largest refugee camp in the world. Its half a million inhabitants are at risk of disease and flooding.
Edith Rogenhofer, the MSF worker, counts houses in Bangladeshi refugee camps — from her office in Vienna 7,000 kilometers (4,350 miles) away. She uses satellite images of shelters to estimate how many people live in the camps. The charity then uses the data to organize and direct humanitarian relief, such as water and medicine, where it is most needed.
"The important thing to keep in mind is refugee camps are not static," said Rogenhofer. "Every day people are arriving, people are leaving and people are moving within camps. The population is constantly changing. In October we counted 76,931 houses. In December there were 109,820."
Tracking flows of people from within the camps is arduous and often inaccurate. "Of course it's something you see on the ground," she said. "But there it's just too much, it's overwhelming."
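The two shelter counts Rogenhofer cites allow a quick check of how fast the camps grew, and show how a count of rooftops might be turned into a population estimate. This is a minimal sketch, assuming population is estimated as shelters multiplied by an average occupancy per shelter; the occupancy values below are hypothetical, since the article does not say what factor MSF uses, and the function names are illustrative.

```python
# Counts quoted in the article (satellite-derived shelter counts).
OCTOBER_HOUSES = 76_931
DECEMBER_HOUSES = 109_820


def percent_growth(before: int, after: int) -> float:
    """Percentage change between two counts."""
    return (after - before) / before * 100


def estimate_population(houses: int, people_per_house: float) -> int:
    """Shelters counted from imagery times an ASSUMED average occupancy."""
    return round(houses * people_per_house)


growth = percent_growth(OCTOBER_HOUSES, DECEMBER_HOUSES)
print(f"Shelters added: {DECEMBER_HOUSES - OCTOBER_HOUSES:,}")  # 32,889
print(f"Growth: {growth:.1f}%")                                  # 42.8%

# Hypothetical occupancy factors; the real value would come from field surveys.
for occupancy in (4.0, 5.0):
    print(f"{occupancy} per shelter -> {estimate_population(DECEMBER_HOUSES, occupancy):,} people")
```

The estimate scales directly with the assumed occupancy, which is one reason such figures still need checking against counts on the ground.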
In recent years, MSF has combined satellite data with fieldwork by ethnologists on the ground to map difficult regions. Satellite imagery is one of many alternative data sources that aid workers deploy to target resources in the absence of official statistics.
The potential of Big Data in particular has captivated the imaginations of the development community. A UN pilot project in Uganda, for instance, collects public concerns about development issues by analyzing radio phone-in shows. In Bangladesh, a study has combined cellphone data with agricultural information to predict regional poverty.
In Namibia, the health ministry has used similar datasets to fight malaria. In 2013, officials tracked population movements with cellphones and mapped mosquito breeding grounds with satellite images. They disrupted the infection cycle by distributing nets to the 80,000 citizens most at risk.
"What Big Data can do is provide a very timely insight," said McDonnell. "But they still have to be checked against a formal count somewhere."