Big Data has been all hyped up for years. But only now has Germany's Ethics Council decided it's time to work out a position. So what is the issue? And why do some researchers say they can't go on without it?
As buzzwords go, Big Data has been buzzing for a while. And yet we're still trying to work out what it is. Is it data that is simply big? Is it lots of data? Or is it lots of different types of data? And just why is it so important for the future of research?
"I agree: big data is a buzzword," says Dr. Bonnie Wolff-Boenisch, head of research affairs at Science Europe, "and more data doesn't necessarily mean better health. But we're getting a better understanding of the complexity of the human biological system; and you have to understand how the system works to understand how diseases develop, and that's now only possible with this massive amount of data which shows correlation and causation."
Okay… so we are talking about lots of data?
"Well, it's not just the sheer volume and velocity, it's also what you make of the data," Wolff-Boenisch says. "We call it big science. You draw data from databases, publications, social networks, microbiomes and epigenomes… and you put it together to understand the biological system and make predictions about diseases and try to keep people healthy."
Life-logging: an ethical minefield
Ah, social networks. Isn't that where we all freely post comments about our health in the hope that some researcher will pick up on it and magically cure our disease? No.
On social networks we - perhaps erroneously - believe we're communicating with our friends. People we know. But we're not.
Image caption: Wearables, like the Apple Watch, mean we're sharing more and more of our personal health data - sometimes unwittingly.
Unless you go to the trouble of regularly maintaining your privacy settings, almost anyone can read your most intimate information - all that you freely reveal, and then some.
And it's for this reason that the German Ethics Council has decided to use its annual conference in Berlin (May 21, 2015) to finally work out where it stands on the issue.
One of the areas it's focusing on is life-logging, because while the gadgets may be cool, there's a good chance we're becoming involuntary lab rats.
"There's a whole bunch of gadgets - wearables and armbands - that people use to track their daily activities, such as tracking the number of steps they take, blood sugar levels, heart rate, quality of sleep, or how their moods change throughout the day," says Dr. Nora Schultz, a research officer at the German Ethics Council. "They use them for their personal analysis, but many of these apps and devices can also pass the data onto third parties."
So we have to be clear about who - companies, organizations or government bodies - has access to which of our data and what they are allowed to do with it.
It's a question of ethics, but also one of legislation.
For its part, the European Union has yet to finalize its position, and it's struggling as much on this issue of data protection as it is on the issue of getting global, Internet-based firms to pay their taxes.
The re-identification of your data
Then there is the issue of re-identification - the concern that no matter how anonymized our health data may be, individuals can still be traced and re-identified when data from various sources are brought together and analyzed - the very aim of big data.
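To make the mechanism concrete, here is a minimal sketch of how combining sources can undo anonymization. Everything in it is an assumption for illustration: the records are invented, and the field names (`zip`, `birth_year`, `sex`) and the `link` helper are hypothetical, not taken from any study mentioned here.

```python
# Hypothetical, made-up records; no real people or datasets are involved.
# The "anonymized" health data carries no names, only a few attributes.
health_records = [
    {"zip": "10115", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "10115", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"zip": "76131", "birth_year": 1980, "sex": "F", "diagnosis": "hypertension"},
]

# A second, separately available source that does carry names.
public_profiles = [
    {"name": "Person A", "zip": "10115", "birth_year": 1980, "sex": "F"},
    {"name": "Person B", "zip": "76131", "birth_year": 1990, "sex": "M"},
]

# Attributes shared by both sources (assumed for this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(records, profiles, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where exactly one record matches a profile."""
    matches = []
    for profile in profiles:
        candidates = [r for r in records if all(r[k] == profile[k] for k in keys)]
        # A single candidate means the nameless record is tied to one person.
        if len(candidates) == 1:
            matches.append((profile["name"], candidates[0]["diagnosis"]))
    return matches


reidentified = link(health_records, public_profiles)
print(reidentified)
```

In this toy case one profile matches exactly one health record, so a name can be attached to a diagnosis even though the health data itself never contained names.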
"You might be able to recognize people with the more data you collect," says Wolff-Boenisch. "One danger is if this data were to leak to health insurance [companies], and you could predict a predisposition for a certain disease… this is a risk, and that's why it has to be restricted. On the other hand, this is the price you have to pay for advancing research and helping prevent people from getting diseases."
Image caption: And the more we share our health data, the more we challenge our traditional relationships with health practitioners.
The extent of this risk may depend on the kind of data researchers collect.
Broadly speaking, there are two kinds: static and dynamic data.
Static data would be, for instance, data collected from a sample of 100 people who go to hospital at regular times and have their heart rates checked.
Dynamic data, on the other hand, is a constant stream of potentially ever-increasing and variable data, such as the kind we may provide via smart watches.
At present, dynamic data is harder to process than static data. The algorithms, or analytics, have to keep pace with the data they are measuring. And we're not there yet.
Keeping it anonymous
In any case, in Germany, legislation prevents researchers from using highly personalized, dynamic data.
"There are studies that have shown you can re-identify individual households with quite simple data mining technology and find out who is living [where]," says Dr. Emmanuel Müller, a senior researcher on data mining at the Karlsruhe Institute of Technology.
And he agrees: It's a personal concern.
"That's why I think the legislation should stay as it is. You shouldn't provide all your information about your household or your health activity to one general instance."
That doesn't mean he wants to entirely stop the collection of big data.
"There can be anonymous datasets where you transfer only an aggregate, not all the details. So if you have information about the general blood pressure of the entire society, that should be sufficient for you to gain some knowledge about your population. You don't need to know my heart beat on a second or millisecond basis. That wouldn't help you on a government level for health regulation."
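The idea of handing over only an aggregate can be sketched roughly as follows. This is an illustration under stated assumptions, not Müller's method: the readings are invented, the `aggregate` helper is hypothetical, and the rule of suppressing groups smaller than `MIN_GROUP_SIZE` is an added assumption rather than something he describes.

```python
from statistics import mean

# Hypothetical, made-up readings: (region, systolic blood pressure in mmHg).
readings = [
    ("10115", 120),
    ("10115", 130),
    ("10115", 125),
    ("76131", 140),
]

# Assumed threshold: groups smaller than this are not released at all,
# because a "summary" of one person is just that person's value.
MIN_GROUP_SIZE = 3


def aggregate(rows, min_group_size=MIN_GROUP_SIZE):
    """Return per-region counts and means, dropping groups that are too small."""
    groups = {}
    for region, systolic in rows:
        groups.setdefault(region, []).append(systolic)
    return {
        region: {"count": len(values), "mean_systolic": mean(values)}
        for region, values in groups.items()
        if len(values) >= min_group_size
    }


summary = aggregate(readings)
print(summary)
```

Only the per-region summary leaves the device or clinic; the individual readings, and the region with a single reading, stay behind.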
But technology firms such as Google and Apple may beg to differ. What if they diversified into health care, and we were encouraged to sync our doctor's records as well as our photos with the cloud? It wouldn't take much for Apple's smart watch heartbeat app to go from "cute" to curious.