Monday, October 6, 2014

Big Data and its rationale - why are we still at it?

We're in the era of Big Data projects.  This is the result of a fervor of belief that more data, essentially comprehensive and completely enumerative, will lead to deeper or even complete understanding.  It is an extension of the idea of reductionism and induction that was a major part of the Age of Science, ushered in about 400 years ago with the likes of Bacon, Newton, Galileo and other iconic figures. Examples in physics include huge collider studies and very costly space activities, and of course the Big Data drive is hugely prevalent in genomics and other biological and biomedical sciences.  But in our age, several centuries later, why are we still at it in this way?

The story isn't completely simple.  The Age of Science also led to the so-called 'scientific method', a systematic way to increase knowledge through highly focused hypothesis-testing.  Many philosophers of science have argued that, or tried to show how, this approach enables our understanding of truth to be refined in a self-disciplined way, even if ultimate truth may always elude our meagre brainpower.  But why then a return to raw induction?

One reason, and we think a predominant reason, is the pragmatic competition for research resources. As technological abilities rise (pushed by corporate interests for their own reasons), we have become able to collect ever more detailed data.  The Age of Science was itself ushered in by technology in many ways, the iconic examples being optics (telescopes and microscopes).  But sociopolitical reasons also exist.  Long, large projects lock up large amounts of funds for years or even decades, guaranteeing jobs and status for  people who thus can avoid the draining, relentless pursuit of multiple, small 3- or even 5-year grants.  As the science establishment has grown, driven by universities for good as well as greedy reasons, funding inevitably became more competitive.

Careerism and enumerative ways to judge careers by administrators (paper and grant counting) are driving this system, but the funding agencies, too, have become populated with officials whose careers involve holding and building portfolios, using public relations to tout their achievements, and so on.  And once you've got to the top of the Big Data pile, it's a high that's hard to come down from!

"It takes Big Data to make it Big--But I did it!    (Drawn by the author)
The history of a worldview
From the Manhattan Project in WWII, and several open-ended research efforts that followed, the idea became obvious: if you can state some generic problem, get funding to study it, and justify why it requires large-scale, expensive technology and a long time frame, then you have snared yourself a career's worth of secure funding and all the status and perks--and, of course, actual research--that go with it.  Those motives are only human, and easy to understand.

However, there are also some good, scientifically legitimate reasons for the growth of Big Data. When I have queried investigators in the past--and here, this means over about 20 years--about why they were advocating genome mapping approaches to diseases of concern to them, they often said, rather desperately, that they were taking a 'hypothesis-free' approach because nothing else worked!   If biology is genetics, genes must be involved, and mapping may show us what genes or systems are involved.   For example, psychiatrists said that they simply had no idea what was going on in the brain to cause traits like schizophrenia, no candidate genes or physiological processes, so they were taking a mapping approach to get at least some clues they could then follow with focused research.

But to a great extent, what started out as a plausible justification for arch-inductive approaches has become an excuse and a habit, a convenience or strategy rather than a legitimate rationale.  The reason is not that the reasoning was wrong at the time.  The reason is that the Big Data approach has, in a sense, worked successfully: it has by and large shown that it does not provide the kind of results that initially justified it.  Instead of identifying causes that couldn't have been expected, exhaustive Big Data studies, in genomics and other areas of epidemiology and biomedical science, have identified countless minor or even trivial 'causes' of traits (and the psychiatric traits are good examples), showing that these traits are not well explained by enumerative approaches, genetic or otherwise. For example, if hundreds of genes, each varying in and among populations, contribute to a trait, then every occurrence of a disease, or every person's blood pressure, etc., is due to a different combination of causes.  Big Data epidemiology has found the same for environmental and life-style factors.
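
To make the combinatorial point concrete, here is a minimal simulation sketch in Python. Everything in it is an illustrative assumption rather than data from any study: the function names, the 1000 simulated people, the carrier probability of 0.3, and the simplification that each locus independently either carries a contributing variant or does not. It simply counts how many distinct combinations of contributing variants turn up among the simulated people when 3 loci are involved versus 200.

```python
import random


def simulate_carriers(n_people, n_loci, carrier_prob, seed=0):
    """Return one tuple per person; each entry says whether that person
    carries a contributing variant at that locus (a toy assumption)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        tuple(rng.random() < carrier_prob for _ in range(n_loci))
        for _ in range(n_people)
    ]


def count_distinct_combinations(people):
    """Count how many different variant combinations appear in the sample."""
    return len(set(people))


# Hypothetical numbers chosen only for illustration.
few = simulate_carriers(n_people=1000, n_loci=3, carrier_prob=0.3)
many = simulate_carriers(n_people=1000, n_loci=200, carrier_prob=0.3)

# With 3 loci there are only 2**3 = 8 possible combinations.
print("3 loci:  ", count_distinct_combinations(few), "distinct combinations")
# With 200 loci there are 2**200 possible combinations.
print("200 loci:", count_distinct_combinations(many), "distinct combinations")
```

With 3 loci there can be at most 8 distinct combinations, so many people necessarily share the same one. With 200 loci the number of possible combinations dwarfs any population, so under these assumptions essentially every simulated person carries a unique combination, which is the sense in which each case has its own set of causes.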

What we should now do is recognize this successful set of findings from mapping studies.  Rather than flood the media with hyperbole about the supposed successes of current approaches, we should adapt our approach to what we've learned, and somehow take a time-out to reflect on what other conceptual approaches might actually work, given what we now know rather clearly.  We may even have to substantially reform the types of questions we ask.

The reason for that sort of new approach is that once we, or as we, plunge into ever more too-big-to-terminate studies, with their likely minimal payoff relative to cost, we lock up resources that clever thinkers might find better ways to use.  And unless we do something of that sort, the message to scientists will be to think ever more in Big Data terms--because they'll know that's where the money is and how to keep their hold on it.  This is exactly what's happening today.

Unfortunately, even many fundamental things in physics, the archetype of rigorous science, are being questioned by recent findings.  Life sciences are in many relevant ways a century behind even that level.  But this seems not to give us, or our funders, pause.  Changing gears seems to go against the grain of how our industrialized society works.

In times of somewhat crimped resources from traditional funders, it's no wonder that universities and investigators are frantically turning to any possible source they can find.  As we know from our own experience and that of colleagues, so much time is spent hustling and so relatively little in doing actual research, that the latter is becoming a side-light of the job.  But it doesn't really seem to be changing how people think, and the push for Big Data is an understandable part of the strategy.  The way to think about science itself is not changing under this pressure.  At least not yet.

We keep harping on this message because it involves both the nature of knowledge and the societal aspects of how we acquire it.  Even if there were no material interest, in terms of allocation of resources, in clinging to Big Data, we would still face a scientific or even epistemological problem that few seem interested in facing.  There is simply too little pressure to force people to think differently.

Perhaps, if the message is said enough times, and read by enough people, sooner or later, somewhere, someone might get it, and show the rest of us a better way to fathom the causal complexities of the world.

4 comments:

Anonymous said...

Thank you for keeping up the harping, and using a variety of different angles in doing so. I read almost everything on the blog, and try to think the content through. Gradually, I feel it sinking in, which has made me ask hard, deep questions about my research program in genomics. I attack different types of problems now, with the explicit aim to find causality, but have little enlightenment to offer others so far.

Ken Weiss said...

Reply to Anonymous:
Thanks for your comment. Don't give up! Keep knocking your head against the proverbial wall, with the issues in your battered mind. That is what it will take for someone, maybe you, to think of something more innovative.

I like to read about the tumult of the physics world in Einstein's early career, when just such issues were astir. Even in Darwin's time, discoveries in geology and biogeography were raising questions that traditional Linnaean and biblical explanations couldn't really handle.

Anonymous said...

Well, Ken, are you going to read the book and tell us if there are any Einstein-like possibilities for understanding contained therein?

Ken Weiss said...

Nope. I have, like others, to make my judgments. A scan of the various parts showed that I am unable to penetrate its wisdom. I would vote with the vast majority of scientists on this one. As to Galileo's, I have read that several times!