Prayag Verma | Sunday, November 10, 2013 | No comments

Regardless of what you might think of the ubiquity of the "Big Data" meme, it's clear that the growing size of datasets is changing the way we approach the world around us. This is true in fields from industry to government to media to academia and virtually everywhere in between. Our increasing abilities to gather, process, visualize, and learn from large datasets is helping to push the boundaries of our knowledge.

But where scientific research is concerned, this recently accelerated shift to data-centric science has a dark side, which boils down to this: the skills required to be a successful scientific researcher are increasingly indistinguishable from the skills required to be successful in industry. While academia, with typical inertia, gradually shifts to accommodate this, the rest of the world has already begun to embrace and reward these skills to a much greater degree. The unfortunate result is that some of the most promising upcoming researchers are finding no place for themselves in the academic community, while the for-profit world of industry stands by with deep pockets and open arms.

The Unreasonable Effectiveness of Data

In 1960, the physicist Eugene Wigner published his famous essay, The Unreasonable Effectiveness of Mathematics in the Natural Sciences. It expounds on the surprising extent to which abstract mathematical concepts seem to hold validity in contexts far beyond those in which they were developed. After all, who would have guessed that Riemann's 19th century studies in non-Euclidean geometry would form the basis of Einstein's rethinking of gravitation, or that a codification of the rotation groups of abstract solids might eventually lead physicists to successfully predict the existence of the Higgs Boson?

Echoing this, in 2009 Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira penned an article under the title The Unreasonable Effectiveness of Data. In it, they describe the surprising insight that given enough data, often the choice of mathematical model stops being as important — that particularly for their task of automated language translation, "simple models and a lot of data trump more elaborate models based on less data."

If we make the leap and assume that this insight can be at least partially extended to fields beyond natural language processing, what we can expect is a situation in which domain knowledge is increasingly trumped by "mere" data-mining skills. I would argue that this prediction has already begun to pan-out: in a wide array of academic fields, the ability to effectively process data is superseding other more classical modes of research.

Email Newsletter

Like what you read here in this blog post?
Get more like it delivered to your inbox daily.

No comments:

Post a Comment