Sizing up big data

An explosion of information and the power to harness it are revolutionising fields as varied as medicine and manufacturing, says Steve Lohr.

In his young career, Jeffrey Hammerbacher has been a scout on the frontiers of the data economy. In 2005, Hammerbacher, then a freshly minted Harvard graduate, did what many math and computing whizzes did. He went to Wall Street as a “quant,” building math models for complex financial products.

Looking for a better use for his skills, Hammerbacher left for Silicon Valley less than a year later and joined Facebook. He started a team that began to mine the vast amounts of social network data Facebook was collecting for insights on how to tweak the service and target ads. He called himself and his co-workers “data scientists,” a term that has since become the hottest of job categories.

Facebook was a fabulous petri dish for data science. Yet after 2 1/2 years, Hammerbacher decided it was time to move on, beyond social networks and Internet advertising. He became a founder of Cloudera, a startup that makes software tools for data scientists.

Then, starting last summer, Hammerbacher, who is now 30, embarked on a very different professional path. He joined the Mount Sinai School of Medicine in New York as an assistant professor, exploring genetic and other medical data in search of breakthroughs in disease modelling and treatment. The goal, Hammerbacher said, is “to turn medicine into the land of the quants.”

The story is the same in one field after another, in science, politics, crime prevention, public health, sports and industries as varied as energy and advertising. All are being transformed by data-driven discovery and decision-making. The pioneering consumer Internet companies, like Google, Facebook and Amazon, were just the start, experts say. Today, data tools and techniques are used for tasks as varied as predicting neighbourhood blocks where crimes are most likely to occur and injecting intelligence into hulking industrial machines, like electrical power generators.

Big Data is the shorthand label for the phenomenon, which embraces technology, decision-making and public policy. Supplying the technology is a fast-growing market, increasing at more than 30 percent a year and likely to reach $24 billion by 2016, according to a forecast by IDC, a research firm. All the major technology companies, and a host of startups, are aggressively pursuing the business.

Demand is brisk for people with data skills. The McKinsey Global Institute, the research arm of the consulting firm, projects that the United States needs 140,000 to 190,000 more workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired, by 2020.

Yet the surveillance potential of Big Data, with every click stream, physical movement and commercial transaction monitored and analysed, would strain the imagination of George Orwell. So what will be society’s ground rules for the collection and use of data? How do we weigh the trade-offs involving privacy, commerce and security? Those issues are just beginning to be addressed.

Big Data is a vague term, used loosely, if often, these days. But put simply, the catchall phrase means three things. First, it is a bundle of technologies. Second, it is a potential revolution in measurement. And third, it is a point of view, or philosophy, about how decisions will be - and perhaps should be - made in the future.

The bundle of technologies is partly all the old and new sources of data - Web pages, browsing habits, sensor signals, social media, GPS location data from smartphones, genomic information and surveillance videos. The data surge just keeps rising, doubling in volume every two years.

Yet the importance of the sheer volume of data - and its exponential growth path - can be overstated. There’s a lot of water in the ocean, too, but you can’t drink it. Beyond advances in computer processing and storage, the other essential technology is the clever software to make sense of all that data. These are largely tools taken from the steadily evolving world of artificial intelligence, like machine learning.

The increasing volume and variety of data, combined with smart software, may well open the door to what some people call a revolution in measurement. This technology, they say, is the digital equivalent of the telescope or the microscope. Both of those made it possible to see and measure things as never before - with the telescope, it was the heavens and new galaxies; with the microscope, it was the mysteries of life down to the cellular level.

Data-driven insights, experts say, will fuel a shift in the centre of gravity in decision-making. Decisions of all kinds, they say, will increasingly be made on the basis of data and analysis rather than experience and intuition - more science and less gut feel. Data, for example, is an antidote to the human tendency to rely too much on a single piece of information or what is familiar - what psychologists call “anchoring bias.” And, again, the surveillance potential of Big Data technology, if it runs amok, is scary.

One glimpse of the potential payoff can be seen at the Mount Sinai Medical Center, in the work being pursued by the group Hammerbacher has joined.

The 100-member team at the Icahn Institute for Genomics and Multiscale Biology is headed by Eric E Schadt, a leading researcher in genomics and biomathematics.

The genomics revolution is on the cusp of realising its promise, according to Schadt, thanks to the advancing technology of genetic sequencing and analysis. The government-financed Human Genome Project, completed in 2003, cost $2.7 billion.

Today, whole human genome sequencing, identifying all 3 billion chemical units in the human genetic instruction set, can be done for $3,000. In three years, Schadt predicts, the cost will be less than $1,000, and in 5 to 10 years, less than $100, almost like a blood test today. The technology makes it possible not only to observe life at the molecular level as never before, but also to explore how the minute ingredients of biology and the environment influence each other in individual humans - and personalise treatment.

Schadt recruited Hammerbacher, an overture that coincided with Hammerbacher’s research into where best to apply his skills next. He describes his career as a matter of “following the smartest people to find the best problem.” Health care, in his view, is “the best problem by far,” where his talents could do the most good. Hammerbacher remains the chief scientist of Cloudera and splits his time between San Francisco and Manhattan.
