Think Big Data

What you need to understand about big data, NYU Abu Dhabi Professor of Economics Christian Haefke explains, is that it’s actually small data – lots and lots of it.

Around the world, advances in the acquisition, organization, manipulation, and analysis of large quantities of information have created “a complete revolution” in recent decades, Haefke says. From physics to literature, research methods have been transformed. (And professors are not alone in applying new tools to old problems. Governments, sports teams, social media companies, and others are equally enthusiastic. The consequences can be dazzling — and sometimes chilling.)

“We’ve moved to ‘micro-data’ on each individual,” Haefke says. Scrubbed of personal identification and organized carefully, such data can be a gold mine when it covers enough individuals. He offers an example from his home country and his own field, labor economics: “I’ve worked with Austrian social security, covering almost everybody in the country. We know their entire work history, how much education they’ve done … who they’ve been working with.”

What is big data exactly?

Big data consists of extremely large data sets that are analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interaction.

This opens the door to, among other things, the great potential of network analysis. Haefke is developing a study on early-retirement decisions: how much your choice depends on how others in your network or company have decided the question. “Because early retirement is a pervasive feature in Western countries … it’s important to understand what’s going on.”

This issue would also be of interest in the Gulf, Haefke adds, “because here there’s very early retirement. It would be interesting to understand what drives that. There’s clearly a wealth component.”

In almost any field, big data can greatly expand understanding of phenomena. 

“My area is macroeconomics,” Haefke says. “The way I was trained, we looked at aggregates, such as gross domestic product and inflation rates. But now, rather than looking at average wages, I have data on every single worker, so I can understand much better what’s happening with the birth and death of industries, with labor market policies” and more. Questions of economic inequality, for example, “become much easier to study because we can directly observe … (and) identify trends much earlier.”

Real understanding often lurks in the details of a situation. “After all,” Haefke says, “if my hand is in the freezer and my foot is in the oven, on average I’m comfortable.”

He offers an example: US corporate recruiters currently need on average 25 days, a record high, to find a suitable new hire. To understand why, Haefke examines the composition of the jobless pool, a level of detail that big data helps provide.

In a declining industry, even the best workers may find themselves jobless and eventually become discouraged. “It’s like inviting a girl to the dance,” he says. “If you get rejected too often, at some point you give up.” But others may have been fired for cause, while still others may be temporary victims of a fluctuating business cycle.


Understanding who’s who among the jobless could lead to better-designed government and corporate policies. Those “discouraged workers can be a very attractive part of the labor pool if the industry comes back.” Immigrants are another part of the unemployment pool. “An architect, or anyone with a long training trajectory, does not care too much about competition from low-skilled immigrants.”

In social science, as in any science, data is of little use without theory. 

“The first step is to ask how can we think of (a question) theoretically. The next step is to ask if our theory is consistent with the data. In the past we had only averages, but now we can observe the whole distribution, and ask if our theories really make sense.”

When it comes to testing theories, Haefke notes, social science is a little different: “The big scepticism that social scientists often get from a science audience is that our experiments are quite different from Galileo dropping a stone. Our ‘objects of study’ make decisions of their own. A stone doesn’t mind if I drop it 100 times to figure out the gravitational constant. But villages would be pretty upset if I set income taxation at 20 per cent in this village and 80 per cent in that one.

“Rather than experimenting in reality, we try to write computer models and then experiment on the model, and … try to match it to the data in as many dimensions as possible. If I have detailed information on the heterogeneity of individuals, I can try to capture that in my model. When I use that model to experiment, I will reach more reasonable conclusions. They will still be quite wrong,” he says with a grin, “but they’ll be better than before.”