"The real impact of data always comes from the intersection of different data sets," Atkinson says. "You don't get into the earth-shattering discoveries until you start to link disparate data sets."
To illustrate this idea, Atkinson points to the tides. You can't understand how tides work by studying the oceans alone; you also need to understand the relationship between the oceans and the moon.
"You can't look in isolation and find causes," he says.
Moreover, when you have data of appropriate width (i.e., enough disparate data sources), the volume of data doesn't necessarily have to be large to provide you with effective results. For example, through SumAll.org, a nonprofit dedicated to using data for social good, SumAll is helping New York City and the nonprofit organization CAMBA combat homelessness in a pilot program.
Eviction notices are among the primary indicators that a family is about to become homeless -- though not all evictions lead to homelessness. About 200,000 households get evicted in New York City each year. In big data terms, that's not an exceptionally large number of records. But identifying which of those 200,000 are most at risk of becoming homeless as a result of eviction proceedings is a challenge.
Before SumAll got involved, CAMBA, which focuses its efforts in Brooklyn, would manually go through the list of roughly 5,000 new eviction cases in Kings County Housing Court each month and then send letters about its services to those in the areas it serves -- about 400 a month. With SumAll's help and some targeting techniques borrowed from data-driven marketing, CAMBA was able to narrow the list considerably.
First, all the cases were geo-coded to determine which were in neighborhoods CAMBA served. Then the analysis went "wide," pulling in data from different data sets that indicated a family was "at risk": past experience with the shelter system, past experience with the foster care system, education level, employment status and age. By correlating these disparate data sets, SumAll was able to help CAMBA identify the 30 to 50 most at-risk cases. CAMBA, in turn, was able to use its resources more efficiently to help those families.
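For readers curious what a filter-then-widen-then-rank step like this might look like in code, here is a minimal Python sketch. It is not SumAll's actual method. It assumes ZIP codes as a stand-in for geocoding, hypothetical data-set names, and a simple unweighted count of how many separate data sets flag a case; a real program would likely weight factors and handle record matching far more carefully.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvictionCase:
    case_id: str
    zip_code: str  # assumption: ZIP code stands in for a geocoded location


def rank_at_risk(cases, served_zips, risk_sources, top_n=50):
    """Return up to top_n (case_id, score) pairs, highest risk first.

    risk_sources maps a (hypothetical) data-set name, such as
    "shelter_history", to the set of case IDs that data set flags as at risk.
    A case's score is the number of separate data sets that flag it.
    """
    scored = []
    for case in cases:
        # Step 1: keep only cases inside the service area.
        if case.zip_code not in served_zips:
            continue
        # Step 2: go "wide" by checking every other data set for this case.
        score = sum(case.case_id in flagged for flagged in risk_sources.values())
        scored.append((case.case_id, score))
    # Step 3: highest score first; ties broken by case ID for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Illustrative, made-up inputs.
cases = [
    EvictionCase("A1", "11201"),
    EvictionCase("A2", "11201"),
    EvictionCase("A3", "10001"),  # outside the assumed service area
    EvictionCase("A4", "11206"),
]
served = {"11201", "11206"}
sources = {
    "shelter_history": {"A2", "A3"},
    "foster_care": {"A2"},
    "unemployed": {"A2", "A3", "A4"},
}
print(rank_at_risk(cases, served, sources, top_n=2))
```

The point of the sketch is that no single data set is large; the signal comes from counting agreement across several independent sources.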
The end result was that CAMBA was able to provide 50 percent more families in the pilot neighborhood with eviction prevention services.
"It really is the power of wide data, of seeing correlations in spots that have never been connected before," Atkinson says.
In fact, Atkinson says, focusing on big data rather than wide data can actually make it harder for you to leverage your data. The drive to collect all the data you generate can become a big inhibitor to using it.
"There's a lot of inherent problems in the endless collection of big data," he says. "People build their reservoirs so deep that they're incapable of asking questions about it. Most of our partners have tons of data, but they haven't leveraged it because it's become too big a problem."
CIOs can find themselves in such a situation because they tend to be technology-oriented, Atkinson says. They want to build a data infrastructure that will allow the business to ask any question it can conceive and get an answer. But this "boil the ocean" approach is ponderous at best, and in the meantime executives like the CMO are going around the CIO to access the tools they need.
"It's the role of the CIO to not only be a technologist, but to be an active driver of using data to improve the business," Atkinson says. "How can we make the data live for our customers?"
"CMOs are facing frustration in getting things done," he adds. "Every other tool now is a technology tool for marketers. The sales architecture of the tech space is now going after CMOs directly because the CIOs are missing it."
The answer, he says, is not to think of big data at all. Instead, think in terms of business problems. Start with the narrowest problem set you can think of and determine how you can leverage data to make things better. Then iterate.
"What is one simple question that we can be really responsive to?" he asks. "Knowing when a customer is going to churn? Build an infrastructure for that. Build a pipeline for the data and plan the data flow through your organization. Condition your organization to do that again and again."