Kate Crawford, a principal researcher at Microsoft Research and a visiting professor at the MIT Center for Civic Media, has written a provocative post on the HBR Blog titled "The Hidden Biases in Big Data." She quotes former Wired Editor-in-Chief Chris Anderson as saying, "with enough data, the numbers speak for themselves." Crawford then asks: can numbers actually speak for themselves?
Crawford’s answer is a simple no. She states:
Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.
I agree. Data, big or small, can no more speak for itself than a goldfish can. Big data just makes a long-standing problem… bigger. Data must be cleaned and ordered before it can be used, and what the numbers mean depends on how we interpret them. I also agree that what we really need is not big data but, to use Crawford's term, data with depth. This is what I was trying to get at in my post about big data needing a little help.
When I chatted with my colleague Bill Pink, Senior Partner, Creative Analytics at Millward Brown North America, he suggested that making use of big data, or any data for that matter, comes back to first principles:
What question are we trying to answer? Do we understand the people, psychology, human relationships, the category or phenomena under study? The upside of the big data is we now have previously untapped assets to help us answer these questions – mobile collection of texts, social media, set-top data on TV viewing… that's the amazing thing.
And those new data assets can be used to provide a better explanation than if we did not have those data sets to include in the story. But that assumes a framework, analytic approach and tools to evaluate and integrate the data and reach these conclusions. It's not the presence of the data that matters, it's the question to be answered and the ability of the new data to take us further than we were before.