Big Data’s Big Pitfalls

There are a lot of businesses getting into big data this year and even more in the planning stages for next year. Yours may be one of them.  I’ve seen firsthand the positive impact big data initiatives make on businesses.  The cost savings, revenue streams and competitive advantages are well evangelized.  The pitfalls are not.  Here are a few ways I’ve seen data initiatives go wrong.  Please add to this post by sharing your stories in the comments.

Small Data Packaged As Big Data

This is a common pitfall of any change: keeping the old way of thinking and applying it with new tools. Big data is a new product rather than an incremental improvement on small data.  Where small datasets were able to tell a business that 80% of all customers… or 40% of all employees…, big data has the ability to be much more specific and granular.  It reveals insights like “customer Ryan A. has a 52% likelihood of making a second purchase in the next 3 months and a 91% likelihood if we send him a special offer of free shipping.”  To realize the potential of big data, the business needs to raise its expectations.

As the last example reveals, big data is also prescriptive. It shows a clear course of action in many cases, whereas small data typically requires significant interpretation to determine a plan of action.  Small data packaged as big data often leads to paralysis by analysis and conflicting conclusions.  Incomplete analysis shouldn’t be tolerated.

Big data insights reach conclusions about causality while small data focuses on correlation. When the two get confused in a presentation, it leads to poor decision making.  Google used a correlational model for flu predictions.  It worked in the short term but failed publicly and catastrophically in the long term.  Fortunately, no one was taking any actions based on the model, but businesses often use correlational models to inform business strategy decisions, with erratic results.

When I see a correlation between two data points presented as proof, I use this example to show why it is a logic trap. Over the last 200 years, as the number of pirates has decreased, global temperatures have increased.


Based on these two data points, shouldn’t we be fighting global warming by increasing the number of pirates worldwide? On its face, that’s ridiculous because we have prior knowledge telling us this conclusion should be dismissed.  What if the two data points were the number of products on a web page and the average sale amount?  Those two sound plausibly linked when shown increasing on a graph together.  In reality, the graph offers no more solid proof than pirates and global temperatures.
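To make the trap concrete, here is a minimal sketch of how two unrelated series that simply trend in opposite directions produce a near-perfect correlation.  The numbers are invented for illustration and are not real measurements; the `pearson` helper is my own and not part of any tool mentioned in this post.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented, illustrative values only (one per time period).
pirates = [5000, 4200, 3100, 2000, 900, 300]        # falls over time
temperature = [13.6, 13.7, 13.9, 14.1, 14.4, 14.6]  # rises over time

r = pearson(pirates, temperature)
print(f"correlation: {r:.2f}")  # strongly negative, yet neither causes the other
```

Any two series that move steadily over the same period will score this way, which is why a strong coefficient on its own justifies a hypothesis and nothing more.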

What correlation shows is cause for a hypothesis and justification for an experiment. Experimentation is a key tactic of big data strategy.  It allows us to establish a causal relationship between multiple variables.  That’s why we say big data reveals deep insights.  It reveals why something is happening rather than telling us something is happening and leaving the rest to our interpretation.  Again, the business needs to raise its expectations to realize the potential of big data.
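As one possible shape for such an experiment, the sketch below compares a control group with a group that received an offer, using a standard two-proportion z-test.  It assumes customers were randomly assigned to the two groups; the counts are hypothetical and the function name is my own.

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical counts: group A saw no offer, group B saw a free-shipping offer.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=165, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Because assignment is random, a small p-value here supports a causal reading (the offer changed behavior) in a way that a correlation between two observed trends cannot.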

The lesson from these stories is that initiatives need to go all in. A small data initiative needs to stay that way even with access to larger datasets and big data analytical tools.  A big data initiative needs to think in terms of large datasets and big data tools.  A mixture leads to failures.  They also show that the business needs to expect more from big data.  Big data tools and datasets should lead to better quality analytics.

Making the Jump from Algorithmic To Heuristic

Algorithms are theories or equations that help us make predictions under certainty. That means we know all the variables, options, probabilities and outcomes.  It’s the low-hanging fruit of big data and so it’s what gets done first.

As the business becomes more accustomed to data enabling decisions, the questions being asked of data become more complex. That leads to a greater number of increasingly complex algorithms.  These take significant skill to create and implement as well as greater horsepower to run.  They also make visualization increasingly difficult.

As a result, job descriptions for data scientists become increasingly hard to fill because they require in-depth knowledge of complex scientific and statistical principles coupled with high-end programming skills. Costs rise as hardware needs increase and the company starts to produce customized solutions for its specific business needs.  This is the big data maturity chasm and it’s a result of the law of diminishing returns.

An algorithmic approach has significant limitations and needs to be replaced early on in the adoption of big data with a heuristic approach. Heuristics, simply put, are what allow us as people to recognize patterns.  Heuristics allow machines to recognize obscure patterns in very large sets of data.  These deeper patterns are the big insights of big data.  Without heuristics businesses tend to abandon big data without really getting what they paid for.

Complexity, Uncertainty & the Irrational

If no one gets it, no one will use it. That’s true of a lot of technology.  With big data, complexity is inherent and that scares people away.  Big data is pigeonholed as a marketing-only tool or as not ready for prime time because the complexity escapes from the data science group.  As soon as a business user sees a differential equation, their perception of the tool changes and that’s a difficult thing to undo.  It slows adoption of big data in a lot of companies.

Uncertainty has much the same effect on business users. Not knowing what big data can do and what the overall strategy for big data is within the company makes it hard to get a handle on how big data will impact them specifically.  It’s hard to ask the right questions and propose initiatives that would benefit the organization.  Goals, a big data strategy and people explaining big data in business terms are all critical pieces to removing uncertainty.

Even groups that don’t benefit from big data need to be included. They don’t need a voice at the table but they do need a clear understanding of what’s happening.  I won’t bore you with the war stories but I’ve seen some very irrational reactions to being left in the dark about the business’s big data strategy and goals.  Those reactions are well worth the few hours of education required to avoid them.

Data Governance

Many big data pitfalls revolve around data governance. Data governance covers a range of topics:

  • Data Collection
  • Data Integrity or Data Quality
  • Privacy
  • Security
  • Ethics and Compliance

Ignoring these issues creates hurdles the business will have to face later. Facebook has recently generated some backlash for its data experiments.  Target and other retailers are dealing with the costs of customer data breaches.  Google frequently deals with concerns stemming from its wide-ranging collection and use of personal data.

Even in the best-case scenario, poor data governance increases the cost of big data. Data quality issues can cause a business to stop trusting the data and all the reports, insights and analytics generated from it.  Privacy, security and ethical issues can cause customers to lose faith in the brand and business.

A business needs policies and processes to manage its big data. Collection and usage policies need to be well communicated to customers and consistent with other customer brand experiences.  Just like any other product, data needs quality testing regimes to ensure it meets the expectations of those using it.  These aren’t complicated steps in and of themselves, but the combination of all the issues surrounding data governance usually leads to something being left out.  An oversight team or program manager can prevent that pitfall.
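A quality testing regime can start very small.  The sketch below is one hypothetical example: the field names (`customer_id`, `email`, `order_total`) and the rules are assumptions made for illustration, not a standard, and a real regime would cover far more.

```python
def check_record(record):
    """Return a list of data quality problems found in one customer record."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    email = record.get("email") or ""
    if "@" not in email:
        problems.append("invalid email")
    amount = record.get("order_total")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("order_total must be a non-negative number")
    return problems

def quality_report(records):
    """Map record index to its problems, skipping records that pass every check."""
    report = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            report[index] = problems
    return report

# Hypothetical sample records.
sample = [
    {"customer_id": "C1", "email": "ryan@example.com", "order_total": 42.5},
    {"customer_id": "", "email": "not-an-email", "order_total": -3},
]
print(quality_report(sample))
```

Running checks like these on every load, and reporting the failures to whoever owns the data, is one way to keep a single bad feed from eroding trust in every report built on it.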

Awareness Is Most of the Solution

Big data is no longer a wild, wild west type of technology. It’s matured and stabilized quickly.  Trial and error are no longer necessary realities of being an early adopter.  There are great products and a lot of expertise available to help businesses realize the promise of big data in a well-managed way.

However, as with any other technology rollout, it is not problem free. Knowing what the pitfalls are allows for better planning and a smoother implementation.  That’s key for successful initiatives and companywide adoption.