How to Run a Data Science Experiment & Why It’s Critical To Big Data Success

The biggest jump in data science’s ROI comes when a business matures from correlation-based to causality-based initiatives. I worked with a global retailer last year to improve their in-store average sale by increasing the average number of items per transaction.  We started by surveying the sales associates who led their stores in those metrics: “What do you do to get the customer to buy more from you?”  As you can imagine, we got a wide variety of responses.

I knew we had a lot of noise and a little signal in the responses. If we had used correlation techniques, we would have selected the most common responses and presented something like, “86% of high-performing sales associates use suggestive selling to increase their average sale.”  Data science can do a lot more than state the obvious.  Deep insights come from causal relationships.

So we experimented with the responses. We trained associates in a variety of the techniques and measured the results on individuals’ average sale and average number of items.  We found more noise.  Regional differences, differences between salespeople, and differences between training techniques all caused variations that blurred the experimental results.  Hypotheses became increasingly granular, and experiments became more controlled and precise.
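A controlled comparison like the ones described above can be sketched with a simple two-sample permutation test on the difference in group means. Everything here is hypothetical for illustration: the sales figures, the group sizes, and the idea that one group received a specific training technique.

```python
import random

# Hypothetical average-sale figures: associates trained in a suggested
# technique vs. an untrained control group (illustrative numbers only).
treated = [58.2, 61.0, 57.4, 63.1, 59.8, 62.5, 60.3, 64.0]
control = [55.1, 54.8, 57.0, 53.9, 56.2, 55.5, 54.1, 56.8]

def perm_test(a, b, n_iter=10_000, seed=0):
    """Two-sample permutation test on the difference in means.

    Returns the observed mean difference and the fraction of random
    relabelings that produce a difference at least as extreme.
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = a + b  # new list; inputs are not mutated
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        new_a = pooled[:len(a)]
        new_b = pooled[len(a):]
        diff = sum(new_a) / len(new_a) - sum(new_b) / len(new_b)
        if abs(diff) >= abs(observed):
            count += 1
    return observed, count / n_iter

diff, p_value = perm_test(treated, control)
print(f"observed lift: {diff:.2f}, p-value: {p_value:.4f}")
```

A small p-value says the observed lift is unlikely to be noise, which is the "prove or refute" step; in practice the granular hypotheses mentioned above would also require controlling for region and individual baselines, which this sketch omits.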

That’s when we started discovering gold. Initiatives with names like “Plan to Increase Lowest Performing 15% of Sales Associates in the US Southern Region’s Average Sale By 45%” came out of our findings.  Just over 90% of these initiatives have achieved or exceeded their goals.  The retailer now has the skills in place to assess what went wrong with the other roughly 10% of initiatives and further refine their understanding through additional experiments.

There’s value in this approach, but for most of my clients, it’s the first time they’ve undertaken anything like it. With repetition, I’ve come to learn the patterns that lead to best practices in data experiments.  It’ll come as no surprise that these patterns are what hard scientists have been preaching to their students for a very long time.

Every Experiment Needs a Review Process

The experimental process needs oversight. There are too many business, ethical, privacy, bias, and domain concerns not to have multiple eyes on every experiment a company undertakes.  There are so many ways for personal bias to creep into an experiment, or for someone well-meaning to do something unethical.  This has been my biggest takeaway from data science experiments: something will go wrong if experimentation is contained in a silo.

Streamline Everything

The faster your business can go from generating a hypothesis to proving or refuting it, the faster it will act on the insight and move on to the next one. The first few experiments will take a long time, but don’t assume that’s the norm.  Speed is key in business, and data science experiments should get faster as the business gains experience running them.  Data science alone is a competitive advantage today because only a few businesses have the capability.  As data science becomes more pervasive, the advantage will shift to speed and sophistication.

Use a Three-Phase Discovery Process

The first phase is detection. This is what statistical data scientists are really good at: finding correlations between elements hidden in massive data lakes.
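A minimal sketch of this detection phase is computing a Pearson correlation between two columns pulled from a data lake. The column names and numbers below are invented for illustration; real detection work runs this kind of scan across many candidate pairs.

```python
import statistics

# Hypothetical daily retail metrics: items per transaction and
# average sale (illustrative numbers only).
items_per_txn = [1.8, 2.1, 2.4, 1.9, 2.6, 2.2, 2.8, 2.0]
avg_sale = [48.0, 52.5, 57.1, 50.2, 60.3, 54.0, 63.8, 51.0]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(items_per_txn, avg_sale)
print(f"items per transaction vs. average sale: r = {r:.2f}")
```

A high r here only flags a candidate relationship; as the article argues, it takes the experimentation phase to establish whether one metric actually drives the other.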

The second phase is experimentation. Experimental data scientists use a discovered correlation to generate a hypothesis and design an experiment that will prove or refute it.  Then they run the experiment and analyze the results.

The third phase is application. An applied data scientist can take the experimental result and visualize it in a way that’s easily understood, meaningful, and actionable.  They’re the connection between experimental results and ROI.

Typically, an individual will have the skills to cover a single phase, with more senior data scientists able to cover two.

Transparency Is Hard But Necessary

Make sure everyone knows what’s going on. Specifics are proprietary and shouldn’t be disclosed, but the fact that data is being gathered and experiments are being run needs to be disclosed to everyone involved.  If anyone has an issue with that, there needs to be a process in place to exclude them from data gathering and experimentation.

Even Proven Theories Get Overturned About 10% to 20% of the Time

It happens in science and it will happen in business. If it happens more than 20% of the time, something is wrong with the experimental process.  If no hypothesis is ever overturned or subsequently refined, that’s a problem too.

Experimentation – A Sign of Growing Data Science Maturity

Companies start their data science efforts running strictly correlation techniques.  These are the ones best supported by current software offerings and data science skills.  As these capabilities mature, the correlations move from obvious to very obscure.  However, the value of correlation is limited, and business needs typically outgrow it within a couple of years.  That’s because correlation is descriptive, and the business needs prescriptive and predictive insights.

Experimentation is the next step, and the insights follow a similar trajectory: starting out obvious and quickly becoming obscure. The value of these obscure insights isn’t as limited.  They lead to a more granular understanding of customer preferences, competitors’ actions, employee productivity, and investor sentiment, among many other areas.

These types of granular insights lead to models that allow a business to understand the most likely impact of its actions as well as the full spectrum of available choices. When a company can see beyond the obvious choices, its people become more innovative and creative.  When a company can see beyond the obvious impacts of its decisions, its people become more strategic.  The hypothesis of data science is that this shift toward creativity and strategy will yield better business outcomes.  So far, the data looks promising.

The Future of Big Data & Data Strategy

For any strategy to succeed, it needs to look toward the future, and data strategy is no different.  What’s ahead for big data will result from a few key trends:

  • Increased availability of data
  • The combination of data with other technologies
  • An increasingly data-savvy customer base
  • Data-driven competition & business strategy

Here’s what we at V2 see coming in the near, mid and long term.

What We Know Is Coming (2 to 3 years out):

Dataset Sizes & Computing Power Will Continue to Rise

One thing is for sure: datasets will continue to get bigger and move toward exascale, on the order of a billion billion (10^18) bytes.  That’s a data size so large that computing power and software are only now being architected to accommodate it.  Preparing for big data now means laying the framework for a smooth transition from big data at the current scale to big data at exascale.

The Internet of Things

The Internet of Things will make a big impact on data gathering and data sizes.  A lot of the push behind exascale is in preparation for the IoT.  Greater depth and breadth of consumer and employee data will become available; wearables and home/office automation are the leading edge.  In the next 2 to 3 years, enterprises will have access to real-time data they can only dream of right now.

Personal Use of Big Data

This is the biggest of the big data disruptions.  As the IoT gains traction over the next 5 years, it’ll lead to a rise in the personal use of big data and analytics.  People will be able to engineer their own performance, using big data to achieve their goals.  Early adopters are already doing this with:

  • Fitness & Diet
  • Personal Finance
  • Career Path
  • Education

As more personal decisions become data-driven, a lot will change about how businesses and customers interact.  The effectiveness of traditional brand and marketing techniques will diminish as customers make more rational, data-driven buying decisions.  Businesses without an analytics-driven digital strategy will find it difficult to compete.

Augmented Reality

Augmented reality will be the visual layer that enables data-driven decision making by individual consumers and employees.  It’s the game changer that brings real-time analytics into the mainstream.

Privacy & Data as a Commodity

As the media covers data breaches and discusses the ethics of how businesses use personal information, consumers are getting smarter about who they allow to use their data.  In 2 to 3 years, a data breach like Target’s will be seen by customers as a breach of contract.  As people generate more data and companies demand more of it, data will become a commodity.  People will expect something of value in return for access to their stream and will cut off access if they believe their privacy is in jeopardy.  Data compliance will become even more important than it is now.  As data access becomes business critical, anything that cuts off access will be a significant threat.  Security and transparency are critical success factors of big data program management.

Data Driven Business Models

As companies become more sophisticated in their use of data, business strategy will become increasingly data-driven.  That’s already started in the majority of the Fortune 100.  Looking 2 to 3 years down the road, that trend will lead to data-driven business models.  In her book “The End of Competitive Advantage,” Rita Gunther McGrath writes about the growing trend of businesses that compete in arenas, collections of markets with a similar high-level focus.  Data-driven business models are the rise of the arena-sized company.  Competition for the highest-margin, highest-volume business models will become a focus for these arena-sized businesses, with data guiding their market-entry decisions.

What We Are Fairly Sure Is Coming (3 to 5 years out):

Computer Learning Merges With Big Data

What happens when analytic sets become too complex for people to visualize in any meaningful way?  We turn to software for insights.  Visualization is challenging now; just think about it in 5 years.  Systems will increasingly handle real-time, complex decision making on their own, and customers will be accustomed to devices informing their everyday decisions.  The result: we’ll let software handle some ground-level decision making for us and for our enterprises.

A Higher Expectation of Service

81% of the Fortune 100 have adopted a big data and analytics solution, and 49% of all big businesses plan to adopt one in 2014.  As analytics use matures and becomes widespread, the services and insights it provides will be expected by customers.  Businesses that haven’t been building their analytics capabilities will find themselves unable to compete for customers.

Modular Businesses & Transient Competitive Advantages

As data reveals revenue opportunities, many of them will be short term or outside a business’s capabilities.  At the same time, competition for new revenue streams will become more intense.  The result will be businesses looking for ways to capitalize on these transient advantages.  Enter the modular business.  The concept revolves around self-assembling business units: much of the labor and knowledge is outsourced, with brand, leadership, and distribution provided by the business.  Data and analytics provide insights on what to produce and for how long.  As the opportunity runs its course, so does the business unit.  Internal resources are reallocated to the next transient advantage, and the outsourced modules disband with minimal wind-down costs.

The Full Potential (5 to 10 years out):

The Mind Unleashed – Hyper-Productivity

Many sci-fi writers pen books about a time when computers surpass people in intelligence and decision making.  In about 5 to 10 years, that concept will be firmly put to rest.  Most cognitive researchers understand what a pure decision-making engine the mind is.  What limits us is the number of trivial decisions we have to make each day.  Put a mind to work on the truly complex problems we could spend our days computing, enable it with real-time predictive modeling, and the potential for productivity is enormous.

Real Time, Predictive Everything

Most people think, at most, a few moves ahead.  Even the true strategic visionaries look only a bit farther downfield.  What decisions would you advise your younger self to make?  With real-time predictive modeling, that power will be available for all our decisions.  As predictive modeling becomes mainstream, the ability to see all potential options and understand the long-term results of each will enable a high level of strategic decision making.  Imagine the potential of every employee being able to think like a CEO.

Real Time, Personalized Everything

A common theme in the advancement of big data is its combination with other technologies to enhance its impact.  As 3D printing, exascale computing, and the IoT hit their stride in the next 5 years or so, the ability to personalize in real time becomes reality.  Branding and marketing move away from generic, blast communications and toward personalized messages delivered at the optimal moment to influence buying behavior.  Products will give customers the ability to personalize what they buy without waiting.  As customers become increasingly self-expressive, they will want to differentiate themselves through what and how they buy.  They will expect their brand interactions to be personalized.  Companies that fail to deliver personalization will find it difficult to compete.

Key Takeaways

Big data, analytics, software tools and strategy have reached a high level of maturity.  They’re ready for businesses of all sizes.  Their insights are bringing significant competitive advantages to early adopters.

The infrastructure and framework being put in place today must support the needs of tomorrow.  A unified data strategy helps keep business goals in focus while planning for the future.

Businesses falling behind the curve will find it difficult to compete in the near term and be unable to compete in the next 3 to 5 years.  Access to data and the skills to transform unstructured data into actionable insights are both business critical.

Customers are becoming more data savvy.  Data security and transparency in data collection policies are important parts of a business’s data strategy.  It will be imperative for businesses to keep up with customer expectations of service based on real time analytics.  As customers incorporate data and analytics into their daily lives, they will expect more personalized brand and product experiences.

Arena-sized businesses will increase the competitive pressure for new revenue streams, using advanced analytics to guide their entry into new markets.  Data-driven, modular business models will allow companies to quickly capitalize on transient competitive advantages.  That will drive revenue growth for businesses with the capabilities to turn unstructured data into predictive models of the market.