My A.I. Quoted Socrates: Data Science’s Make or Break Moment

We’re moving quickly into data sizes and result-set complexities that exceed our ability as people to evaluate them. That’s led to the rise of “big data” and data science.  Our solution is an increasing reliance on machine learning methodologies to parse what we can’t and reduce the complexity to something manageable.  How we program that parsing is creating a massive problem for data science.  How we handle this and other challenges will determine the future of the profession.

The Problem of Bias

What’s signal and what’s noise or, asked another way, what’s important and what’s not? People develop heuristics to make that determination.  Study decision making and it’s obvious that those heuristics are deeply biased.  As a result we can make leaps of intuition and we can also fall for the simple tricks casinos play on us.  Our biases blind us to certain information while causing us to rely more heavily on other types.  The bottom line is that what people think is important is heavily influenced by our biases.

Machine learning in data science is represented as exactly the opposite. That’s the cornerstone claim, right?  If machine learning’s capabilities are no different than our own heuristics then what’s the point?  “Big Data” is supposed to be providing a different perspective; one rooted in data, able to see beyond our own biases and limitations.

Please don’t think I’m entering into the machine vs. human debate. I’m making the point that “Big Data” is supposed to be different from what we do on our own.  Without that difference, it falls short of its potential.  Is it still useful for simplifying large datasets? Yes, but isn’t that just efficiency, letting us speed up the process?  That’s great, but it’s no major advance.  We’ve been using technology to speed up processes for a while now.  Will data science achieve its promise if it relies on the same heuristics we do? It will succeed no more often than we would on our own, which means no.

But Data Scientists Are Trained To Detect Bias, Right?

We are. We look for bias in data sources and results.  Any data source that excludes a portion of our target population is biased.  When results mirror our assumptions too closely (100% of emails containing the word ‘Viagra’ were reported as spam) they are examined for bias.  When a correlation leads to a flawed hypothesis (as the number of pirates has declined, average global temperatures have risen) we use further experimentation to refute the hypothesis and toss the correlation out as irrelevant.

The experiment is really our last line of defense against bias. Experiments have moved us beyond faulty assumptions about a flat earth and taken black holes out of Sci-Fi.  In data science, a disciplined scientific method is moving businesses past bad assumptions and proving out new business models.

Anyone who’s done a data experiment knows most are time consuming, labor intensive efforts. So to achieve the promise of “Big Data” which is the removal of our own biases to reveal genuine insights, we sacrifice the speed business craves.

Here’s the Problem

We are left with two options. Introduce our own biases and realize the same results we get on our own but faster.  Remove our biases and make new discoveries slowly.  Neither scenario is ideal.  Again, I’m not disputing that progress has been made but what I’m saying is we’ve only achieved incremental progress while we’re promising a revolution.

Automation is the solution to this that I hear most often. We’re already automating the heuristic approach, which dramatically illustrates the problems bias presents.  Automating the experimental approach leads to a completely different issue: which hypotheses do we test?  Again, we need to remove our own biases, so let’s automate the hypothesis discovery process.  That leads to many more hypotheses discovered, which leads to many more experiments and slows things down even more.  So let’s automate a process to prioritize the most important hypotheses first.

Here’s where our AI starts quoting Socrates. Is the pattern important because the programmer thinks it is or does the programmer think a pattern is important because it is important?  The first solution, the pattern is important because the programmer thinks it is, is obviously biased which we’re trying to avoid.  The second solution means the programmer cannot be trusted as the source for the heuristic to determine what patterns are important.  The machine must therefore create its own by experimenting with every pattern it finds to determine an unbiased heuristic.

What defines experimental success? Is a business model successful if it leads to short term profits at the expense of longer term success?  Is a business model that pays tomorrow at the cost of today’s success better?  Is a business model only a success if it works both in the short and the long term?  Is success defined by revenue, margin, business value or some combination?

That’s the rabbit hole. Automation, and every solution I’ve heard presented, breaks down to some level of bias that skews the results towards an unacceptably high rate of failure.

Why Am I Tilting At Windmills?

If we don’t make some progress towards answering the big questions, this becomes just another IT fad. As data scientists, we have an opportunity to take what we’ve started and build a discipline with legs.  We’re linked through our education and approach to academia and our value links us to business.  That’s a rare pipeline.  Showing the business value in decreasing the bias in reinforcement learning and unsupervised learning to improve the accuracy of prescriptive and predictive analytics is a big part of that.  I think it’s the first big question we face and a make or break moment for our profession.  We can take the hard road and work the solution or we can lower expectations.

I’m advocating for the hard road while I’m seeing a lot of colleagues working to lower expectations. I’m all for being realistic, but a lot of the initial projections for data science are realistic.  Data driven business model generation, real time marketing personalization, real time pricing, demand forecasting, decision modeling, etc. are all attainable goals.  I don’t see how backing away from what’s possible because we’ve encountered problems is part of the scientific or engineering approach.  We run towards problems, not away from them, right?

If you look at what Google’s done with data, their approach and success drive my sentiments. They’ve been faced with the choice of lowering expectations or working on complex problems throughout their time in business.  They chose to work the hard problems around data collection, analysis and presentation.  They’ve typically flown in the face of those who don’t understand why they’re tilting at windmills like self-driving cars, drones, augmented reality and many others.  The results have built one of the most successful companies of our generation.  Their competitors who have taken to lowering expectations, like Bing and Yahoo, have been significantly less successful.

In the current business climate, the problems we walk away from are the opportunities others seize. Choosing to work the problem is deciding to take our opportunity.  So here’s data science’s moment; rise to the challenges or leave them for someone else.

But that’s just my bias. Yours is the one that counts.

How to Run a Data Science Experiment & Why It’s Critical To Big Data Success

The biggest jump in data science’s ROI comes when a business matures from correlation to causality based initiatives. I worked with a global retailer last year to improve their in store average sale by increasing average number of items.  We started by surveying sales associates who led their stores in these categories.  “What do you do to get the customer to buy more from you?”  As you can imagine, we got a wide variety of responses.

I knew we had a lot of noise and a little signal in the responses. If we had used correlation techniques we would have done something like select the most common responses and present, “86% of high performing sales associates use suggestive selling to increase their average sale.”  Data science is able to do a lot more than state the obvious.  Deep insights come from causal relationships.

So we experimented with the responses. We trained a variety of techniques and measured the results on individuals’ average sale and average number of items.  We found more noise.  Regional differences, differences between sales people, and training techniques all caused variations which blurred experimental results.  Hypotheses became increasingly granular and experiments became more controlled and precise.
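To make the shape of those experiments concrete, here’s a minimal sketch of a single controlled comparison: one group of associates trained on a technique, one control group, and a two-sample test on average sale. The group sizes, dollar figures and significance threshold are invented for illustration, not the retailer’s data.

```python
# Minimal sketch of one controlled comparison: did a selling technique
# move average sale for trained associates vs. an untrained control group?
# All numbers are illustrative, not the client's actual data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(loc=54.0, scale=12.0, size=200)   # avg sale ($), no training
treated = rng.normal(loc=57.5, scale=12.0, size=200)   # avg sale ($), after training

t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)
lift = treated.mean() - control.mean()

print(f"observed lift: ${lift:.2f} per transaction, p = {p_value:.4f}")
if p_value < 0.05:
    print("The technique appears to change average sale.")
else:
    print("Fails to refute the null: treat the technique as unproven.")
```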

That’s when we started discovering gold. Initiatives with names like “Plan to Increase the Lowest-Performing 15% of Sales Associates in the US Southern Region’s Average Sale by 45%” came out of our findings.  Just over 90% of these initiatives have achieved or exceeded their goals.  The retailer now has the skills in place to assess what went wrong with the other roughly 10% of initiatives and further refine their understanding through additional experiments.

There’s value in this approach but for most of my clients, it’s the first time they’ve undertaken anything like this. With repetition, I’ve come to learn the patterns that lead to the best practices in data experiments.  It’ll come as no surprise that these patterns are what hard scientists have been preaching to their students for a very long time now.

Every Experiment Needs a Review Process

The experimental process needs oversight. There are too many business, ethical, privacy, bias, and domain concerns to not have multiple eyes on any experiment that a company undertakes.  There are so many ways for personal bias to creep into an experiment or for someone who’s well-meaning to do something unethical.  This has been my biggest takeaway from data science experiments.  Something will go wrong if experimentation is contained in a silo.

Streamline Everything

The faster your business can go from hypothesis generation to proving or refuting it, the faster your business will act on the insights and move on to the next one. The first few experiments will take a long time but don’t feel like that’s the norm.  Speed is key in business and data science experiments should get faster as the business gets more experience running them.  Data science alone is a competitive advantage today because only a few businesses have those capabilities.  As data science becomes more pervasive, the advantage will shift to speed and sophistication.

Use a 3 Phased Discovery Process

The first phase is detection. This is what statistical data scientists are really good at.  They find correlation between multiple elements hidden in massive data lakes.
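As a rough sketch of what detection can look like in code, the snippet below scans a feature table for strong pairwise correlations and returns them as candidate hypotheses. The DataFrame, its columns and the threshold are all assumptions for illustration.

```python
# Rough sketch of the detection phase: scan a feature table for strong
# pairwise correlations worth handing to the experimentation phase.
# The DataFrame contents and threshold are hypothetical.
import pandas as pd

def strong_correlations(df: pd.DataFrame, threshold: float = 0.6):
    corr = df.corr(numeric_only=True)
    pairs = []
    cols = corr.columns
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs.append((a, b, round(float(r), 3)))
    # Each pair is a candidate hypothesis, not a conclusion.
    return sorted(pairs, key=lambda p: -abs(p[2]))
```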

The second phase is experimentation. Experimental data scientists use the discovery of correlation to generate a hypothesis and design an experiment that will prove or refute that hypothesis.  Then they run the experiment and analyze the results.

The third phase is application. An applied data scientist can take the experimental result and visualize it in a way that’s easily understood, meaningful and actionable.  They’re the connection between experimental results and ROI.

Typically an individual will have the skills to do a single phase with more senior data scientists able to do two phases.

Transparency Is Hard But Necessary

Make sure everyone knows what’s going on. Specifics are proprietary so those shouldn’t be disclosed.  The fact that data is being gathered and experiments run needs to be disclosed to all involved.  If anyone has an issue with that, there needs to be a process in place to omit them from the data gathering and experimental process.

Even Proven Theories Get Overturned About 10% to 20% of the Time

It happens in science and it will happen in business. It should be no higher than 20% of the time or something is wrong with the experimental process.  If no thesis is overturned or subsequently refined, that’s a problem too.

Experimentation – A Sign of Growing Data Science Maturity

Companies start with data science running strictly correlation techniques.  These are the ones best supported by current software offerings and data science skills.  As these capabilities mature the correlations move from obvious to very obscure.  However, the value of correlation is limited and the business needs typically outgrow correlation within a couple of years.  That’s because correlation is descriptive and the business needs prescriptive and predictive.

Experimentation is the next step and the insights follow a similar trajectory; starting out by yielding obvious insights and quickly migrating to obscure insights. The value of these obscure insights isn’t as limited.  It leads to a more granular understanding of customer preferences, competitors’ actions, employee productivity, and investor sentiment among many others.

These types of granular insights lead to models that allow a business to understand the most likely impact of their actions as well as the full spectrum of available choices. When a company is able to see beyond the obvious choices, their people become more innovative and creative.  When a company is able to see beyond the obvious impacts of their decisions, their people become more strategic.  The hypothesis of data science is that this shift towards creativity and strategy will yield better business outcomes.  So far, the data supporting that hypothesis looks promising.

Big Data’s Big Pitfalls

There are a lot of businesses getting into big data this year and even more in the planning stages for next year. Yours may be one of them.  I’ve seen firsthand the positive impact big data initiatives make on businesses.  The cost savings, revenue streams and competitive advantages are well evangelized.  The pitfalls are not.  Here are a few ways I’ve seen data initiatives go wrong.  Please add to this post by sharing your stories in the comments.

Small Data Packaged As Big Data

This is a common pitfall of any change: taking the old way of thinking and applying new tools. Big data is a new product rather than an incremental improvement on small data.  Where small datasets were able to tell a business that 80% of all customers… or 40% of all employees…, big data has the ability to be much more specific and granular.  It reveals insights like: customer Ryan A. has a 52% likelihood of making a second purchase in the next 3 months and a 91% likelihood if we send him a special offer of free shipping.  To realize the potential of big data, the business needs to raise its expectations.
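As a hedged illustration of how a figure like that could be produced, the sketch below fits a simple repurchase model on historical customers and scores one customer with and without a free-shipping offer. Every feature, number and the tiny training set are invented; a real model would need far more data and care.

```python
# Hypothetical sketch of estimating a per-customer repurchase likelihood,
# with and without a free-shipping offer. Features and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: recency_days, prior_orders, received_offer (0/1)
X = np.array([[12, 3, 1], [90, 1, 0], [30, 2, 1], [60, 1, 1],
              [15, 4, 0], [120, 1, 0], [45, 2, 0], [20, 3, 1]])
y = np.array([1, 0, 1, 0, 1, 0, 0, 1])   # repurchased within 3 months?

model = LogisticRegression().fit(X, y)

customer = np.array([[40, 1, 0]])             # one customer, no offer sent
customer_with_offer = np.array([[40, 1, 1]])  # same customer, offer sent
p_base = model.predict_proba(customer)[0, 1]
p_offer = model.predict_proba(customer_with_offer)[0, 1]
print(f"repurchase likelihood: {p_base:.0%} without offer, {p_offer:.0%} with offer")
```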

As the Ryan A. example reveals, big data is also prescriptive. It shows a clear course of action in many cases, whereas small data typically requires significant interpretation to determine a plan of action.  Small data packaged as big data often leads to paralysis by analysis and conflicting conclusions.  Incomplete analysis shouldn’t be tolerated.

Big data insights reach conclusions about causality while small data focuses on correlation. When the two get confused in a presentation it leads to poor decision making.  Google used a correlation-based model for flu predictions.  It worked in the short term but failed publicly and catastrophically in the long term.  Fortunately no one was taking any actions based on the model, but businesses often use correlation-based models to inform business strategy decisions, with erratic results.

When I see a correlation between two data points, I use this example to show why such correlations are logic traps: over the last 200 years, as the number of pirates has decreased, global temperatures have increased.

[Figure: Number of pirates vs. global average temperature over the last 200 years]

Based on these two data points, shouldn’t we be spending more time fighting global warming by increasing the number of pirates worldwide? On its face, that’s ridiculous because we have prior knowledge telling us this conclusion should be dismissed.  What if the two data points were the number of products on a web page and the average sale amount?  Those two sound plausibly linked when shown increasing on a graph together.  In reality the graph presents no more solid proof than pirates and global temperatures.
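The trap is easy to reproduce. Any two series that happen to trend over the same period will correlate strongly whether or not one drives the other; the made-up numbers below mimic the pirates example.

```python
# Two made-up series that merely trend over time: pirate counts falling,
# temperature anomaly rising. They correlate strongly with no causal link.
import numpy as np
from scipy import stats

years = np.arange(1820, 2020, 20)
pirates = np.array([45000, 20000, 15000, 5000, 400, 5000, 3000, 1200, 800, 400])
temp_anomaly = np.array([-0.3, -0.25, -0.3, -0.2, -0.15, -0.1, 0.0, 0.15, 0.4, 0.7])

r, p = stats.pearsonr(pirates, temp_anomaly)
print(f"Pearson r = {r:.2f} (p = {p:.4f})")
# A strong negative r here justifies a hypothesis and an experiment,
# not the conclusion that hiring pirates would cool the planet.
```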

What correlation shows is cause for a hypothesis and justification for an experiment. Experimentation is a key tactic of big data strategy.  It allows us to establish a causal relationship between multiple variables.  That’s why we say big data reveals deep insights.  It reveals why something is happening rather than telling us something is happening and leaving the rest to our interpretation.  Again, the business needs to raise their expectations to realize the potential of big data.

The lesson from these stories is that initiatives need to go all in. A small data initiative needs to stay that way even with access to larger datasets and big data analytical tools.  A big data initiative needs to think in terms of large datasets and big data tools.  A mixture leads to failures.  They also show that the business needs to expect more from big data.  Big data tools and datasets should lead to better quality analytics.

Making the Jump from Algorithmic To Heuristic

Algorithms are theories / equations that help us make predictions under certainty. That means we know all the variables, options, probabilities and outcomes.  It’s the low hanging fruit of big data and so it’s what gets done first.
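To picture what “predictions under certainty” means in practice: when every option, outcome and probability is known in advance, the algorithm reduces to computing expected values and picking the best one. The options and payoffs below are invented purely for illustration.

```python
# Toy illustration of an algorithmic decision under certainty: options,
# outcomes and probabilities are all known, so the "model" is just
# expected value plus argmax. Numbers are invented.
options = {
    # option: list of (probability, payoff $) pairs, all known in advance
    "standard_shipping": [(0.90, 2.10), (0.10, -4.00)],  # small chance of a refund
    "express_shipping":  [(0.97, 1.60), (0.03, -4.00)],
}

def expected_value(outcomes):
    return sum(p * payoff for p, payoff in outcomes)

best = max(options, key=lambda name: expected_value(options[name]))
for name, outcomes in options.items():
    print(f"{name}: EV = ${expected_value(outcomes):.2f}")
print(f"choose: {best}")
```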

As the business becomes more accustomed to data enabling decisions, the questions being asked of data become more complex. That leads to a greater number of increasingly complex algorithms.  These take significant skill to create and implement as well as greater horsepower to run.  They also make visualization increasingly difficult.

As a result, job descriptions for data scientists become increasingly hard to fill because they require in-depth knowledge of complex scientific and statistical principles coupled with high end programming skills. Costs rise as hardware needs increase and the company starts to produce customized solutions to their specific business needs.  This is the big data maturity chasm and it’s a result of the law of diminishing returns.

An algorithmic approach has significant limitations and needs to be replaced early on in the adoption of big data with a heuristic approach. Heuristics, simply put, are what allow us as people to recognize patterns.  Heuristics allow machines to recognize obscure patterns in very large sets of data.  These deeper patterns are the big insights of big data.  Without heuristics businesses tend to abandon big data without really getting what they paid for.
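Clustering is one simple, hedged example of the heuristic approach: rather than evaluating a predefined equation, the machine groups records by similarity and surfaces segments nobody specified up front. The features and cluster count below are illustrative assumptions.

```python
# One simple example of a heuristic approach: let the machine group
# customers by similarity and surface segments nobody predefined.
# Features and cluster count are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# columns: visits_per_month, avg_basket_$, days_since_last_purchase
customers = rng.normal(loc=[4, 60, 20], scale=[2, 25, 15], size=(1000, 3))

X = StandardScaler().fit_transform(customers)
segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Each segment is a pattern to examine, hypothesize about, and test,
# rather than an answer in itself.
for seg in range(5):
    print(seg, customers[segments == seg].mean(axis=0).round(1))
```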

Complexity, Uncertainty & the Irrational

If no one gets it, no one will use it. That’s true of a lot of technology.  With big data, complexity is inherent and that scares people away.  Big data gets pigeonholed as a marketing-only tool, or as not ready for prime time, when the complexity escapes from the data science group.  As soon as a business user sees a differential equation their perception of the tool changes and that’s a difficult thing to undo.  It slows adoption of big data in a lot of companies.

Uncertainty has much the same effect on business users. Not knowing what big data can do and what the overall strategy for big data is within the company makes it hard to get a handle on how big data will impact them specifically.  It’s hard to ask the right questions and propose initiatives that would benefit the organization.  Goals, a big data strategy and people explaining big data in business terms are all critical pieces to removing uncertainty.

Even groups that don’t benefit from big data need to be included. They don’t need a voice at the table but they do need a clear understanding of what’s happening.  I won’t bore you with the war stories but I’ve seen some very irrational reactions to being left in the dark about the business’s big data strategy and goals.  Those reactions are well worth the few hours of education required to avoid them.

Data Governance

Many big data pitfalls revolve around data governance. Data governance covers a range of topics:

  • Data Collection
  • Data Integrity or Data Quality
  • Privacy
  • Security
  • Ethics and Compliance

Ignoring these issues creates hurdles the business will have to face later. Facebook has recently generated some backlash for their data experiments.  Target and other retailers are dealing with the costs of customer data breaches.  Google frequently deals with concerns stemming from their wide ranging collection and use of personal data.

In the best case scenario, poor data governance still increases the cost of big data. In the case of data quality issues it can cause a business to stop trusting the data and all the reports, insights and analytics generated from that data.  Privacy, security and ethical issues can cause customers to lose faith in the brand and business.

A business needs policies and processes to manage its big data. Collection and usage policies need to be well communicated to customers and consistent with other customer brand experiences.  Just like any other product, data needs quality testing regimes to ensure it meets the expectations of those using it.  These aren’t complicated steps in and of themselves, but the combination of all the issues surrounding data governance usually leads to something being left out.  An oversight team or program manager can prevent that pitfall.

Awareness Is Most of the Solution

Big data is no longer a wild, wild west type of technology. It’s matured and stabilized quickly.  Trial and error are no longer necessary realities of being an early adopter.  There are great products and a lot of expertise available to help businesses realize the promise of big data in a well-managed way.

However, as with any other technology rollout, it is not problem free. Knowing what the pitfalls are allows for better planning and a smoother implementation.  That’s key for successful initiatives and companywide adoption.

Big Data & Talent Management, A Match That Drives Margins

It’s well known that big data can provide strategic insights about customers, but what if your customers are the business’s talent? What kind of insights can HR, a group that already benefits from access to high quality data, derive from big data?  For big data to add value it needs to be focused on new, forward looking and increasingly granular insights.  Just as marketing is using big data for real time, predictive and personalized engagement, HR can too.

I like to start data driven journeys with questions. When a business looks at its people, its human talent, the obvious questions come to mind first:

  • What engages them and what disengages them?
  • How does the business attract and retain the talent it needs?
  • What’s the right compensation?
  • Where do we train and what do we train on to move the organization forward?

What’s the business interest in engagement, training, hiring, retention and compensation? The interest revolves around value creation.  HR has studies that show engaged, well trained employees produce more value than the opposite.  Looking at the studies on employee value creation, that interest in value is a two way street.  Employees are more satisfied where they feel like they’re maximizing their contribution to the business and receiving a fair compensation in return.  It turns out that HR is in the business of understanding value; how employees create it and how they consume it.  The goal is to maximize the value created while keeping the value consumed in line with the profit margins the business expects.

Now we have an algorithm and as data scientists, we love those. We can run algorithms against datasets and they return insights.  To get the dataset we need, we have to ask regression questions: how do employees create value and how does the business return value to employees?  Value creation data coupled with talent management data yields the kinds of insights that drive margins higher.
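Here’s a hedged sketch of that regression framing: value created per employee modeled against talent-management variables. The columns, the synthetic data and the relationship baked into it are assumptions so the example runs end to end; this is the shape of the analysis, not a client’s model.

```python
# Hedged sketch: regress value created per employee on talent-management
# variables. All columns and the synthetic relationship are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "tenure_years":     rng.uniform(0, 15, n),
    "training_hours":   rng.uniform(0, 80, n),
    "engagement_score": rng.uniform(1, 5, n),
})
# Invented relationship purely so the example runs end to end.
df["value_created"] = (40_000 + 1_500 * df["tenure_years"]
                       + 300 * df["training_hours"]
                       + 8_000 * df["engagement_score"]
                       + rng.normal(0, 10_000, n))

X = sm.add_constant(df[["tenure_years", "training_hours", "engagement_score"]])
model = sm.OLS(df["value_created"], X).fit()
print(model.params.round(1))   # which levers move value, and by how much?
# Coefficients suggest hypotheses (e.g., training hours -> value) that still
# need controlled experiments before they drive talent decisions.
```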

Productivity – Employee Value Creation

How much value does any given department produce let alone an individual employee? Answering this question with increasing levels of granularity is possible with big data.  The process of answering this question puts HR at the center of gathering some of the most strategically valuable insights the business will get from big data.

It starts by looking at the business model and understanding what value the business creates. For many retailers, having this information five to ten years ago would have helped them avoid the downturn they now find themselves in.  Looking at Best Buy as an example, they provide customers with a physical location to see and try out electronics.  While they generate their revenues from selling electronics, their value to customers is as a showroom.  Employees are trained to sell to a customer, which adds value to the business but doesn’t add value to the customer.  From an HR perspective, the hiring and training strategies are emphasizing the wrong skills and setting the business up for failure.

Businesses have answered the value creation question when the customers’ answers line up with the company’s answers. For HR and many other groups in the business, understanding how the customer perceives the value created by the business is an essential piece of information.  It’s a dataset that marketing already has access to in many businesses and it becomes even more useful when it’s linked to talent management data.  It helps HR answer a very important question; do we have the right talent to support the business model?  That analysis also needs to be forward looking.  The business has a three to five year strategy and HR needs to know how value creation will change over that time to keep their strategies in lock step.

Even in the most basic question about productivity, “what do we produce?” there’s significant value for HR. Big data provides valuable insights by predicting the cost of hiring necessary skills over the next five years and modeling training initiatives to help current employees maximize their value in a changing business.  As the questions get more granular about value creation HR has an opportunity to provide insights that improve margins.

The examination of value creation next looks at how that value is built by the business’s talent. Value stream mapping and several other tools have been used to get a high level understanding of this process in groups with direct contact with the value creation process.  What about groups that don’t build products for sale; how do they create value?  In many departments that don’t touch the product, measuring value creation has been more art than science.  From an HR perspective, advising departments on head count, skill sets, training and organizational structure only works with a solid understanding of how they create value.

That analysis starts with a familiar process, connecting customer data with talent management data. The customers aren’t always obvious.  I worked with a global hospitality company looking to better understand internal value creation to help them increase margins.  They were struggling to create a picture of how their finance group was creating value for the business.  Long data presentation short…their customers were investors, other internal teams, and government agencies.  We were able to show the finance organization was running at a 20% margin in a supportable way.  With the concrete understanding of how they created value they were able to restructure their activities to grow that to a 27% margin.

HR contributed to that effort in a big way. They provided data on existing skillsets, training options versus hiring costs and helped with the restructuring.  Other organizations quickly realized the value of this model.  HR became a strategic partner in focusing the business on hiring, training and restructuring strategies that drove significant improvements in operating efficiency.  It was all enabled by connecting a better understanding of how the business creates value with talent management data.

These small data wins are drivers for HR to be involved with big data initiatives. Large datasets and the resulting insights allow HR to build increasingly granular pictures of how training, hiring and retention can contribute to higher margins and prepare the business to execute on the longer term strategy initiatives.  Tightly connecting talent management data with value creation data is the key.

Compensation – Employee Value Consumption

Most businesses have adopted some form of market based compensation. How does that correlate to employee value creation?  Short answer, based on a lot of employee data, is it doesn’t.  The variance in value creation between two nearly identical (skills, education and experience) employees can be massive.  Instinctively we know that.  However, without big data, we can only talk about the variance in general, unsupportable terms.

Why do employees with similar skills create different levels of value for the business? Long data presentation made short (if you want the long version of any of these, email me)…at the highest level it comes down to compensation and employee satisfaction.  As studies show, higher levels of employee satisfaction are closely tied to higher levels of productivity.  What we’re beginning to learn is how closely tied compensation and employee satisfaction are when the definition of compensation is expanded to go beyond salary, health care and retirement.

With big data, compensation can be viewed in a new way. Items that used to be intangibles can now be adequately quantified and related to individual employee compensation.  A lot of these intangibles are well known to businesses: healthy food options, a gym, a game room, outdoor break spaces, and flexible work hours.  Some of them are less well understood and I’ll get to that in a minute.  The important connection big data reveals goes from productivity, across satisfaction, to compensation.

Why is this important? It allows a business to create compensation strategies with measurable, supportable ROI.  I worked with a mid-sized software development company looking for ways to improve productivity without increasing headcount.  After running a new type of employee survey we discovered that many of the developers were interested in a healthy lifestyle focused on diet and exercise while also believing that their work interfered with those pursuits.  The business spent $120 thousand that year on a gym and healthier food choices in vending machines and the cafeteria.  We tracked usage statistics for the cafeteria and gym compared with productivity levels over the next six months.  Long presentation short again…productivity gains saved the company fifteen full time equivalents or about $2 million over the observation period.
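The arithmetic behind a “measurable, supportable ROI” claim like that is simple to restate; the figures below are just the ones from the story, with the per-FTE cost they imply.

```python
# Back-of-envelope ROI for the wellness spend described above, using the
# figures from the story and the loaded FTE cost they imply.
program_cost = 120_000            # gym + healthier food, one year
productivity_savings = 2_000_000  # ~15 FTEs over the six-month observation

roi = (productivity_savings - program_cost) / program_cost
implied_fte_cost = productivity_savings / 15  # per FTE over six months
print(f"ROI: {roi:.1f}x; implied cost per FTE over the period: ${implied_fte_cost:,.0f}")
```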

Let’s talk about the areas of employee compensation that contribute to satisfaction that aren’t well understood without big data.  Leadership, performance feedback and advancement are all areas known to contribute to productivity and employee satisfaction.  Getting any more specific than that becomes difficult because it requires a high level of granularity.  People are different, as HR well knows, and those differences lead to a variety of preferences when it comes to leadership style, getting and giving feedback as well as career path.  Without data enabling that level of personalization, maximizing employee productivity is a difficult goal.

This puts HR into a familiar role advising managers on how to get the most out of the people they lead. However, the conversation is a lot more useful when it uses individual specifics.  Leadership strategies can be personalized to the individual.  Teams can be built where the strengths of individual leaders are matched with the preferences of team members.  This methodology also extends to the hiring process.  Fitting candidates by these preferences to teams with similar preferences and leadership strengths simplifies the selection process while improving the candidate pool.  Data shows teams aligned this way are as much as ten times more productive than teams formed with current best practices.

When it comes to compensation, the bottom line is big data enables HR to create a value driven compensation strategy with measurable, supportable ROI. It allows HR to see past dollars as the metric of compensation and start using satisfaction as a measure of compensation.  It also allows compensation to be viewed holistically including elements that without big data become difficult to quantify.  Once HR has this information specific talent management strategies can be built to improve productivity.  That’s a big strategic win for HR and for the business as a whole.

Looking At People as Individuals

The future of engagement is personalization. Marketing understands that customers are expecting increasing levels of personalization.  Employees, especially top talent, are also coming to expect employers to be able to interact with them on a personal level.  They are looking at compensation packages as more than money.  Over half of all men and over 70% of women would turn down a higher-salaried new position if they believed they wouldn’t get along with their new co-workers.  They expect companies to put them to work where their contributions are most valuable to the business.

For all of these reasons, HR needs access to the insights derived from advanced analytics and large datasets. HR also has a high quality dataset that can create significant value when combined with data from customers and other departments.  That combination drives margins and productivity.  Companies that don’t develop a data enabled HR team will find it difficult to compete in the next three to five years.

The Future of Big Data

The Future of Big Data & Data Strategy

For any strategy to succeed it needs to look towards the future and data strategy is no different.  What’s ahead for big data will result from a few key trends:

  • Increased availability of data
  • The combination of data with other technologies
  • An increasingly data-savvy customer base
  • Data driven competition & business strategy

Here’s what we at V2 see coming in the near, mid and long term.

What We Know Is Coming (2 to 3 years out):

Dataset Sizes & Computing Power Will Continue to Rise

One thing for sure is datasets will continue to get bigger and move towards exascale.  In plain English, that’s data at sizes almost incomprehensible today, which computing power and software are now being architected to accommodate.  Preparing for big data now means laying the framework for a smooth transition from big data at the current scale to big data at exascale.

The Internet of Things

The Internet of Things will make a big impact on data gathering and data sizes.  A lot of the push behind exascale is in preparation for the IoT.  Greater depth and breadth of consumer and employee data will be available.  Wearables and home/office automation are the leading edge of the IoT.  In the next 2 to 3 years enterprises will have access to real time data they can only dream of right now.

Personal Use of Big Data

This is the biggest of the big data disruptions.  As the IoT gains traction in the next 5 years, it’ll lead to a rise in the personal use of big data and analytics.  People will be able to engineer their own performance using big data to achieve their goals.  The early adopters are already doing this with:

  • Fitness & Diet
  • Personal Finance
  • Career Path
  • Education

As more personal decisions become data driven, it’ll change a lot about how businesses and customers interact.  The effectiveness of traditional brand and marketing techniques will diminish as customers begin to make more rational, data driven buying decisions.  Businesses without an analytics driven digital strategy will find it difficult to compete.

Augmented Reality

Augmented Reality will be the visual layer that enables individual consumer and employee data driven decision making.  It’s the game changer that brings real time analytics into the mainstream.

Privacy & Data as a Commodity

As the media covers data breaches and discusses the ethics of how businesses use personal information, consumers are getting smarter about who they allow to use their data.  In 2 to 3 years a data breach like Target’s will be looked at by customers as a breach of contract.  As people generate more data and companies demand more it will become a commodity.  People will expect something of value in return for access to their stream and will terminate access if they believe their privacy is in jeopardy.  Data compliance will become even more important than it is now.  As data access becomes business critical, anything that cuts off access will be a significant threat.  Security and transparency are critical success factors of big data program management.

Data Driven Business Models

As companies become more sophisticated about their use of data, business strategy will be increasingly data driven.  That’s already started in the majority of the Fortune 100.  Looking 2 to 3 years down the road that trend will lead to data driven business models.  In Rita Gunther McGrath’s book “The End of Competitive Advantage” she writes about the growing trend of businesses that compete in arenas or collections of markets with a similar high level focus.  Data driven business models are the rise of the arena sized company.  Competition for the highest margin and volume business models will become a focus for these arena sized businesses with data guiding their market entry decisions.

What We Are Fairly Sure Is Coming (3 to 5 years out):

Computer Learning Merges With Big Data

What happens when analytic sets become too complex for people to visualize in any meaningful way?  We turn to software for insights.  Visualization is challenging now.  Just think about it in 5 years.  Systems will increasingly become self-reliant for real time, complex decision making.  Customers will be accustomed to devices informing their everyday decisions.  The result is allowing software to handle some ground level decision making for us and for our enterprises.

A Higher Expectation of Service

81% of the Fortune 100 have adopted a big data and analytics solution.  49% of all big businesses plan on adopting a solution in 2014.  As analytics use matures and becomes widespread, the services and insights they provide will be expected by customers.  Businesses that haven’t been building their analytics capabilities will find themselves unable to compete for customers.

Modular Business & Transient Competitive Advantages

As data reveals opportunities for revenue many of them will be short term or outside a business’s capabilities.  At the same time competition for new revenue streams will become more intense.  The result will be businesses looking for ways to capitalize on these transient advantages.  Enter modular business.  The concept revolves around self-assembling business units.  Much of the labor and knowledge will be outsourced with brand, leadership and distribution provided by the business.  Data and analytics will provide insights on what to produce and for how long.  As the opportunity runs its course, so does the business unit.  Internal resources are allocated to the next transient advantage and the outsourced modules disband with minimal draw down costs.

The Full Potential (5 to 10 years out):

The Mind Unleashed – Hyper-Productivity

Many sci-fi writers pen books about a time when computers surpass people in intelligence and decision making.  In about 5 to 10 years that concept will be firmly put to rest.  Most cognitive researchers understand the pure decision-making engine that is our mind.  What limits us is the number of trivial decisions we have to make each day.  Put a mind to work on the truly complex problems instead, enabled by real time predictive modeling, and the potential for productivity is enormous.

Real Time, Predictive Everything

Most people think, at most, a few moves ahead.  The true strategic visionaries look only a bit farther down field.  What decisions would you advise your younger self to make?  With Real Time Predictive Modeling that power will be available when making all our decisions.  As predictive modeling becomes mainstream, our ability to see all potential options and understand the long term results of each option will enable a high level of strategic decision making.  Imagine the potential of every employee being able to think like a CEO.

Real Time, Personalized Everything

A common theme of the advancement of big data is its combination with other technologies to enhance its impact.  As 3D printing, exascale and the IoT hit their stride in 5 years or so the ability to personalize in real time becomes reality.  Branding and marketing move away from the generic, blast communications and to personalized messages delivered at the optimal moment to influence buying behaviors.  Products will give customers the ability to personalize what they buy without waiting.  As customers become increasingly self-expressive they will want to differentiate themselves by personalizing what and how they buy.  They will expect their brand interactions to be personalized.  Companies that fail to deliver personalization to their customers will find it difficult to compete.

Key Takeaways

Big data, analytics, software tools and strategy have reached a high level of maturity.  They’re ready for businesses of all sizes.  Their insights are bringing significant competitive advantages to early adopters.

The infrastructure and framework being put in place today must support the needs of tomorrow.  A unified data strategy helps keep business goals in focus while planning for the future.

Businesses falling behind the curve will find it difficult to compete in the near term and be unable to compete in the next 3 to 5 years.  Access to data and the skills to transform unstructured data into actionable insights are both business critical.

Customers are becoming more data savvy.  Data security and transparency in data collection policies are important parts of a business’s data strategy.  It will be imperative for businesses to keep up with customer expectations of service based on real time analytics.  As customers incorporate data and analytics into their daily lives, they will expect more personalized brand and product experiences.

Arena sized businesses will increase the competitive pressure for new revenue streams using advanced analytics to guide their entry into new markets.  Data driven, modular business models will allow companies to quickly capitalize on transient competitive advantages.  It will drive revenue growth for those businesses with the capability set to turn unstructured data into predictive models of the market.

What Can Big Data Do For Your Pricing Strategy?

I call people involved with creating pricing strategies margin magicians. It sounds so much better than pricing strategist and it’s a lot closer to the truth.  The magic is a balancing act between margins and volume, supply and demand, competition and competitive advantage.  Data already plays a big role in determining price and has for a very long time.  When I talk to teams about data enabled pricing they come to the conversation saying, “We already do that.”

Where Small Data Can Take Pricing Strategy

Your business probably does too. The business has talked to customers and found that there is a spread of prices they’re willing to pay as well as a spread of prices being charged.  Senior leaders have asked the question, why is that?  Why are some customers willing to spend more and some willing to spend less?  You probably have data on that too.  Brand, quality, features and other common themes rise to the top.  The experiments you’ve run have revealed behavioral trends too.  Things like categorical thinking come up and influence how people perceive price.

Let’s ask the deeper question about why customers pay different prices. It’s a question about how we make buying decisions.  Let’s take a customer at random.  They’re looking to buy a product in a competitive market so they have options.  Many pricing strategies hold that, all things being equal about the products, the lowest price wins.  If products are differentiated from each other, then the one which is the best fit for the customer’s need at the lowest price wins.

That’s because most pricing strategies assume customers to be rational decision makers which could not be farther from the truth. Rational decision making only happens when the customer knows all possible options (decision outcome pairs in the decision space) and has enough information to be certain about how much value (probability and loss or utility) they’ll get from each option.  Does that describe many/any of your customers?

Customers make decisions under uncertainty. As a result two customers with identical product needs can have two completely different prices that they’re willing to pay.  There’s a solution for that called price discrimination.  We offer the same good at different prices to different customer groups.  Since customers talk to each other we’ve also had to come up with clever ways to justify price discrimination.  A plane ticket usually costs more closer to the day of the flight than it does two weeks in advance.  Clothes go on sale at the end of the season or during holidays.  Buy smaller quantities and you’ll pay more than someone buying in bulk.  Better negotiators get a better price.

Big Data Starts To Add Value with 4 Basic Insights

If you look at enough datasets and experiments about customer buying behaviors in relationship to price you’ll discover just how deep the irrational decision making runs. To make a long presentation short, customers pay whatever a company can convince them to up to a budgetary maximum.  That’s a big data insight about pricing and intuitively you’ve always known that.  The data demonstrates that pricing doesn’t operate alone in customer decision making.

To get the most out of your data you have to ask the right questions. With respect to pricing strategy the right question is how do I use price to maximize the value I get from each customer?  Without big data many pricing strategies look at this question from the perspective of a single sale.

The better metric is Customer Lifetime Value (CLV) or the total value of each customer over their entire relationship with the business. Before you think I’ve brought you a ridiculously difficult problem to solve check out this free CLV calculator from the folks at Harvard.  All you need is some basic info about your customer buying habits and retention rates.
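If you’d rather see the mechanics than use a calculator, one common textbook form of CLV is per-period margin times a retention-adjusted multiplier. The inputs below are invented; swap in your own margin, retention and discount figures.

```python
# A common simple CLV formula: per-period margin times a retention-
# adjusted multiplier. Inputs are invented; swap in your own figures.
def simple_clv(margin_per_period: float, retention_rate: float,
               discount_rate: float) -> float:
    # CLV = m * r / (1 + d - r), the standard infinite-horizon approximation
    return margin_per_period * retention_rate / (1 + discount_rate - retention_rate)

print(simple_clv(margin_per_period=120.0,   # $ contribution per customer per year
                 retention_rate=0.80,       # share of customers who stay each year
                 discount_rate=0.10))       # annual discount rate
```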

Thinking in terms of CLV is leading to some very innovative and lucrative pricing strategies. If you look at how Google and Amazon price, you’re looking at some of the most sophisticated pricing strategies out there.  They are driven by large datasets and are aimed at increasing CLV.  That leads to the second big data insight about pricing.  Companies can use pricing in tandem with product, brand and marketing strategies to increase CLV.  Again, intuitively you probably already knew that.

The key next step is gaining a deeper understanding of the customer. Using analytics to learn then predict how likely the customer is to be loyal, how many products they’re likely to buy and how much a business can do to drive both behaviors are all critical parts of a data driven pricing strategy.  Companies like Sephora drive 80% of all sales through their customer loyalty system and have amassed a significant dataset on customer buying behaviors.  Casinos do much the same thing with some casinos logging 90% of all play through their loyalty system.  Grocery stores have a high percentage of spend through their rewards cards.  CRM is also big in the B2B space, providing the same data for analysis.  The result is a picture of CLV that allows businesses to tailor pricing to drive loyalty/repeat spending and maximize margins on infrequent or one time customers.

The third big data insight comes from a fairly sophisticated layering of loyalty, pricing and marketing data from these datasets. Brand engagement is a significantly higher driver of customer loyalty than price.  Loyalty systems that build a connection to customers through personalized engagement and experiences, like Sephora’s, have much higher CLVs and retention rates than those using discounts.  To make a long data presentation short (if you want the long version email me), discounting frequently leads to the opposite of the desired behavior.

The only thing companies do by indiscriminately lowering prices is train customers to game the system for lower prices. All a competitor has to do to lure those customers away is offer a lower price, and the margin given up to drive loyalty ends up producing exactly the opposite behavior.  Retail has learned this lesson the hard way and is now working its way back to a more profitable business model.

I worked with a manufacturer who had started running discounts at the end of each quarter to drive additional volume. It was successful so the company continued the practice for two years before they realized a problem.  Customers became trained to hold their orders until the end of quarter.  Margins dropped significantly so the discounts were stopped.  Competitors continued to offer their discount programs and customers were lost.  I was told by a Macy’s store manager in the early 2000’s that they’d trained their customers to wait for the sale and they didn’t know how to reverse that trend without losing customers.  It’s a problem that spans across markets.

This goes back to the first big data insight.  Customers pay whatever a company can convince them to up to a budgetary maximum.  With discounting a business is convincing a customer to pay a lower price.  Tell a customer that a $39.99 product is on sale for $24.99 enough times and they believe the product is only worth $24.99.  That leads to the fourth big data insight.  Only discount when it adds to the customer’s engagement with the brand.

Starbucks is a good example of this. Through their loyalty program customers buy a certain number of drinks and then get one free.  Rather than sending the message of this drink is worth $0, it says, “Thanks for your business. Here’s how much we appreciated it.”  Apple is another good example with the iPhone.  When a new model comes out, the old model is discounted.  That opens the older model to a new market, driving volume but it tells customers with the old phone something too.  Your phone is not as valuable as the new one.  That perceived loss of prestige drives upgrades in a couple of their customer segments.

We’ve come full circle, returning to the deep dive into customer decision making. If we were rational decision makers, utility would reign supreme over our decision making process.  That would put price at the forefront of the process.  However we’re not rational and our perceptions, beliefs and biases play heavily into our decisions.  A strong brand connection plays more into the equation than pricing.

Big Data Can Do a Lot More For Pricing

Big data is most effective when insights are layered and they begin to reveal patterns that were previously unknown to the business. Once it’s been revealed, pricing strategy’s role in maximizing CLV and enabling brand engagement is better understood.  The pitfalls of discounting are easier to avoid.  The focus can shift from fairly obvious insights to discovering new patterns in customer segments.  This is where an algorithmic approach breaks down and heuristics become a lot more successful.

Heuristics allow new patterns to be recognized by the machine. The first four insights sound like common sense because intuitively we’re able to come to the conclusions ourselves through experience and anecdotes.  Those types of insights are what algorithms are able to reveal.  Sift through enough datasets and some basic patterns become obvious.  There are other patterns in the data that aren’t so obvious.  Detecting those requires heuristic methods able to find subtle patterns in very large datasets.

The goal moves from categorizing customers after a few interactions to categorizing the customer during their first; from broad categories to increasingly granular ones. These categorizations when combined with our four basic insights about pricing allow for real time pricing strategies that are effective across multiple channels.  Point strategies maximize margins while also enabling loyalty and repeat purchases.  Heuristics allow these pricing strategies to be personalized by customer category with increasing granularity.
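As a hedged picture of “categorizing the customer during their first interaction”: segments are learned from historical customers, then a brand-new visitor is assigned to the nearest segment from first-session signals alone, and that segment’s pricing rules are applied. Everything below, from the signals to the cluster count, is an illustrative assumption.

```python
# Illustrative sketch: segments learned from historical customers are
# reused to place a brand-new visitor into a category during the first
# interaction, using only first-session signals. All inputs are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# first-session signals: pages_viewed, minutes_on_site, cart_value_$
history = rng.normal(loc=[8, 6, 45], scale=[4, 3, 30], size=(5000, 3))
segmenter = KMeans(n_clusters=6, n_init=10, random_state=3).fit(history)

new_visitor = np.array([[14, 11, 120]])        # a single first session
segment = int(segmenter.predict(new_visitor)[0])
print(f"assign visitor to segment {segment} and apply that segment's pricing rules")
```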

Longer term strategies can also be created. Customers change over time.  Businesses grow or shrink and people become more or less affluent or sophisticated.  From a CLV perspective it’s inefficient to allow customers to leave a brand because of these changes.  Tiered pricing and product lines are one method big data reveals to be effective in preventing these types of departures.  Toyota doesn’t want to lose high-end customers because it lacks high-end cars, so it created the Lexus line.  Nissan has Infiniti and Honda has Acura with the same goal.  Mercedes wants to expand its reach to younger buyers and has introduced new car lines to accomplish this goal.  Those buyers now begin the loyalty cycle earlier and CLVs grow.  The combination of brand loyalty and tiered product/pricing strategies also becomes more granular, again allowing for greater personalization.

This increase in personalization goes a long way towards creating that connection between customer and brand that I discussed earlier. The goal of big data enabled pricing is increasing levels of personalization that drive increasing levels of brand loyalty and higher CLVs.  As the business’s proficiency with big data and advanced analytics grows, the categorizations become faster, more accurate and more granular.

The Bottom Line: Why Use Big Data For Pricing Strategy?

There are two drivers for big data and heuristic enabled pricing: customer preferences and competitive pressures. As customer loyalty systems become more prevalent and increasingly sophisticated, customers are beginning to expect higher levels of personalization.  They’re expecting pricing to match their levels of loyalty.  Progressive recognized this trend a few years ago.  Their loyalty system gives pricing and perks to new customers based on their loyalty to their last insurance company.  The message is clear, “If your last company didn’t appreciate your loyalty come to us and we will from day 1.”

This is the second driver for data enabled pricing. Competitors are luring customers away with data enabled pricing strategies.  This will drive even the staunchest holdouts to adopt the methodology.  Those businesses that don’t will find it difficult to compete in the next two to three years.  The bottom line is big data enabled pricing is a matter of business differentiation in the short term and business survival in the long run.