Machine Learning for Product Managers

Machine learning offers huge potential across digital products, but it continues to come with so much hype that it leaves us with more questions than answers. What new things can we build that we couldn’t before? How do we introduce intelligence into existing products? How much data do we really need?

In this talk the two Matts give an overview of practical concerns in building machine-learning-powered products through a set of standard product management lenses, including customer value, commercial viability, technical feasibility and end usability.

They step back and consider the strategic implications of Machine Learning and the potential to build sustainable competitive advantage, before diving into the practicalities of establishing ML product teams.

Machine Learning for Product Managers

Matt Travers & Matt Kelcey – Product Principal and Principal, Machine Learning, ThoughtWorks

Keywords: Machine Learning, User Experience, value creation, efficiency, commodity models, bespoke models, UI, data estate, data complexity.

TL;DR: The Matts explore Machine Learning through their respective lenses of technical expertise and product management in an attempt to map the overlap between the two and what each viewpoint might bring to create a fruitful collaboration. Matt K explores what machine learning is and its current technical feasibility in terms of types of models and their relative merits, as well as the correlations between levels of complexity and volume of data, particularly as these impact delivery and the teams who drive delivery. Matt T explains how and why good product managers need to ensure that customer value, usability and strategy are as much a part of the process as the technical aspects, and uses a risk/benefit analysis to compare various approaches and chart potential pitfalls vs proven success strategies.

[Matt T speaking:] The genesis of this talk is that Matt T and Matt K have very different perspectives on machine learning. Matt K comes from a very technical background, whereas Matt T’s background is in product. The majority of broader conversations about machine learning come from these two perspectives, but there’s not much in the middle: either deep technical conversations, or high-level strategic discussions about how machine learning is going to change everything. The Matts wanted to do a talk that looks at the overlap, at what we can actually do today with machine learning in our current products.

What we’ll cover today: A look at ML through six product management lenses.

  • What is Machine Learning?
  • Technical Feasibility
  • Customer Value
  • Commercial Viability
  • Usability
  • Strategy
  • Delivery

Over to Matt K: What is machine learning? MK: There are many hype-related terms around ML: Is it artificial intelligence? Data science? Artificial intelligence is a broad, almost philosophical concept, and data science is a collection of roles in a company. This presentation is about machine learning, and particularly a subset of machine learning that is really applicable to business. Matt K sees machine learning as a toolbox: we have a collection of problems that we know how to solve given certain data sets, and when presented with a new problem, we reach into the toolbox to find a solution.

We’ll start with an example to ground the conversation. Let’s imagine we want to build a piece of machinery that predicts the price of a house. There are two parts to this: the robot in the middle represents the software we want to write; let’s call that a model. This model tries to map between things we might know about a house (ex: location, materials, age) and a predicted sale price. The question becomes: how do we build this software? In many machine learning use-cases, we use a technique called supervised machine learning, so named because we train the model by giving it examples of things that we know the answer for. We collect information about a set of houses whose sale prices we already know. The task is not to specify the mapping between these things ourselves, but to give examples to a piece of machinery so that it can learn the mapping for itself. This whole premise is conditioned on having access to this data: different data results in a different model. This is a critical insight. Much of the software we write is deterministic, but the dependency of machine learning on a particular data set is critical to its success and to how things work internally.
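
As a minimal sketch of this setup, assuming scikit-learn and made-up data: we hand the model example houses with known sale prices and let it learn the mapping itself.

```python
# A sketch of supervised learning for the house-price model, assuming
# scikit-learn; the houses and prices below are made up for illustration.
from sklearn.linear_model import LinearRegression

# Each row describes a house: (distance to city in km, age in years, area in sqm).
X = [[2.0, 35, 90],
     [8.5, 10, 140],
     [1.2, 80, 70],
     [5.0, 5, 120]]
y = [650_000, 720_000, 540_000, 810_000]  # known sale prices: the "answers"

# We don't specify the mapping ourselves; the model learns it from examples.
model = LinearRegression().fit(X, y)
print(model.predict([[3.0, 20, 100]]))  # predicted price for an unseen house
```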

Technical Feasibility. MK: Let’s begin by exploring some technical aspects like data sets and models. A common set of axes to start from when thinking about ML: what sort of data do we have, and how much of it? How complex is your model? What capacity does it have? Put another way: how big is the piece of machinery’s brain? This is somewhat intuitive, but in general there is a relationship between the amount of data you have on the problem and the quality of your results. This matters because we don’t ever build just one piece of machinery. It’s not a matter of installing the machine learning model and walking away; it’s an iterative process. As we collect more data, or more complex data, we have the opportunity to rebuild the model in some way and ask more complicated questions of it.

More data = better results. There are two dimensions at play here: volume and complexity. Ex: Returning to the house price model, one conception of more data is simply more rows (volume): more example houses, each with its materials, distance, and age. This feels intuitive: the more examples, the better the results. The other way we can glean more data is by adding columns (complexity): more categories of differentiation, ex: how many bedrooms does the house have? Does it have a pool? Results are impacted not only by how many houses you compare but also by what kinds of information you have on each house.
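
To make the rows-vs-columns distinction concrete, a small sketch using pandas (all values are illustrative):

```python
# Illustrative sketch of the two dimensions of "more data" using pandas.
import pandas as pd

houses = pd.DataFrame({
    "materials": ["brick", "timber"],
    "distance_km": [2.0, 8.5],
    "age_years": [35, 10],
})

# More volume: add rows (another example house, same columns).
more_rows = pd.DataFrame({"materials": ["stone"], "distance_km": [1.2],
                          "age_years": [80]})
houses = pd.concat([houses, more_rows], ignore_index=True)

# More complexity: add columns (new categories of differentiation).
houses["bedrooms"] = [3, 4, 2]
houses["has_pool"] = [False, True, False]
```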

On the data side, what’s more interesting in terms of the relationship to product is the complexity of the model. This is often driven by the complexity of the answer we are seeking; it may be that the question you are asking can’t be answered from the complexity of this data set. There is lots of flexibility on the right-hand, output side to ask different questions of the data. Maybe this data on the house is helpful in answering: how many days will it take to sell this house? Or: will this house sell in the next month? The less complex the question, the less data you need to build a model. In terms of products for machine learning, it’s important to ask: can we, if needed, reframe our question? Is there a simpler version of the question we are trying to ask that gives us a better result? There is a trade-off between the volume and complexity of the data and the complexity of the questions you can ask.
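
A minimal sketch of reframing the question, assuming scikit-learn and an illustrative days-to-sell label: the same data can back a harder regression question or a simpler yes/no question.

```python
# Sketch of reframing the question; the features and days-to-sell labels
# are made up for illustration.
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[2.0, 35], [8.5, 10], [1.2, 80], [5.0, 5]]  # (distance_km, age_years)
days_to_sell = [14, 45, 90, 7]

# Harder question (regression): how many days will this house take to sell?
reg = LinearRegression().fit(X, days_to_sell)

# Simpler question (binary classification): will it sell within a month?
sold_within_month = [d <= 30 for d in days_to_sell]
clf = LogisticRegression().fit(X, sold_within_month)
```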

It’s also critical to understand that there is never just a single model or one piece of software for machine learning. Teams end up with lots and lots of models which interact in very complicated ways. It’s never just one input and one output: an input might go to a model which produces an output, which then becomes the input for another model. All the models are doing different things and have different responsibilities.
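
A minimal sketch of models chained together, continuing the house example; both "models" here are hypothetical stand-ins for trained models:

```python
# Sketch of chained models: one model's output becomes another model's input.
def predict_price(features):
    """Model 1: house features -> estimated sale price (illustrative)."""
    return 250_000 + 1_500 * features["size_sqm"]

def predict_days_to_sell(features, price):
    """Model 2: house features plus Model 1's output -> days to sell."""
    return 20 + price / 50_000 - features["age_years"] * 0.1

house = {"size_sqm": 120, "age_years": 30}
price = predict_price(house)               # output of one model...
days = predict_days_to_sell(house, price)  # ...becomes input to another
print(f"~${price:,.0f}, selling in ~{days:.0f} days")
```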

In thinking about different models and data requirements, there is a continuum of: where does this model come from? Until fairly recently, you had to build your model from scratch. But as common questions are asked of common data, models have become a commodity. Cloud services and existing models provide a commodity set of capability but don’t differentiate you across the market. If you have unique data, or a unique question to ask of that data, you may need to build something more bespoke. This gives more flexibility, but comes at a non-trivial engineering cost. You can also use a combination, taking a commodity model and customizing it.
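
A minimal sketch of that combination approach (a commodity model with a bespoke head), assuming TensorFlow/Keras, ImageNet weights as the commodity starting point, and a hypothetical five-class image task; data loading and training are omitted:

```python
# Sketch of customizing a commodity model (transfer learning).
import tensorflow as tf

# Commodity part: ImageNet weights, classification head removed.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the commodity layers frozen

# Bespoke part: a small head trained on our own question.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 hypothetical classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```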

Choices between commodity and bespoke models are further complicated by how rapidly the technology is evolving. Ex: Object recognition technology is very common and simple to do now. Ten years ago it was inconceivable; five years ago it was a complicated process. Now it costs a fraction of a cent and is easily accessible via the cloud. There’s an interesting balance here: it’s a relatively new technology, but one that anyone can integrate, so how do you integrate it and still differentiate relative to your competitors? These are the questions to ask when thinking about what kind of model you need to build or use.

Customer Value. [Back to Matt T presenting.] Returning to the matrix between volume and complexity, let’s add another bubble to represent customer value. A basic tenet of product management is that PMs should always be thinking about the value they are delivering to customers, but this sometimes gets lost in the excitement around machine learning. As we increase the size and complexity of the data estate and develop the model capacity, we need to be looking for an increase in customer value that matches that investment. It’s important to set a baseline from the outset: before even establishing a machine learning experience, understand what it would look like as a non-ML one. Ex: If building a recommender, start with a simple rules-based system, such as products in the same category at around the same price point. This gives you a baseline for measuring customer value. Often there will be a sweet spot between adding to the data estate and improving model complexity. Too much data that isn’t relevant or doesn’t add value, or over-investment in the model without a corresponding gain in customer value, takes you away from this sweet spot.
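
A minimal sketch of such a rules-based baseline recommender; the Product fields and the 20% price tolerance are illustrative assumptions:

```python
# Sketch of a non-ML baseline recommender for measuring customer value.
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    category: str
    price: float

def baseline_recommend(current, catalogue, tolerance=0.2, k=3):
    """Same category, price within +/- tolerance; closest price first."""
    lo, hi = current.price * (1 - tolerance), current.price * (1 + tolerance)
    candidates = [p for p in catalogue
                  if p.category == current.category
                  and lo <= p.price <= hi
                  and p.name != current.name]
    candidates.sort(key=lambda p: abs(p.price - current.price))
    return candidates[:k]  # any ML recommender should beat this baseline
```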

Anti-patterns to be aware of when trying to develop customer value in line with investment in the technology:

  • The stalled value trap: You’re continuing to invest and moving across and upwards, but seeing no increase in value. At this point you should stop and solve another problem.
  • The hype trap: This is common when CEOs want to see ML but overestimate its utility. A simple rules-based system may already deliver high value, and a major ML investment is then required just to match that customer value.
  • The moonshot: Not so much an anti-pattern, as it can be a valid choice, but there is no payoff until substantial investment (see: autonomous driving). Value relies on complex models and high volume and complexity of data. This strategy is high risk, with little feedback on progress toward customer value. Assess whether it is right for your organization.

What the Matts do advocate is a portfolio of complementary approaches. This is the most reliable path. Just as you would have a product portfolio for a set of products, it makes sense to have a set of different investment approaches.

Commercial Viability. Let’s stick with the data estate and ML model capacity axes as frames for understanding how to manage costs. There are two major costs and emergent patterns to explore here (NB: your mileage may vary!). The cost structure along the x-axis (the data estate) comprises the cost of data engineers and the cost of data hardware and storage. Storage is cheap in the early stages, but the cost grows as complexity and volume increase. The bigger cost is the people cost in terms of data engineers: often company data is not organized or accessible, and you need to put in a big investment up front to get it into a usable state.

On the ML Model Capacity axis, compute costs can be significant but typically have gradual linear growth. Data scientists are expensive and can be explosively so during the transition from generic to custom models. Something original is high risk and high cost.

Usability. Usability depends on the application you’re building; however, there is a pattern that applies to most ML-based products: having an interface which is robust to failure. As the machine learning experience develops up the two axes, it’s important that the UI you create reflects where you are. Ex: YouTube doesn’t aim to fully solve the user problem of “what’s the next video I should watch?” but rather augments that experience: it gives a selection of videos that might be relevant and mixes in various alternatives, so that the user can choose from the augmented experience what option they want to pursue.

There is an important difference between augmenting and automating. As we get more confident with the ML experience, we can be more presumptuous in the interface that we build. Ex: The autoplay feature in YouTube automates rather than augments; it confidently assumes you will want to see what it plays next.
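
A minimal sketch of how the augment-vs-automate choice might show up in code, assuming the model returns ranked (video, confidence) pairs; the threshold value is illustrative:

```python
# Sketch of augment vs automate as a UI decision driven by model confidence.
AUTOMATE_THRESHOLD = 0.95

def next_video_ui(ranked):
    """ranked: list of (video, confidence) pairs, best first."""
    top_video, confidence = ranked[0]
    if confidence >= AUTOMATE_THRESHOLD:
        # Automate: confident enough to act on the user's behalf (autoplay).
        return {"action": "autoplay", "video": top_video}
    # Augment: offer a mixed selection and let the user choose.
    return {"action": "suggest", "videos": [v for v, _ in ranked[:5]]}
```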

Strategy. One thing that makes ML exciting from a strategic point of view is the double flywheel effect. Many products show the revenue-generating flywheel: a good experience drives usage, which generates revenue, which funds improvements to the experience, which attracts more usage. With ML, another layer is added through the unique data that is generated through usage: usage generates unique data (in both volume and complexity), which is fed back in to improve the experience, which attracts more usage. Strategically, getting there first with a machine learning product can create an unassailable competitive advantage.

Two main categories of ML opportunity: value creation vs efficiency. Value creation means building a new product experience through ML technology, creating new customer value which can be monetized (ex: voice services, self-driving vehicles). The flywheel potential means there is utility in getting there first, which creates some urgency, but this is also a risky strategy. Alternatively, we can look for value opportunities around efficiency, which can reduce costs and mitigate risks (ex: process prioritization, fraud detection). Progress is more incremental, but this is also a more proven road to success.

Delivery. [Back to Matt K.] There are a couple of patterns we see across companies both small and large, which follow two axes. The horizontal axis characterizes ML in statistical terms: descriptive, predictive, or prescriptive. Descriptive applies to things like dashboards: a large amount of data that you want to summarize in ways that aid humans in making decisions; the distillation of complicated data. People often move from this to a predictive model: you want to change some levers upstream in your system, to change things that will happen in the future. Thirdly, prescriptive, where you effectively hand over some aspect of the business to the model to act on your behalf. These are the three core ways that complexity grows.
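
A toy sketch contrasting the three levels on the same house-sales data; the sale-probability "model" here is a hypothetical stand-in, not a fitted one:

```python
# Descriptive, predictive, and prescriptive uses of the same data.
import statistics

past_sale_prices = [310_000, 450_000, 280_000, 520_000]

# Descriptive: distil the past into a summary that aids human decisions.
print("median past sale price:", statistics.median(past_sale_prices))

# Predictive: estimate what will happen, e.g. chance of a sale at a price.
def p_sell(asking_price):
    return max(0.0, 1.0 - asking_price / 1_000_000)

print("chance of selling at $600k:", p_sell(600_000))

# Prescriptive: hand the decision itself over, recommending the asking
# price that maximizes expected revenue under the model.
candidates = range(300_000, 900_001, 50_000)
best = max(candidates, key=lambda p: p * p_sell(p))
print("recommended asking price:", best)
```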

On the Y axis, we look at who is doing the work. Due to the technical challenges, it’s quite common to have a single team working on the problem (a ‘centre of excellence’). This may then grow into a cross-team effort, or in very high-tech companies the mandate may be to do this company-wide. Three vectors are of note. Descriptive statistics lead to many teams doing simple things, which is inefficient. Conversely, a single team doing very complex things can silo the project, which is not ideal. The ideal is a progression over time, capturing expertise not necessarily in people but in tools and process.