Lenny Distilled

The ultimate guide to A/B testing

July 27, 2023

Featuring: Ronny Kohavi (Consultant, Former VP at Airbnb, Microsoft, Amazon)

11 quotes · 7 insights

Portfolio your bets: 70% core, 20% adjacent, 10% moonshots

You have to allocate sometimes to these high risk, high reward ideas. We're going to try something that's most likely to fail. But if it does win, it's going to be a home run. And you have to be ready to understand and agree that most will fail.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 00:00:22

The best experiments surprise you

A surprising experiment is one where the estimated result beforehand and the actual result differ by a lot. So that absolute value of the difference is large.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 00:17:32
Twyman's law, the general statement is that any figure that looks interesting or different is usually wrong. If the result looks too good to be true, your normal movement of an experiment is under 1% and you suddenly have a 10% movement, hold the celebratory dinner.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 01:00:51

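A P value is the probability of seeing a result at least as extreme as the one observed, assuming there is no real difference between treatment and control. In A/B testing, a result is called statistically significant when the P value falls below a chosen threshold, commonly set at 0.05 (5%).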

Many people assign one minus P value as the probability that your treatment is better than control. That is wrong.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 01:02:30

The "success rate" refers to the historical percentage of A/B tests at Airbnb that showed positive results.

At Airbnb, where the success rate is only 8%, if you get a statistically significant result with a P value less than 0.05, there is a 26% chance that this is a false positive result. It's not 5%, it's 26%.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 01:04:54
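The 26% figure can be reproduced with Bayes' rule. The sketch below is one way to arrive at it, not necessarily the exact calculation behind the quote. It assumes 80% statistical power and a two-sided threshold of 0.05 (so 0.025 for the "treatment is better" direction); neither value is stated in the quote, and the function name is hypothetical.

```python
def false_positive_risk(success_rate, alpha=0.05, power=0.8):
    """Chance that a statistically significant positive result is a false positive.

    success_rate: prior share of ideas that truly improve the metric.
    alpha: two-sided significance threshold (assumed 0.05).
    power: chance of detecting a real improvement (assumed 0.8).
    """
    one_sided_alpha = alpha / 2  # only the "treatment is better" tail counts
    false_positives = one_sided_alpha * (1 - success_rate)
    true_positives = power * success_rate
    return false_positives / (false_positives + true_positives)


# Airbnb's 8% success rate from the quote above
airbnb_risk = false_positive_risk(0.08)
print(f"{airbnb_risk:.1%}")  # prints 26.4%
```

The lower the historical success rate, the larger the share of "wins" that are noise, which is why the risk sits far above the 5% that the threshold seems to promise.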

Growth needs culture, not just experiments

Unless you have at least tens of thousands of users, the math, the statistics just don't work out for most of the metrics that you're interested in. Start experimenting when you're in the tens of thousands of users. Below that, start building the culture, start building the platform, start integrating.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 00:26:42
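A rough sample-size estimate shows why the threshold lands in the tens of thousands. The sketch below uses the common rule of thumb of 16 times the variance divided by the squared effect size, which approximates the users needed per variant for 80% power at a 0.05 significance threshold. The 5% baseline conversion rate and 5% relative lift are illustrative assumptions, not numbers from the episode, and the function name is hypothetical.

```python
import math


def users_per_variant(baseline_rate, relative_lift):
    """Approximate users needed in each variant to detect a relative lift
    in a conversion rate (rule of thumb: n = 16 * variance / delta^2)."""
    variance = baseline_rate * (1 - baseline_rate)  # variance of a yes/no metric
    delta = baseline_rate * relative_lift           # absolute change to detect
    return math.ceil(16 * variance / delta ** 2)


# Assumed example: 5% of users convert, and we want to detect a 5% relative change
n = users_per_variant(baseline_rate=0.05, relative_lift=0.05)
print(n)  # on the order of 120,000 users per variant
```

Smaller lifts push the requirement up quickly, since halving the detectable change quadruples the users needed.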

Short-term wins often disappear long-term

OEC stands for "Overall Evaluation Criterion," Kohavi's framework for defining metrics that A/B tests should optimize for.

To me, the key word is lifetime value, which is you have to define the OEC such that it is causally predictive of the lifetime value of the user.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 00:32:05

Most experiments fail - and that's exactly the point

Context: Kohavi is referring to A/B testing experiments at tech companies, where "failure" means the experiment didn't improve the target metric.

At Microsoft, about 66%, two thirds of ideas fail. At Bing, which is a much more optimized domain after we've been optimizing it for a while, the failure rate was around 85%. And then at Airbnb, this 92% number is the highest failure rate that I've observed.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 00:14:09

Test everything, but test the right version

I'm very clear that I'm a big fan of test everything, which is any code change that you make, any feature that you introduce has to be in some experiment. Because again, I've observed this sort of surprising result that even small bug fixes, even small changes can sometimes have surprising, unexpected impact.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 00:00:00
Find a place, find a team where experimentation is easy to run. Don't go with the team that launches every six months, or Office used to launch every three years. Go with the team that launches frequently.
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 01:08:34

Time constraints force better decisions than endless planning

When you think about return on investment, we could get the data by having some engineers spend a couple of hours implementing it. And that's exactly what happened. Somebody at Bing who kept seeing this in the backlog and said, 'My God, we're spending too much time discussing it. I could just implement it.'
Ronny Kohavi, Consultant, Former VP at Airbnb, Microsoft, Amazon 00:06:19
