Product design must match your model's accuracy


Defining

There's something fundamentally interesting about that, and it makes life fun here. If the model gets it right 60% of the time, you build a very different product than if the model gets it right 95% of the time, versus if the model gets it right 99.5% of the time.

Kevin Weil, "OpenAI's CPO on how AI changes must-have skills, moats, coding, startup playbooks, more"

Watch at 00:17:10

Defining

The quality of your machine learning, if you're going to have a single play button, needs to be literally 100%, or zero prediction error, and that's never the case. So let's say that you have one-in-five hits, so four out of five things are duds; then you need a UI that probably shows at least five things on screen at the same time. So you have a one-in-five chance of something on screen being relevant.

Gustav Söderström, "The science of product, big bets, and how AI is impacting the future of music"

Watch at 00:18:09
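
The arithmetic behind the "one in five" example can be sketched in a few lines. This is a minimal illustration, not something from the episode: it assumes each on-screen item is relevant independently with the same hit rate, and the function names (`expected_hits`, `p_at_least_one_hit`, `slots_needed`) are hypothetical.

```python
import math


def expected_hits(hit_rate: float, slots: int) -> float:
    """Expected number of relevant items when `slots` items are shown."""
    return hit_rate * slots


def p_at_least_one_hit(hit_rate: float, slots: int) -> float:
    """Chance that at least one shown item is relevant.

    Assumption: items are relevant independently of each other.
    """
    return 1.0 - (1.0 - hit_rate) ** slots


def slots_needed(hit_rate: float, target: float) -> int:
    """Smallest number of slots so that P(at least one hit) >= target."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < target < 1.0):
        raise ValueError("hit_rate and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - hit_rate))


if __name__ == "__main__":
    # One-in-five hit rate, five items on screen.
    print(expected_hits(0.2, 5))
    print(round(p_at_least_one_hit(0.2, 5), 3))
    # Slots needed for a 90% chance of one relevant item, at two hit rates.
    print(slots_needed(0.2, 0.9), slots_needed(0.95, 0.9))
```

Under these assumptions, five slots at a 20% hit rate give one relevant item on average, but only about a two-in-three chance that at least one is actually relevant. A product that wants a higher chance needs more slots or a better model.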

Supporting

You're asking the judge to do one thing, evaluate one failure mode, so the scope of the problem is very small and the output of this LLM judge is pass or fail. So it is a very, very tightly scoped thing that LLM judges are very capable of doing very reliably.

Hamel Husain & Shreya Shankar, "Why AI evals are the hottest new skill for product builders"

Watch at 00:50:49
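
A tightly scoped judge of this kind might be wired up roughly as follows. This is a sketch under stated assumptions, not the speakers' implementation: the prompt wording, the `judge` and `parse_verdict` helpers, and the `call_model` callable (a stand-in for whatever LLM client is actually used) are all hypothetical.

```python
from typing import Callable

# Hypothetical prompt: one failure mode, one binary answer.
JUDGE_TEMPLATE = (
    "You are checking an AI response for exactly one failure mode.\n"
    "Failure mode: {failure_mode}\n"
    "Response to check:\n{output}\n"
    "Answer with a single word: PASS if the failure mode is absent, "
    "FAIL if it is present."
)


def build_judge_prompt(failure_mode: str, output: str) -> str:
    """Fill the template for a single failure mode and a single output."""
    return JUDGE_TEMPLATE.format(failure_mode=failure_mode, output=output)


def parse_verdict(raw: str) -> bool:
    """Map the judge's reply to True (pass) or False (fail).

    Anything other than PASS or FAIL is rejected rather than guessed at.
    """
    verdict = raw.strip().upper()
    if verdict == "PASS":
        return True
    if verdict == "FAIL":
        return False
    raise ValueError(f"unexpected judge reply: {raw!r}")


def judge(output: str, failure_mode: str, call_model: Callable[[str], str]) -> bool:
    """Run one pass/fail check; `call_model` is an assumed prompt-in, text-out function."""
    return parse_verdict(call_model(build_judge_prompt(failure_mode, output)))


if __name__ == "__main__":
    # Stub model for demonstration only; a real judge would call an LLM here.
    def always_fail(prompt: str) -> str:
        return "FAIL"

    print(judge("Sure, I booked it.", "claims an action it did not take", always_fail))
```

Keeping the output to a strict PASS or FAIL, and raising on anything else, keeps the judge's scope as narrow as the quote describes and makes its verdicts easy to compare against human labels.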
