Lenny Distilled

Iteration beats perfection

Craft → Execution Sense

With caveats

"This step" refers to analyzing and categorizing actual AI system failures before building tests, and "evals" means automated evaluations or tests for AI systems.

You don't want to skip this step. The reason I'm kind of spending so much time on this is this is where people get lost. They go straight into evals like, 'Let me just write some tests,' and that is where things go off the rails.
With caveats

An LLM judge is an AI model used to automatically evaluate another AI system's outputs; "evals" here refers to these automated evaluation systems.

Before you release your LLM as a judge, you want to make sure it's aligned to the human. A lot of people stop there and they say, 'Okay, I have my judge prompt. We're done.' Don't do that, because that's the fastest way that you can have evals that don't match what's going on, and when people lose trust in your evals, they lose trust in you.

The Missing Stamp

Every episode of Lenny's Podcast, distilled into the insights that matter and the quotes that make them stick.

LENNY WAS HERE__STAMP_DATE__

Lenny, if you're reading this, the stamp's ready when you are. 🧡🔥