You want to diagnose with data and treat with design. Data is not a tool that's going to tell you what you should build.
Data reveals problems; design creates solutions.
You should start with some kind of data analysis to ground what you should even test, and that's a little bit different than software engineering where you have a lot more expectations of how the system is going to work. With LLMs, it's a lot more surface area. It's very stochastic, so you kind of have a different flavor here.
Just write down the first thing that you see that's wrong, the most upstream error. Don't worry about all the errors, just capture the first thing that you see that's wrong, and stop, and move on.
In this context, "traces" refers to conversation logs between AI assistants and users that are being analyzed for errors and improvement opportunities.
Keep looking at traces until you feel like you're not learning anything new. There's actually a term in data analysis and qualitative analysis called theoretical saturation.
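The review loop described in these quotes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Note` record, the error labels, and the `patience` threshold are all hypothetical. The source describes saturation as a feeling of not learning anything new, so "no new label in the last N traces" is a rough stand-in for that judgment, not a definition of it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Note:
    trace_id: str
    # The first (most upstream) thing the reviewer saw go wrong,
    # or None if nothing looked wrong. One note per trace, then move on.
    first_error: Optional[str]


def review_until_saturated(notes: Iterable[Note], patience: int = 20):
    """Consume notes in review order and stop once `patience` consecutive
    traces add no new error label. Returns (traces reviewed, labels seen).
    `patience` is an assumed proxy for "not learning anything new"."""
    seen = set()
    since_new = 0
    reviewed = 0
    for note in notes:
        reviewed += 1
        if note.first_error is not None and note.first_error not in seen:
            seen.add(note.first_error)
            since_new = 0  # learned something new, so keep going
        else:
            since_new += 1
        if since_new >= patience:
            break
    return reviewed, seen


# Hypothetical notes; the labels are made up for illustration.
notes = [
    Note("t1", "ignored user constraint"),
    Note("t2", "wrong tool called"),
    Note("t3", None),
    Note("t4", "ignored user constraint"),
    Note("t5", "wrong tool called"),
    Note("t6", "hallucinated policy"),
]
reviewed, labels = review_until_saturated(notes, patience=3)
print(reviewed, sorted(labels))
```

With a small `patience`, the loop stops before a rare failure (here, the one on `t6`) ever shows up, which is a reason to treat any fixed threshold as a floor rather than a replacement for the reviewer's own sense of saturation.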
More from Julie Zhuo: