There's actually more code in the jobs that serve the Stripe API to handle those edge cases than in the actual main flow. And I think that's quite remarkable. Most people wouldn't do that. Not only was I impressed by it, but when I talk to Stripe users, they very frequently tell me this is something that delights them about the product.
Error handling deserves more code than the happy path
The real world has entropy and it's hard and it's messy... Computers are deterministic, but humans aren't, right? And so building products that have a little bit more flex, or a few more fail-safes in case those things happen, becomes paramount.
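To make the idea of fail-safes outweighing the main flow concrete, here is a minimal sketch in Python. It is not Stripe's code or API: `ChargeJob`, `TransientError`, `PermanentError`, and the `send` callback are all hypothetical names invented for illustration. The point is only the proportion: the happy path is a single call, and everything around it deals with what can go wrong.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


class TransientError(Exception):
    """Hypothetical: a failure worth retrying, such as a timeout."""


class PermanentError(Exception):
    """Hypothetical: a failure retrying cannot fix, such as a declined card."""


@dataclass
class ChargeJob:
    # `send` is a stand-in for whatever actually moves the money.
    send: Callable[[int], None]
    max_attempts: int = 3
    backoff_seconds: float = 0.0
    # Maps an idempotency key to the final result recorded for it.
    completed: Dict[str, str] = field(default_factory=dict)

    def run(self, key: str, amount_cents: int) -> str:
        # Fail-safe 1: reject input that can never succeed.
        if amount_cents <= 0:
            return "rejected: invalid amount"
        # Fail-safe 2: a repeated request returns the earlier result
        # instead of charging twice.
        if key in self.completed:
            return self.completed[key]
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.send(amount_cents)  # the happy path is this one line
            except TransientError:
                # Fail-safe 3: retry transient failures with linear backoff.
                if attempt < self.max_attempts:
                    time.sleep(self.backoff_seconds * attempt)
                continue
            except PermanentError as exc:
                # Fail-safe 4: stop immediately and record why.
                self.completed[key] = f"failed: {exc}"
                return self.completed[key]
            self.completed[key] = "succeeded"
            return "succeeded"
        # Fail-safe 5: out of attempts. The result is not cached,
        # so a later request with the same key can try again.
        return "failed: retries exhausted"
```

Calling `ChargeJob(send=some_function).run("order-1", 500)` returns a status string in every case rather than raising, which is one possible design choice; a real system would likely also log, alert, and persist the idempotency records.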
We don't have a bug backlog. We fix almost every bug as soon as it surfaces. So it's really part of the production engineer's job just to fix those things.
We choose to design the way we work to hold those two things true at the same time. So we can operate very rapidly but also be extremely reliable and available for our users. It does take a lot of care and attention and it takes a lot of systems.
More from David Singleton: