When (& How) to Start Writing Evals
Most teams approach LLM evaluation like test-driven development: write tests first, then build. But LLMs have an infinite failure surface — you can't predict what will break. This talk argues for a different approach: deploy first, observe failures, then build evals for the patterns you've actually discovered.