The Eval Flywheel: From "Works on My Laptop" to Systematic Quality
Most teams shipping GenAI products have no evaluation system. They have vibes, a few saved prompts, and hope. This talk starts with observability: if you can't see what's happening in production, nothing else matters. From there, we build the eval flywheel, a practical pattern in which production observability feeds error analysis, error analysis generates eval cases, and eval cases prevent the same failures from recurring.
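One turn of the flywheel can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the talk's implementation: the `Trace` and `EvalCase` schemas, the `bad_phrase` reviewer annotation, the substring check, and the two toy models are all hypothetical names invented here for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trace:
    """One logged production call (hypothetical schema)."""
    prompt: str
    output: str
    # Filled in by a reviewer during error analysis; None means "looks fine".
    bad_phrase: Optional[str] = None


@dataclass
class EvalCase:
    """A regression check derived from a flagged trace (hypothetical schema)."""
    prompt: str
    forbidden: str  # the model output must not contain this substring


def analyze_errors(traces: List[Trace]) -> List[Trace]:
    """Error analysis: keep only the traces a reviewer flagged."""
    return [t for t in traces if t.bad_phrase is not None]


def generate_eval_cases(failures: List[Trace]) -> List[EvalCase]:
    """Turn each flagged trace into an eval case."""
    return [EvalCase(t.prompt, t.bad_phrase) for t in failures]


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> List[EvalCase]:
    """Return the eval cases the given model still fails."""
    return [c for c in cases if c.forbidden.lower() in model(c.prompt).lower()]


# Toy production log: the second trace was flagged during review.
traces = [
    Trace("What are your hours?", "We are open 9 to 5."),
    Trace("Can I return this?", "Yes, a full refund is guaranteed.",
          bad_phrase="guaranteed"),
]

# Observability -> error analysis -> eval cases.
cases = generate_eval_cases(analyze_errors(traces))


def old_model(prompt: str) -> str:
    # Stand-in for the version that produced the flagged output.
    return "Yes, a full refund is guaranteed."


def new_model(prompt: str) -> str:
    # Stand-in for a candidate fix.
    return "Returns are accepted within 30 days."


# Eval cases prevent recurrence: run them against each candidate before shipping.
regressions_old = run_evals(old_model, cases)
regressions_new = run_evals(new_model, cases)
```

In this sketch the substring check stands in for whatever grader a team actually uses. The point is only the data flow: each flagged production trace becomes a permanent eval case that every later model or prompt change must pass.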