Evals, Benchmarks, and Guardrails: A Pythonista's Guide to Not Mixing Them Up
"I'll just write pytest tests for my LLM"—but should you? This talk untangles benchmarks, evals, and guardrails: three concepts that sound similar but map to different Python patterns. Learn why pytest CAN work for evals (with the right mindset), why guardrails aren't tests at all, and a grounded theory approach to defining what "good" actually means for your task.
Feel free to contact me
Have a question, an idea, or just want to say hello?