03 Feb, 2026
Evals.cz Meetup #1 (Prague, CZ)

When (& How) to Start Writing Evals

Most teams approach LLM evaluation like test-driven development: write tests first, then build. But LLMs have an infinite failure surface — you can't predict what will break. This talk argues for a different approach: deploy first, observe failures, then build evals for the patterns you've actually discovered.

"RAG is dead" is the take in every other thread in 2026, and it's wrong: naive retrieval-augmented generation is still a sensible default, beaten only in some cases, and measurement is the only way to know if you're one of them. This talk walks the retrieval pipeline end to end, then turns to the part that matters: telling whether your RAG actually works, with ground truth, retrieval metrics, RAGAS, LLM-as-judge, and error analysis feeding an eval flywheel.

Choose Your Ground Truth: A Field Guide to Synthetic Data for Evals
Evals.cz Meetup #3 (Prague, CZ)

You need an eval set but don't have a hundred real production failures to build it from, so you reach for synthetic data, and most first attempts quietly produce garbage. A field guide to the techniques that actually work, from real-incident seeds to personas to RAG-grounded generation, with one throughline: synthetic data needs its own eval, so choose your technique backwards from the eval you want.

Financial modelling in OpenClaw & safely deploying it
OpenClaw Demo Night (Prague, CZ)

Using OpenClaw to build a Bayesian buy-vs-rent model for Prague real estate, and how to deploy something like that without setting your money on fire.

Cognitive Exhaust Fumes: What Read-Only AI Sees That You Can't
ai.engineer/europe (Online)

What happens when AI systems passively observe information without modifying it? Exploring the patterns and insights that read-only AI reveals: the cognitive byproducts humans overlook.
