Don't Write Evals for Fast-Moving Systems
- Evals
- LLMs
When developing Clobsidian, an obvious question came up: as an evals person (Head of AI, even, to hear me tell it), should I be evaluating its capabilities more than I am? Which is to say, at all?
The answer is: not yet, but I know when I’ll start.
Scope
The title is a bit of clickbait. Here's the checklist to see whether this applies to your system:
- You’re developing an LLM-powered system.
- It’s moving fast.
- You’re the only user.
- You’re the only one who can fix the errors.
- You’re the only one who knows what success looks like.
Example: the draft-doctor skill
I have a skill that helps me develop half-baked ideas that I’ve written down but haven’t developed into a full draft. (It’s emphatically not a slop maker — rather, it’s a tool to help me get unstuck when I’m stuck, which mostly involves a lot of questions and some brainstorming.)
An excerpt from the skill:
Phase 2: Research Context
Before asking questions, gather context:
- URLs in draft: Use WebFetch to read linked articles/pages
- Previous posts: Check 02-Areas/Drafts/Posted/ for related content
- People/concepts: Note anything that might inform your questions
Phase 3: Ask Clarifying Questions (Socratic, Not Strategic)
Use AskUserQuestion to help the author think through the subject matter itself — not meta questions about format, audience, or packaging.
Ask questions that:
- Probe the core claim (“What exactly bothers you about this?”)
- Surface hidden assumptions (“Is there a case where this would be fine?”)
- Find the personal angle (“Have you done this yourself?”)
- Sharpen the argument (“What’s the strongest counterargument?”)
- Uncover the real insight (“What do you know that most people don’t?”)
Don’t ask:
- “Who is this for?” (meta)
- “What tone do you want?” (meta)
- “Should this be short or long?” (meta)
- “What do you want readers to do?” (meta)
Guidelines:
- Ask 1-2 questions at a time
- Ask about the IDEAS, not the presentation
- Follow the thread — let their answers reveal what they actually think
- Stop when THEY have clarity (you’ll see it in their answers)
- Match the language of the draft (English or Czech)
Even in this partial excerpt, there’s already quite a bit that can go wrong.
- The questions could be irrelevant.
- Early questions could overdetermine the later ones.
- The questions could be too narrow or too broad.
Should I have written evals for this skill, then? No. Not yet.
The reality: fix on fail, commit, move on
In reality, I’ll test the skill with myself as the first user. If it fails, I’ll fix it and commit the fix. If it works, I’ll commit the change with an explanation of why I made it, and move on to the next issue.
Okay, Simon, but surely you’re at least logging the errors? Annotating them, perhaps? Not if I’m the only user, and not if I’m fixing the errors myself right away.
If the error never recurs, this will be the last time I think about it.
Trigger 1: Regression
If the error comes back, though? That’s when an eval is a good idea.
It indicates that I didn't actually know how to fix the error the first time. Alternatively, it indicates that the error is in some way sticky, and will recur if allowed to.
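To make that concrete, here is a rough sketch of what a tiny regression eval for the draft-doctor skill could look like: a check that the skill's questions haven't drifted back into meta territory. Everything in it (the marker list, the hardcoded sample questions) is a stand-in for illustration; in a real eval the questions would come from running the skill on a fixture draft.

```python
# A minimal sketch of a regression eval for the draft-doctor skill.
# The failure mode being pinned down: the skill drifting back into
# "meta" questions about format/audience instead of the ideas.

META_MARKERS = [
    "who is this for",
    "what tone",
    "short or long",
    "want readers to do",
]

def is_meta(question: str) -> bool:
    """Heuristic: does the question probe packaging rather than ideas?"""
    q = question.lower()
    return any(marker in q for marker in META_MARKERS)

def check_questions(questions: list[str]) -> list[str]:
    """Return the questions that violate the 'no meta questions' rule."""
    return [q for q in questions if is_meta(q)]

if __name__ == "__main__":
    # In a real eval these would come from running the skill on a
    # fixture draft; here they're hardcoded to keep the sketch runnable.
    sample = [
        "What exactly bothers you about this?",
        "Who is this for?",  # should be flagged
    ]
    violations = check_questions(sample)
    assert violations == ["Who is this for?"], violations
    print("regression eval passed")
```

The point isn't the heuristic itself; it's that the check encodes the fix, so the same failure can't silently creep back in.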
Trigger 2: Onboarding
The other reason to write evals is to define the standard of success for others. Because if they don't know what success looks like, they can't fix the failure when they encounter it.
Trigger commonality
In both cases, you’re removing yourself from the critical path. That’s a necessity… at some point. But if your heart and soul are still in the system, there’s no need to remove yourself just yet.
What this means for instrumentation
Note that I’m saying you shouldn’t write evals too early. You absolutely should be logging what’s going on in the system, though.
(This is a bit tricky in a Claude Agent-ish setup — the simplest way that I’ve found is to have a separate sub-skill that’s explicitly invoked by the main skill.)
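For a sense of what that sub-skill can reduce to, here's a sketch of the kind of script it might shell out to: append one JSON line per event, so later analysis is just reading a file. The log path, field names, and CLI shape are my assumptions for illustration, not anything specific to Clobsidian.

```python
#!/usr/bin/env python3
# Sketch of a minimal interaction logger a logging sub-skill could invoke.
# Appends one JSON object per event, so analysis later is just reading JSONL.
# The log path and field names are assumptions, not a fixed convention.

import json
import sys
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path.home() / ".clobsidian" / "skill-log.jsonl"

def log_event(skill: str, phase: str, note: str) -> None:
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "skill": skill,
        "phase": phase,
        "note": note,
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    # Example: python log_event.py draft-doctor phase-3 "asked 2 questions"
    skill, phase, note = sys.argv[1:4]
    log_event(skill, phase, note)
```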
Conclusion
Evals have a cost. Pay it when it’s worth it.
Feel free to contact me
If you're looking for a data-driven AI consultant or simply want to have a chat.