VOL. III · 2026 Šimon prague + london · est. 2012
LLMs 02 Mins read

Voice is cheap, knowledge is expensive

Notes from building a personal twin: four Gemma fine-tunes, one good system prompt, and the architecture that actually shipped.

  • RAG
  • fine-tuning

I wanted my website to have a chat that answers in my voice. I also wanted an excuse to fuck around with fine-tuning and non-vector RAG, and giving my website a twin was the perfect opportunity.

tl;dr: I couldn’t make the fine-tune work better than the Haiku-with-RAG approach. The work was its own reward, though.

Try the chat first

This is a long writeup. The fastest way to feel the punchline is to ask the same question of both backends and watch what happens. Try one of the probes below: “What is Clobsidian?” asks about a project that I’ve described on my blog, “Constitutional MBTI” comes from a one-off LinkedIn post referring to a project that the fine-tune confidently makes things up about, and “Define an eval in one sentence” is small enough that voice is the whole point. The Claude side streams; the Gemma side has to wait for the GPU to wake up the first time.

Haiku 4.5 (Anthropic API): prompted, with retrieval

Gemma 4 E2B + LoRA (Modal, L4 GPU): cold start ~60s

(**Do let me know how you’ve managed to jailbreak either side!** I’ll add it to the post.)

The rest of this post is the trace, the negative results, and what I think this says about personal AI for sites of this size.

Fine-tuning 101

There are a couple of methods for fine-tuning. We used the Unsloth implementation of LoRA (low-rank adaptation) alongside the TRL framework. (There are other methods, such as DPO and PPO, but we’re sticking to LoRA for this post.)

The basic idea of a fine-tune, though, is that you take 100+ examples of your own writing and train a model on them. The model should then be able to answer questions about your writing in your writing’s voice.
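To make "low-rank" concrete, here is a minimal pure-Python sketch of what a LoRA layer computes. It is not the Unsloth or TRL code, and the sizes, `alpha`, and the helper names (`matmul`, `lora_param_counts`, `lora_forward`) are made up for illustration. The frozen weight `W` stays untouched, and only two small matrices `A` and `B` are trained, so the effective weight is `W + (alpha / rank) * B A`.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # training W directly
    lora = rank * (d_out + d_in)   # training B (d_out x rank) and A (rank x d_in)
    return full, lora

def lora_forward(x, W, A, B, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x) for a column vector x."""
    base = matmul(W, x)                # frozen path
    delta = matmul(B, matmul(A, x))    # low-rank trainable path
    scale = alpha / rank
    return [[b[0] + scale * d[0]] for b, d in zip(base, delta)]

# Illustrative sizes only; these are not the shapes of any real Gemma layer.
d_out, d_in, rank, alpha = 8, 6, 2, 4
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable
B = [[0.0] * rank for _ in range(d_out)]                               # trainable, zero init
x = [[rng.uniform(-1, 1)] for _ in range(d_in)]

full, lora = lora_param_counts(d_out, d_in, rank)
print(f"trainable params: full={full}, lora={lora}")

# With B initialised to zeros, the adapter starts as a no-op.
print(lora_forward(x, W, A, B, alpha, rank) == matmul(W, x))
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model, and training only has to learn the small correction. At realistic sizes the saving is large: a 4096 × 4096 matrix has about 16.8M weights, while a rank-16 adapter for it has 131,072.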

Why not RAG from the start?

Mostly because I already know how to make a RAG system :)

I also didn’t need knowledge attribution, and I thought a fine-tune would pick up the same knowledge without running a search at inference time. I was a bit wrong on that one, at least for this training corpus.
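For contrast, here is roughly what "making a search at inference time" means without any vectors. This is a hypothetical stand-in, not the retrieval this site actually uses: it assumes plain keyword overlap as the scoring rule, and the documents, titles, and function names are placeholders invented for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, doc_text):
    """Count occurrences in the document of each distinct query token."""
    query_tokens = set(tokenize(query))
    counts = Counter(tokenize(doc_text))
    return sum(counts[t] for t in query_tokens)

def retrieve(query, docs, k=2):
    """Return up to k documents with a non-zero score, best first."""
    ranked = sorted(docs, key=lambda d: score(query, d["text"]), reverse=True)
    return [d for d in ranked[:k] if score(query, d["text"]) > 0]

def build_prompt(query, hits):
    """Put the retrieved notes in front of the question (assumed prompt shape)."""
    context = "\n\n".join(f"[{d['title']}]\n{d['text']}" for d in hits)
    return f"Answer using only the notes below.\n\n{context}\n\nQuestion: {query}"

# Placeholder corpus; real posts would go here.
docs = [
    {"title": "placeholder-post-1", "text": "Notes on fine-tuning a small model with LoRA."},
    {"title": "placeholder-post-2", "text": "Notes on retrieval and how retrieval feeds a prompt."},
    {"title": "placeholder-post-3", "text": "A recipe for sourdough bread."},
]

hits = retrieve("How does retrieval work?", docs)
print([d["title"] for d in hits])
print(build_prompt("How does retrieval work?", hits))
```

The point of the sketch is where the knowledge lives: in the retrieved text that gets pasted into the prompt, not in the model weights. A fine-tune has no equivalent step, so anything it didn't absorb during training has to be guessed.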

The Adventures in Fine-tuning

The initial setup

Four follow-up attempts

The Haiku comparison

RAG to the Rescue
