Voice is cheap, knowledge is expensive
Notes from building a personal twin: four Gemma fine-tunes, one good system prompt, and the architecture that actually shipped.
I wanted my website to have a chat that answers in my voice. I wanted an excuse to fuck around with fine-tuning and non-vector RAG, and giving my website a twin was the perfect opportunity.
tl;dr: I couldn’t make the fine-tune work better than the Haiku-with-RAG approach. The work was its own reward, though.
Try the chat first
This is a long writeup. The fastest way to feel the punchline is to ask the same question of both backends and watch what happens. Try one of the probes below — What is Clobsidian? is a project that I’ve described on my blog, Constitutional MBTI is a one-off Linkedin post referring to a project the fine-tune confidently invents around, and Define an eval in one sentence is small enough that voice is the whole point. The Claude side streams; the Gemma side has to wait for the GPU to wake up the first time.
Pick a probe below, or type one of your own.
Pick a probe below, or type one of your own.
(**Do let me know how you’ve managed to jailbreak either side! **I’ll add it to the post.)
The rest of this post is the trace, the negative results, and what I think this says about personal AI for sites of this size.
Fine-tuning 101
There’s a couple of methods for fine-tuning. We used the Unsloth implementation of LoRA (low-rank adapter) as part of the TRL framework. (There’s other methods — DPO, PPO, etc. — but we’re sticking to LoRA for this post.)
The basic idea of a fine-tune, though, is that you’ll use 100+ examples of your own writing, and train a model on it. The model will then be able to answer questions about your writing in your writing’s voice.
Why not RAG from the start?
Mostly because I already know how to make a RAG system :)
I also didn’t need knowledge attribution, and I thought a fine-tune would pick up the same knowledge even if it didn’t make a search at inference time. I was a bit wrong on that one, at least for this training corpus.
The Adventures in Fine-tuning
The initial setup
Four follow-up attempts
The Haiku comparison
RAG to the Rescue
¶ Liked this? More writing ↗ · Talk to me about it ↗