LLMs08 May, 202602 Mins read

Voice is cheap, knowledge is expensive

Notes from building a personal twin: four Gemma fine-tunes, one good system prompt, and the architecture that actually shipped.

RAG
fine-tuning

~~I wanted my website to have a chat that answers in my voice.~~ I wanted an excuse to fuck around with fine-tuning and non-vector RAG, and giving my website a twin was the perfect opportunity.

tl;dr: I couldn’t make the fine-tune work better than the Haiku-with-RAG approach. The work was its own reward, though.

Try the chat first

This is a long writeup. The fastest way to feel the punchline is to ask the same question of both backends and watch what happens. Try one of the probes below — What is Clobsidian? is a project that I’ve described on my blog, Constitutional MBTI is a one-off Linkedin post referring to a project the fine-tune confidently invents around, and Define an eval in one sentence is small enough that voice is the whole point. The Claude side streams; the Gemma side has to wait for the GPU to wake up the first time.

Haiku 4.5anthropic api · prompted, with retrieval

Pick a probe below, or type one of your own.

Gemma 4 E2B + LoRAmodal · L4 GPU · cold start ~60s

Pick a probe below, or type one of your own.

(**Do let me know how you’ve managed to jailbreak either side! **I’ll add it to the post.)

The rest of this post is the trace, the negative results, and what I think this says about personal AI for sites of this size.

Fine-tuning 101

There’s a couple of methods for fine-tuning. We used the Unsloth implementation of LoRA (low-rank adapter) as part of the TRL framework. (There’s other methods — DPO, PPO, etc. — but we’re sticking to LoRA for this post.)

The basic idea of a fine-tune, though, is that you’ll use 100+ examples of your own writing, and train a model on it. The model will then be able to answer questions about your writing in your writing’s voice.

Why not RAG from the start?

Mostly because I already know how to make a RAG system :)

I also didn’t need knowledge attribution, and I thought a fine-tune would pick up the same knowledge even if it didn’t make a search at inference time. I was a bit wrong on that one, at least for this training corpus.

The Adventures in Fine-tuning

The initial setup

Four follow-up attempts

The Haiku comparison

RAG to the Rescue

You're developing an LLM-powered system. It's moving fast. Should you write evals? Not yet.read ↗

Clobsidian in Detail: Cross-Source Personal InfrastructureLLMs

Here's the Obsidian/Claude Code setup in more detail, including the data sources and the skills I built.read ↗

Clobsidian, and other winter experiments with Claude CodeLLMs

I've been using Claude Code for non-code things. Here are some of the experiments I've run.read ↗

¶Liked this?More writing ↗·Talk to me about it ↗