My agent bill tripled — here's what fixed it

A field note on an AI coding bill that tripled in a month, the culprit, and the two changes that fixed it.

Stuart LeoJune 8, 20263 min read

My AI coding bill tripled in a single month, and my first assumption was the obvious wrong one: agents are just expensive now, suck it up. They're not. When I actually looked, the cost was two specific habits of mine — both easy to fix. This is the field note.

The bill that tripled

Nothing about my usage felt three times heavier. Same kind of work, maybe a bit more of it. But the invoice had tripled, and "agents are expensive" was the lazy story I told myself for about a day before it started bothering me enough to investigate.

Hunting the cost (it was context re-sending)

When I broke the usage down, the shape was stark: the spend wasn't in the agent's output. It was in the input — enormous token counts going into the model on every call, growing through each session. That's when context re-sending clicked for me. The model is stateless between calls, so every step in the loop re-sends the whole history. My sessions had got long and sprawling, so by late in a session I was shipping a huge context on every single call — paying again and again to remind the agent of things it had already seen.

Stuart Leo

I wasn't paying for answers. I was paying, hundreds of times a session, to re-send the same bloated context.

The second culprit: top model for everything

The other half was lazier still. I'd set everything to the most powerful model and forgotten about it. Every trivial step — reading a file, listing a directory, a one-line edit — was billing at premium rates it never needed. Most of an agent's loop is easy work, and I was paying top dollar for all of it.

Compaction plus model routing

Two changes, and the bill came back down hard:

Compaction. I stopped letting sessions sprawl. Now I compact when the window gets long — digest where things are up to, drop the raw transcript, start fresh from the note. Each call carries far less, so each call costs far less. This alone did most of the work, and it made the agent sharper as a bonus, because a lean window doesn't rot.
Model routing. I stopped running everything on the top model. Cheap, fast model for discovery and simple edits — the expensive one only for genuinely hard reasoning. Same outcomes, a fraction of the cost.

What it costs now

The bill dropped back below where it started — on more work than before. Nothing about my output got worse. I just stopped paying to re-send bloated context and stopped paying premium rates for trivial steps.

My bill wasn't high because agents are expensive — it was bloated context and lazy model choice. Both were mine to fix.

Start here: see why agents get expensive, how to route models to cut cost, or read the method.

Why AI agents get expensive (and how to fix it)

Agentic tasks burn far more tokens than chat — mostly from re-sending context every call. Why the bill climbs, and how leaner context brings it down.

Route models to cut your AI agent bill

Running everything on the top model is the easiest way to overpay. Route by task — cheap models for discovery, strong ones for hard reasoning — and cut the bill.

Compaction: keeping your agent's context window lean

Every model degrades as its context fills — context rot. Compaction keeps the working set lean without losing what matters. The three mechanisms C² uses.

All articles