Why AI agents get expensive (and how to fix it)

Agentic tasks burn far more tokens than chat — mostly from re-sending context every call. Why the bill climbs.

Stuart Leo · June 8, 2026 · 5 min read

The first month is cheap. Then you start really building with agents, the bill arrives, and it's a shock. Agentic coding can cost an order of magnitude more than you'd expect from the same model in a chat window — and the reason isn't what most people assume.

It's not that the model is pricey per answer. It's the shape of agentic work. Understand where the money actually goes and the fixes are obvious. Here's the breakdown.

The sticker shock

A quick chat with a model costs cents. An agent grinding through a real task can cost dollars per session, and a heavy month runs into serious money. The gap surprises people because they're pricing the answer when they should be pricing the loop.

An agent doesn't make one call. It makes dozens or hundreds — read a file, plan, edit, run a test, react, repeat. Each step is a model call. That alone multiplies the cost. But the bigger driver is hiding inside each of those calls.

Where the tokens actually go (context re-sending)

Here's the part that surprises everyone: every call in the loop re-sends the entire conversation so far. The model is stateless between calls, so to "remember" step one at step fifty, step one gets sent again — and again, every step.

So by the time an agent is deep in a task, each call carries the whole history: the files it read, the output it saw, everything. The Stanford Digital Economy Lab's analysis of how agents spend tokens makes the pattern stark — the dominant cost isn't generating new output, it's repeatedly shipping the accumulated context. By turn fifty you can be sending 100K-200K tokens per call, mostly to re-establish what the agent already knew.
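
To make the growth concrete, here is a minimal sketch of the arithmetic. It assumes the context grows by a fixed number of tokens per step and that nothing is ever trimmed or cached. The function name and the 3,000-tokens-per-step figure are illustrative assumptions, not measurements.

```python
def session_input_tokens(steps: int, tokens_added_per_step: int, base_tokens: int = 0) -> int:
    """Total input tokens sent across a session when every call re-sends the full history."""
    total = 0
    context = base_tokens
    for _ in range(steps):
        context += tokens_added_per_step  # new file reads, tool output, replies
        total += context                  # the whole context goes out on this call
    return total


steps = 50
per_step = 3_000  # assumed average growth per step (hypothetical)

print(steps * per_step)                       # size of the final call: 150000
print(session_input_tokens(steps, per_step))  # total input over the session: 3825000
```

Under these assumptions the session contains only 150,000 tokens of new material, yet about 3.8 million input tokens are sent in total. Total input grows roughly with the square of the number of steps, which is why long sessions get expensive so quickly.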

You're not paying for the agent's answers. You're paying, over and over, to remind it what it already saw.

Why bigger context costs twice

This is why the "just paste everything in" instinct is so expensive. A bloated context doesn't only cost more on the call where you paste it; it costs more on every subsequent call, because it's re-sent each time. Dump a huge file into the window at step three and you pay for that file again at steps four through fifty.
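
The same arithmetic applies to a single pasted file. This is a rough sketch, assuming the file stays in the window untouched for the rest of the session. The 20,000-token file size is a made-up figure for illustration.

```python
def repaste_cost(file_tokens: int, pasted_at_step: int, total_steps: int) -> int:
    """Input tokens attributable to one pasted file over the rest of the session."""
    # The file rides along on the call where it is pasted and on every call after it.
    calls_carrying_file = total_steps - pasted_at_step + 1
    return file_tokens * calls_carrying_file


print(repaste_cost(20_000, pasted_at_step=3, total_steps=50))  # 960000
```

A 20,000-token file pasted at step three ends up accounting for nearly a million input tokens by step fifty.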

And it costs you twice over, because — as anyone who's hit context rot knows — a fuller window also makes the model worse. So bloated context buys you a higher bill and a dumber agent. Lean context is cheaper and sharper at the same time.

Leaner context, lower bill

This is the good news: the main cost lever is something you control. Keep the working context lean and every call in the loop carries less, so the whole session costs less. That's compaction — digest what matters, drop the raw transcript, load only what the task needs. It's the same discipline that keeps the agent sharp, and it happens to be the biggest cost saver too.
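
One way to picture compaction is as a function that runs before each call. The sketch below is a simplified illustration, not the specific mechanism any particular tool uses. The `Message` type, the 4-characters-per-token estimate, and the `naive_digest` placeholder (which stands in for a model-written summary) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def total_tokens(history: List[Message]) -> int:
    return sum(estimate_tokens(m.text) for m in history)


def naive_digest(messages: List[Message]) -> str:
    # Placeholder for a real summary: keep the first 80 characters of each message.
    return "\n".join(f"{m.role}: {m.text[:80]}" for m in messages)


def compact(
    history: List[Message],
    budget_tokens: int,
    keep_recent: int = 4,
    summarize: Callable[[List[Message]], str] = naive_digest,
) -> List[Message]:
    """Replace older messages with one digest once the history exceeds the budget."""
    if total_tokens(history) <= budget_tokens:
        return history
    split = max(0, len(history) - keep_recent)
    older, recent = history[:split], history[split:]
    if not older:
        return history  # nothing old enough to fold away
    digest = Message("system", "Summary of earlier steps:\n" + summarize(older))
    return [digest] + recent


history = [Message("tool", f"step {i} output " + "x" * 4000) for i in range(10)]
lean = compact(history, budget_tokens=5_000)
print(len(history), len(lean))  # 10 5
```

The most recent messages are kept verbatim because the agent usually needs them in full, while older ones are reduced to a digest. A real implementation would likely use a model-written summary and an exact tokenizer in place of the two placeholders.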

A contextbase helps here in a way that's easy to miss: because the durable knowledge lives in files the agent loads on demand, you're not re-pasting it into the chat and re-paying for it every call. The context is read when needed, not carried forever.

Routing models by task

The second lever is the model itself. Running every step on the most powerful model is the easy way to overpay, because most steps don't need it. Discovery and simple edits run fine on a cheaper, faster model; save the expensive one for genuinely hard reasoning. After lean context, matching the model to the task is the highest-leverage cost move, and it deserves its own playbook.
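
Routing can be as simple as a lookup table. The sketch below is an illustration under stated assumptions: the task categories, the model names `small-fast-model` and `large-reasoning-model`, and the per-million-token prices are all hypothetical placeholders, not real products or rates.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class TaskKind(Enum):
    DISCOVERY = "discovery"
    SIMPLE_EDIT = "simple_edit"
    HARD_REASONING = "hard_reasoning"


# Hypothetical model names and prices (USD per million input tokens).
CHEAP, STRONG = "small-fast-model", "large-reasoning-model"
ROUTES = {
    TaskKind.DISCOVERY: CHEAP,
    TaskKind.SIMPLE_EDIT: CHEAP,
    TaskKind.HARD_REASONING: STRONG,
}
PRICE_PER_MILLION_INPUT = {CHEAP: 1.0, STRONG: 10.0}


def pick_model(kind: TaskKind) -> str:
    # Anything not in the table falls back to the strong model.
    return ROUTES.get(kind, STRONG)


def input_cost(steps: Iterable[Tuple[TaskKind, int]], route: bool = True) -> float:
    """Input cost of a session, with routing or with everything on the strong model."""
    tokens_by_model: Counter = Counter()
    for kind, input_tokens in steps:
        model = pick_model(kind) if route else STRONG
        tokens_by_model[model] += input_tokens
    return sum(
        tokens * PRICE_PER_MILLION_INPUT[model] / 1_000_000
        for model, tokens in tokens_by_model.items()
    )


# Assumed session: 40 discovery steps and 10 hard steps, 50,000 input tokens each.
steps = [(TaskKind.DISCOVERY, 50_000)] * 40 + [(TaskKind.HARD_REASONING, 50_000)] * 10
print(input_cost(steps, route=False))  # 25.0
print(input_cost(steps, route=True))   # 7.0
```

With these made-up numbers, routing cuts the input cost from 25 to 7 without touching the hard-reasoning steps. The fallback to the strong model is a deliberate choice: a routing miss then costs money rather than quality.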

Most of your agent bill is re-sent context — keep it lean and the cost falls with it. Then route the models, and it falls again.

Start here: see how to route models to cut cost, how compaction keeps context lean, or read the method.

FAQ

Why are AI coding agents so expensive?

Because agentic tasks run a long loop of many model calls, and every call re-sends the whole conversation so far. Deep into a session, you can be sending 100K-200K tokens per call. Agentic work can consume orders of magnitude more tokens than a single chat for this reason: the cost is the re-sent context, not the answer.

What's the biggest driver of AI agent cost?

Context re-sending. Each step in the agent's loop includes the full history again, so a long session pays for the same context over and over. Keeping the working context lean (compaction) attacks the cost at its root.

How do I lower my AI agent bill?

Two moves do most of the work: keep context lean so each call carries less (compaction), and route tasks to the right-sized model instead of running everything on the most expensive one. Together they cut cost without cutting capability.
