
Crawl, walk, run: adopting AI agents safely

Teams that jump straight to autonomous agents fail. Here is the path that takes agents from a first try to production work.

Stuart Leo · June 9, 2026 · 3 min read

The most common way teams fail with AI agents isn't picking the wrong tool. It's starting in the wrong gear — pointing agents at the hardest, highest-stakes work on day one, watching it go sideways, and concluding agents don't work. They do. You just can't start at full speed.

The pattern that works is the oldest one in adoption: crawl, walk, run. Here's what each gear looks like for AI agents, and why skipping straight to "run" is the reliable way to fail.

Why jumping to run fails

"Run" is the dream: hand the agent a big, ambiguous feature and let it go. It's also where almost every failed adoption starts, because at that level a confident mistake is expensive and hard to catch — and you have no track record yet to know where the agent is reliable.

The advice from teams that have actually scaled this is blunt: strategies that jump straight to "run" fail. Not because the agent can't eventually do hard work, but because trust, context, and guardrails are built in the lower gears — and without them, the hard work has nothing to stand on.

Crawl: tests, small fixes, low-risk refactors

Start where mistakes are cheap and instantly visible:

  • Adding tests to code that lacks them — low risk, and it builds the test coverage everything later depends on.
  • Fixing small, well-defined bugs — clear success criteria, easy to verify.
  • Low-risk refactors — mechanical changes the tests can guard.
  • Dependency updates and doc sync — tedious, bounded, easy to check.

This gear feels unambitious. That's the point. You're building two things: the test and context scaffolding that make later work safe, and your own judgement about where the agent is trustworthy. Both are prerequisites for everything above.
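One way to make the crawl list concrete is a triage filter that only lets bounded, easily verified tasks reach the agent. The sketch below is illustrative only: the `Task` fields, the kind labels, `is_crawl_task`, and the sample backlog are all hypothetical, not part of any particular tool.

```python
from dataclasses import dataclass

# Hypothetical labels mirroring the crawl list above; a team would pick its own.
CRAWL_KINDS = {
    "add_tests",
    "small_bugfix",
    "low_risk_refactor",
    "dependency_update",
    "doc_sync",
}


@dataclass(frozen=True)
class Task:
    title: str
    kind: str                  # label assigned at triage (assumed to exist)
    clear_success_check: bool  # e.g. a reproducible bug or a test that should pass


def is_crawl_task(task: Task) -> bool:
    """Return True when a task is both bounded and easy to verify."""
    return task.kind in CRAWL_KINDS and task.clear_success_check


# Made-up backlog items, purely for illustration.
backlog = [
    Task("Add unit tests for the invoice parser", "add_tests", True),
    Task("Redesign the billing flow", "feature", False),
    Task("Bump the HTTP client to the latest patch release", "dependency_update", True),
    Task("Fix 'sometimes slow' report", "small_bugfix", False),
]

crawl_queue = [t.title for t in backlog if is_crawl_task(t)]
print(crawl_queue)
```

The vague bug report is filtered out even though it carries a crawl-style label, because without a clear success check nobody can tell quickly whether the agent got it right.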

Stuart Leo: "Crawl isn't a warm-up you skip. It's where you build the tests, context, and trust that 'run' stands on."

Walk: features behind review

Once the agent is reliably handling chores and you've got tests and a contextbase building up, step up to real features — but behind review. The agent builds, a human (or a bench agent) reviews before it lands. You're trusting the agent with more, but with a checkpoint.

This gear is where most of the value lives, and where most teams should spend most of their time. It's ambitious enough to matter, safe enough to catch the misses.

Run: autonomous, well-fenced work

Only now — with tests, context, trust, and guardrails in place — does autonomous work make sense: the agent running overnight or unattended on well-scoped tasks, behind isolation and a test gate. "Run" isn't reckless here, because everything underneath it was built in the earlier gears. The same autonomous run that wrecks a team on day one is safe on day ninety, because of what's been built in between.
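The difference between the walk and run gears can be sketched as a small landing policy. This is a minimal sketch under stated assumptions: `Gear`, `Change`, `may_land`, and the four boolean flags are hypothetical names, and a real setup would derive those flags from CI results, review tooling, and the sandbox rather than set them by hand.

```python
from dataclasses import dataclass
from enum import Enum


class Gear(Enum):
    WALK = "walk"
    RUN = "run"


@dataclass(frozen=True)
class Change:
    tests_pass: bool   # the test gate
    reviewed: bool     # a human or reviewing agent approved it
    isolated: bool     # produced in a sandbox or separate branch
    well_scoped: bool  # the task had clear, narrow boundaries


def may_land(change: Change, gear: Gear) -> bool:
    """Decide whether an agent's change may land under the given gear."""
    if not change.tests_pass:
        return False  # failing tests block every gear
    if gear is Gear.WALK:
        return change.reviewed  # walk: a review checkpoint before it lands
    # run: no reviewer in the loop, so isolation and scope must hold instead
    return change.isolated and change.well_scoped


overnight = Change(tests_pass=True, reviewed=False, isolated=True, well_scoped=True)
print(may_land(overnight, Gear.WALK))  # False: nobody has reviewed it
print(may_land(overnight, Gear.RUN))   # True: fenced, scoped, and green
```

The policy only works if the flags are trustworthy, which is the article's point: the tests and guardrails behind them have to exist before "run" means anything.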

Build the context as you climb

Here's the thread connecting all three gears: at every step, you're capturing what the agent learns into the contextbase. The crawl gear documents the codebase and builds test coverage. The walk gear accumulates decisions and gotchas. By the time you reach "run," the agent isn't just more trusted — it's better informed, reading a contextbase that's been compounding the whole climb.
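As a rough illustration of context compounding across gears, the sketch below keeps a running list of notes and renders them as a briefing for the agent's next task. The article does not say how a contextbase is stored, so `ContextBase`, `capture`, `briefing`, and the sample notes are assumptions made for the example, with an in-memory list standing in for the real store.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBase:
    """In-memory stand-in for whatever store a team actually uses."""
    entries: list[dict[str, str]] = field(default_factory=list)

    def capture(self, gear: str, kind: str, note: str) -> None:
        """Record one thing learned while working in a given gear."""
        self.entries.append({"gear": gear, "kind": kind, "note": note})

    def briefing(self) -> str:
        """Render every entry as one line the agent can read before a task."""
        return "\n".join(
            f"[{e['gear']}/{e['kind']}] {e['note']}" for e in self.entries
        )


ctx = ContextBase()
# Made-up notes, one per kind of knowledge the gears are said to produce.
ctx.capture("crawl", "doc", "Refund logic lives in the billing module.")
ctx.capture("crawl", "tests", "Refund paths now have unit tests.")
ctx.capture("walk", "gotcha", "Currency amounts are stored in minor units.")

print(ctx.briefing())
```

Each gear adds entries and none are discarded, so an agent working in the run gear reads everything the earlier gears recorded.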

That's why crawl-walk-run isn't just risk management. It's how the agent gets good enough for "run" in the first place.

Adopting agents isn't a leap — it's a climb from low-risk chores to real work, with context built at every step. Start in crawl, and you earn run.


FAQ

How should a team start adopting AI coding agents?
Crawl first. Begin with low-risk, easily verified work: having agents add tests, fix small bugs, do low-risk refactors, and keep docs in sync. Build trust and context there before moving to feature work behind review, and only then to more autonomous runs. Teams that jump straight to autonomous runs fail.

Why do teams fail at adopting AI agents?
Usually because they start at "run", pointing agents at high-stakes, hard-to-verify work before anyone has built the trust, context, or guardrails to support it. The work goes wrong in ways nobody catches, the team loses confidence, and the effort gets pulled. Starting small avoids that.

What work should AI agents do first?
The work where mistakes are cheap and easy to catch: adding tests, fixing small bugs, low-risk refactors, dependency updates, and documentation sync. These build the test coverage and context that make later, riskier work safe, and they build your judgement about where the agent is reliable.