James Routley


I've been back at work for about a month now after parental leave and a 12-week programming retreat at Recurse Center. We're leaning hard into agentic engineering and I'm bullish on it - I spent a lot of that time building agentic systems, from skills to orchestrators, and exploring the tradeoffs between speed and quality. Now I'm back up to speed at work, which required rethinking some of those practices at enterprise scale, and I think I've found a good balance between speed, quality, and mental overhead.

In this post I want to share what's working for me with the current state of the tools. A lot has changed since I last shared how I use agents in 2025 - the models and harnesses are better, the context windows are bigger, and I've moved to a terminal-focused dev setup to streamline my agentic engineering workflow for portability, concurrency, and lower friction.

The three-way tradeoff: speed, quality, and mental overhead

When people talk about coding with agents, they usually talk about speed. And agents are really, really fast at reading / writing / coding. But optimizing for speed typically comes at the cost of other areas we care about when building robust systems for customers.

Here are the 3 main areas I think about when approaching agentic engineering. For many people and orgs cost is also going to be a big factor, but for sustainable engineering I actually think mental overhead is the more pressing issue, assuming you're spending tokens within a reasonable max plan or two from the SOTA labs.

The three-way tradeoff of agentic engineering: speed, quality, and mental overhead pull against each other. Over-optimize any one corner and you fall into its trap; the balance is in the middle.

Speed - how fast you create output.
Quality - whether that output is "good". At this point it typically solves the problem / context you gave the model, but frequently fails the common sense check of "would I publish that?".
Mental overhead - comprehension debt, multitasking, and burnout.

Comprehension debt

The current biggest risk with agentically engineered systems is comprehension debt - the thing exists and works but you have no idea how it works, so you have no intuition for which changes make sense, what will break, or how best to evolve it. You are day 1 in someone else's codebase every single day.

Multitasking

Agents also make it really easy to kick off tasks, which feels like offloading work and parallelizing. But in the current state of things, you typically still have to help them finish the loop to fully close each thread, which means every extra thread you kick off is another one you're managing, even if passively. Humans are terrible at multitasking, so this feels productive but often isn't - it leads to more comprehension debt, more mental fatigue, and typically worse quality outputs.

When compounded over months, along with the additional threads spun up by other people in your team / org, this path is a quick way to burn people out - lots of activity but little connection to or fulfilment from the work at hand.

Common failure modes:

Speed optimization - This is where vibe coding goes. You go full hands off and let the agents run everything. The first few releases are blazing fast. But then new features and bug fixes become harder and harder to ship. You've traded short term speedups for long term slowdowns. If you go too far down this path you're left with a pile of duct-taped systems that barely work and can't be changed, and since you have no idea how it's all held together it's often easier to just scrap it.
Quality optimization - Optimizing too much for quality often means believing that agents aren't good enough to write quality software, so you do it yourself. This will work for a time, but agents are nearly as capable at scoped coding tasks and are likely better across the full breadth of the stack. Sticking with this will likely limit your ability to adapt in the next 6-12 months unless you are an absolute coding machine / highly specialized (few people are).
Mental overhead optimization - For comprehension debt, this looks like trying to review every single change an agent makes in a loop. You sit there, see what changed, and accept / deny. But this actually adds overhead for you and throws away the best advantages you and the agent have. It's like sitting over someone's shoulder as they build a feature and pointing out every single mistake - it works but it's not a terribly effective use of your time or theirs. For multitasking, you may try to just single-task, which is fine, but you miss out on a lot of the concurrency benefits of agents. You know everything that's going on but may not be able to keep up with people who are leaning into agents.

How I think about coding with agents

My current read is that agents are very fast, knowledgeable, horizontally scalable junior engineers. They do scoped engineering tasks well but often lack bigger-picture strategy and vision we'd expect of a senior IC / leader. So the best use of them today is as levers to increase your reach - you drive, they do the footwork.

You drive - strategy, vision, systems, and guardrails.
The agents do the footwork - research, reading, writing, and coding.

Humans drive, agents do the footwork: you give one command at the surface and a fan-out of agents does the work beneath it

A few principles I have for doing this effectively:

Hold the quality bar high - this is how you move fast long-term. Bad codebases lead to more incidents, more bugs, and slower dev. It's a vicious cycle. As Meta says, move fast with stable infrastructure.
Use AI to code fast and improve as you go - and spend the time you save on the higher-level architecture, decisions, and systems. Use the extra time / speed to do the extra refactor and fast follow. Make the codebase 1% better each time so you can move fast long term.
Stay in the loop - so you can combat comprehension debt and stay agile / aligned as things change. You still authored the system, you just didn't type the code yourself. This doesn't mean looking at every change the AI makes but it does mean having standard checkpoints to align on what's happened (like a PR review).
Keep multitasking to a minimum - humans are bad at it. Work on one thing at a time, but let AI fan out beneath you: code reviews, researching alternatives, stress-testing edge cases, running test suites / e2e browser tests.

The agentic stack, human to AI to tools: the AI is the nondeterministic decision core you steer, leveraging deterministic tools and guardrails beneath it

Improve the AI as you go - AIs are currently bounded by context - how much information they can hold and effectively use. As you find patterns of failures, update their context to avoid those going forward. The AI itself is a system, just less deterministic than we're used to building on. But it can still be tweaked just like code.
Build deterministic guardrails for the nondeterministic models - The power and flaw of AI is its nondeterminism (similar to humans!). We can leverage AI to build better deterministic guardrails. Making too many comments? Update the context to avoid those. Making easy-to-catch mistakes? Build out a review phase for agents. Writing code in a style you don't like or that has problems? Build out a linter rule and have the agents run it. Landing code with broken tests? Add a hook that runs the tests before commits. You build and improve the system the AIs work within.
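
As a rough illustration of that last guardrail, here is a minimal sketch of a hook that runs checks before a commit. It is an assumption-heavy example rather than a description of any particular setup: the `CHECKS` list, the choice of `pytest`, and the `run_checks` / `hook_exit_code` helpers are all hypothetical names made up for the sketch.

```python
import subprocess
import sys

# Hypothetical check list; replace with the commands your project really uses.
CHECKS = [
    [sys.executable, "-m", "pytest", "-q"],  # assumes pytest is installed
]


def run_checks(checks):
    """Run each command in order and return the ones that exited non-zero."""
    failed = []
    for cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed


def hook_exit_code(checks):
    """Return 0 if every check passed, 1 otherwise, reporting failures on stderr."""
    failed = run_checks(checks)
    for cmd in failed:
        print("check failed:", " ".join(cmd), file=sys.stderr)
    return 1 if failed else 0
```

To use it as a Git hook, the script would end with `sys.exit(hook_exit_code(CHECKS))`, start with a Python shebang line, and live at `.git/hooks/pre-commit` with the executable bit set. Git aborts the commit when a pre-commit hook exits non-zero, so an agent (or a human) can't land code past a failing check without deliberately bypassing the hook.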

How I run agentic engineering loops

How I run the agentic engineering loop: vertical slice to stacked commits to stacked PRs to review to ship, one main and one side at a time, improving the system each cycle

The loop I run looks like this:

Plan projects as vertical slices, scoped for parallelization - clear chunks I can hand to agents or humans, split on a contract boundary (frontend/backend, or service by service).
Iterate on 1-2 vertical slices at a time - one main, one side. The side one gives me something to push on while my main agent builds.
- Build each slice end-to-end in stacked atomic commits, testing and iterating as I go. Build an e2e test if I can.
- Pull the bottom commits off into stacked PRs - improve the quality of each chunk, double-check the e2e tests still pass, and move it to review once it's in good shape.

In practice I can run 1-2 stacks concurrently without too much multitasking overhead. Once one is kicked off I check the other, and if both are working I go do maintenance - messages, docs, reviews. Using that downtime well matters, because reviews have increasingly become the bottleneck in engineering organizations.

I let the agent build the full vertical slice first, knowing it won't be great, then iterate to get it working e2e. After that I come back and improve each PR one by one with targeted reviews and refactors.

Doing e2e first allows me to be agile and move quickly to fix / change direction before I commit to the full eng cycle. The e2e test allows me to ensure I haven't broken anything as I pull off commits and refactor the approach.
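
The peel-off step can be sketched as a small planning function. This is only an illustration under stated assumptions: the `PullRequest` type, the `plan_stacked_prs` name, the `stack/part-N` branch naming, and the fixed `chunk_size` are all hypothetical, and in practice the chunks would follow logical boundaries rather than a fixed count. The point it shows is the shape of a stack: each PR is opened against the branch of the PR below it, and only the bottom one targets the trunk.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    branch: str    # branch holding this chunk of commits
    base: str      # branch this PR is opened against
    commits: list  # commit subjects in this chunk, oldest first


def plan_stacked_prs(commits, chunk_size, trunk="main"):
    """Split an ordered commit stack (bottom first) into PRs that build on each other."""
    prs = []
    base = trunk
    for start in range(0, len(commits), chunk_size):
        branch = f"stack/part-{len(prs) + 1}"
        prs.append(PullRequest(branch, base, commits[start:start + chunk_size]))
        base = branch  # the next PR targets the one just below it
    return prs


# Example stack for one vertical slice, with the e2e test on top.
stack = ["add schema", "add endpoint", "wire up UI", "add e2e test"]
for pr in plan_stacked_prs(stack, chunk_size=2):
    print(pr.branch, "->", pr.base, pr.commits)
```

Because every PR only depends on the ones beneath it, the bottom of the stack can go to review while the top is still being reworked.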

How I ship a vertical slice: build it end-to-end as a stack of atomic commits with an e2e test on top, then peel them off the bottom into stacked PRs for review

I push the agents to do more footwork before they need me so they can work on larger tasks for longer over time. Example improvements include guardrails that make them double-check their own assumptions and ground them in sources, review their own code, run types / tests / linters, and iterate with TDD. When I hit a pattern I don't like - bad comments, weak tests, a suboptimal approach - I update the skills and reviews to prevent it from happening again.

Results of agentic engineering

PR throughput before and after agents: ~7 to ~12 PRs per week, a 50–60% gain, with side projects closer to 2–5×

Before: I shipped 1-2 PRs a day, averaging ~7 a week
After: I ship 2-3 PRs a day, averaging ~12 a week

So my guesstimate is a 50-60% speed improvement over my previous workflows.
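
As a sanity check on those figures, the raw ratios can be worked out directly. The `percent_gain` helper is a hypothetical name for the sketch, and the daily midpoints (1.5 and 2.5 PRs a day) are an assumption read off the ranges above.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, as a percentage."""
    return (after - before) / before * 100


weekly = percent_gain(7, 12)     # weekly averages from above
daily = percent_gain(1.5, 2.5)   # assumed midpoints of the daily ranges

print(f"weekly: {weekly:.0f}%")  # weekly: 71%
print(f"daily:  {daily:.0f}%")   # daily:  67%
```

Both raw ratios come out a little above the guesstimate, so 50-60% sits on the conservative side of these numbers.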

Now this is a far cry from the 10-100x improvements some people are claiming. But I still think it's a huge improvement.

In my workflows I'd argue that quality has gone up due to more and faster reviews, research, and refactors. I get to spend more time thoroughly thinking, reviewing, and exploring various ways to do things which I think is leading to better outcomes overall.

Day-to-day this may just be:

an extra refactor that I can knock out in 10 minutes with a simple prompt
accepting more nit changes because I can just spin up an agent to do it and re-review
trying out a new technique / approach I'm not sure will work because it only takes 20 mins and I can toss it if I don't like it

In my side projects I've found the speedups to be much greater - closer to 2-5x improvements. Some of this is just because I'm much more lenient about what I accept, but I also think the context required to make a change is much smaller. There's no nebulous institutional knowledge or gotchas that either haven't been written down or don't fit into context. It's just a targeted build and agents are very good at that.

But at work, where the standards are higher and there's a lot of institutional context that doesn't fit in the AI's window, I think those extra 2-3 passes are useful and necessary. I do expect this to shift as agents get larger effective context windows, improve at following skills / rules, and get forms of actual reasoning, but I don't think we're there yet. So when I hear of teams and orgs going full agentic with agent teams doing everything I am a bit skeptical, because the results I've seen on real codebases of substantial size, domain complexity, and correctness requirements have been subpar and quite costly to produce.

But for targeted things they're great - we're automating more of the one-off bug triages, flaky tests, and simple request builds, which removes mechanical toil from the team. And we're still early so I'm positive this will only improve with time - in speed, quality, mental overhead, AND cost (open source / 3rd party models will catch up eventually).

That's the balance I've found useful:

Drive strategy, systems, and vision
Let agents do the footwork quickly
Do 1-2 things at a time and improve the system each cycle

I review a lot more code but am seeing ~50-60% throughput improvements for it and I believe this system compounds. Better skills, better models / harnesses, and better workflows will allow humans to scale themselves even further through the vertical / horizontal scale of these machines.

If you're curious about my AI skills, I publish snapshots of my ai-dotfiles each month to my HAMY LABS Example Repo on GitHub, available to HAMINIONS Members.

If you're seeing improvements from agents (or not) I'd be curious to hear what your setup and workflow look like.
