How to Balance Speed, Quality, and Mental Overhead in Agentic Engineering

DISCLOSURE: If you buy through affiliate links, I may earn a small commission. (disclosures)

I've been back at work for about a month now after parental leave and a 12-week programming retreat at Recurse Center. We're leaning hard into agentic engineering and I'm bullish on it - I spent a lot of that time away building agentic systems, from skills to orchestrators, and exploring the tradeoffs between speed and quality. Getting back up to speed at work required rethinking some of those practices at enterprise scale, and I think I've found a good balance between speed, quality, and mental overhead.

In this post I want to share what's working for me with the current state of the tools. A lot has changed since I last shared how I use agents in 2025 - the models and harnesses are better, the context windows are bigger, and I've moved to a terminal-focused dev setup to streamline my agentic engineering workflow for portability, concurrency, and less friction.

The three-way tradeoff: speed, quality, and mental overhead

When people talk about coding with agents, they usually talk about speed. And agents are really, really fast at reading / writing / coding. But optimizing for speed typically comes at the cost of other areas we care about when building robust systems for customers.

Here are the 3 main areas I think about when approaching agentic engineering. For many people and orgs cost is going to be a big factor, but for sustainable engineering I actually think mental overhead is the more pressing issue, assuming you're spending tokens within a reasonable max plan or two from the SOTA labs.

The three-way tradeoff of agentic engineering: speed, quality, and mental overhead pull against each other. Over-optimize any one corner and you fall into its trap; the balance is in the middle.

Comprehension debt

The biggest current risk with agentically engineered systems is comprehension debt - the thing exists and works, but you have no idea how it works, so you have no intuition for what changes make sense, what will break, or how best to evolve it. You are on day 1 in someone else's codebase every single day.

Multitasking

Agents also make it really easy to kick off tasks, which feels like offloading and parallelizing work. But in the current state of things, you typically still have to help them finish the loop to fully close each thread, which means each extra thread you kick off is another one you're managing, even if passively. Humans are terrible at multitasking, so this feels productive but often isn't - leading to more comprehension debt, more mental fatigue, and typically worse quality outputs.

When compounded over months with the additional threads spun up by other people in your team / org, this path is a quick way to burn people out - lots of activity but little connection to, or fulfilment from, the work at hand.

Common failure modes:

How I think about coding with agents

My current read is that agents are very fast, knowledgeable, horizontally scalable junior engineers. They do scoped engineering tasks well but often lack bigger-picture strategy and vision we'd expect of a senior IC / leader. So the best use of them today is as levers to increase your reach - you drive, they do the footwork.

Humans drive, agents do the footwork: you give one command at the surface and a fan-out of agents does the work beneath it

A few principles I have for doing this effectively:

The agentic stack, human to AI to tools: the AI is the nondeterministic decision core you steer, leveraging deterministic tools and guardrails beneath it

How I run agentic engineering loops

How I run the agentic engineering loop: vertical slice to stacked commits to stacked PRs to review to ship, one main and one side at a time, improving the system each cycle

The loop I run looks like this:

In practice I can run 1-2 stacks concurrently without too much multitasking overhead. Once one is kicked off I check the other, and if both are working I go do maintenance - messages, docs, reviews. Using that downtime well matters, because reviews have increasingly become the bottleneck in engineering organizations.

I let the agent build the full vertical slice first, knowing it won't be great, then iterate to get it working e2e. After that I come back and improve each PR one by one with targeted reviews and refactors.

Doing e2e first allows me to be agile and move quickly to fix / change direction before I commit to the full eng cycle. The e2e test allows me to ensure I haven't broken anything as I pull off commits and refactor the approach.

How I ship a vertical slice: build it end-to-end as a stack of atomic commits with an e2e test on top, then peel them off the bottom into stacked PRs for review
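As a rough illustration of the peel-off step, here is a minimal Python sketch that plans stacked PRs from an ordered list of commits. It assumes one commit per PR, a trunk branch called `main`, and a made-up `slice/NN` branch naming scheme. `StackedPR` and `plan_stacked_prs` are hypothetical names, not part of any tool mentioned in this post, and the sketch only plans the stack without touching git.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackedPR:
    branch: str  # hypothetical branch name for this PR
    base: str    # branch this PR targets (trunk or the PR below it)
    commit: str  # description of the atomic commit carried by this PR


def plan_stacked_prs(
    commits: list[str], trunk: str = "main", prefix: str = "slice"
) -> list[StackedPR]:
    """Plan one PR per commit, ordered bottom of the stack first.

    Assumption: `commits` is ordered bottom to top, with the e2e test
    commit last so it sits on top of the stack.
    """
    prs = []
    base = trunk
    for index, commit in enumerate(commits, start=1):
        branch = f"{prefix}/{index:02d}"
        prs.append(StackedPR(branch=branch, base=base, commit=commit))
        # The next PR up the stack targets the branch we just planned.
        base = branch
    return prs
```

Calling `plan_stacked_prs(["add schema", "add api", "add e2e test"])` gives three PRs whose bases chain from `main` upward, so the bottom PR can be reviewed and merged first while the e2e test keeps covering the whole slice from the top.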

I push the agents to do more footwork before they need me so they can work on larger tasks for longer. Example improvements include guardrails that make them double-check their own assumptions and ground them with sources, review their own code, run types / tests / linters, and iterate with TDD. When I hit a pattern I don't like - bad comments, weak tests, suboptimal approach - I update the skills and reviews to prevent it from happening again.
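One way to picture the "run types / tests / linters" guardrail is a small script the agent runs before handing work back. The sketch below is an assumption about how that could look, not the setup described in this post: `run_checks`, `summarize`, and `ready_for_review` are hypothetical helpers, and the commands in `EXAMPLE_CHECKS` assume a Python project that happens to use mypy, pytest, and ruff.

```python
import subprocess
import sys
from dataclasses import dataclass

# Assumed toolset for illustration only; swap in whatever your repo uses.
EXAMPLE_CHECKS = {
    "types": [sys.executable, "-m", "mypy", "."],
    "tests": [sys.executable, "-m", "pytest", "-q"],
    "lint": [sys.executable, "-m", "ruff", "check", "."],
}


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each check command and record whether it exited with code 0."""
    results = []
    for name, command in checks.items():
        proc = subprocess.run(command, capture_output=True, text=True)
        results.append(
            CheckResult(
                name=name,
                passed=proc.returncode == 0,
                output=(proc.stdout + proc.stderr).strip(),
            )
        )
    return results


def summarize(results: list[CheckResult]) -> str:
    """Build a short report, including output only for failed checks."""
    lines = []
    for result in results:
        status = "PASS" if result.passed else "FAIL"
        lines.append(f"{status} {result.name}")
        if not result.passed and result.output:
            lines.append(f"  {result.output}")
    return "\n".join(lines)


def ready_for_review(results: list[CheckResult]) -> bool:
    """Only hand the work back to a human once every check passes."""
    return all(result.passed for result in results)
```

The design choice here is that failures come back as plain text the agent can read and act on, so it can keep iterating on its own and only pull a human in once `ready_for_review` returns true.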

Results of agentic engineering

PR throughput before and after agents: ~7 to ~12 PRs per week, a 50–60% gain, with side projects closer to 2–5×

So my guesstimate is a 50-60% speed improvement over my previous workflows.

Now this is a far cry from the 10-100x improvements some people are claiming. But I still think it's a huge improvement.
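Percent gains and "Nx" multipliers are easy to mix up, so here is a tiny Python sketch that converts between them. The function names are made up for illustration; the only inputs are figures already quoted in this post.

```python
def percent_gain_to_multiplier(percent_gain: float) -> float:
    """A 50% gain means doing 1.5x the work in the same time."""
    return 1 + percent_gain / 100


def multiplier_to_percent_gain(multiplier: float) -> float:
    """A 2x speedup is a 100% gain over the baseline."""
    return (multiplier - 1) * 100


# The 50-60% guesstimate expressed as a multiplier (roughly 1.5x to 1.6x).
work_range = (percent_gain_to_multiplier(50), percent_gain_to_multiplier(60))

# The 10-100x claims expressed as percent gains (900% to 9900%).
claimed_range = (multiplier_to_percent_gain(10), multiplier_to_percent_gain(100))
```

On the same scale, a 50-60% gain is about 1.5-1.6x, while a 10x claim corresponds to a 900% gain.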

In my workflows I'd argue that quality has gone up due to more and faster reviews, research, and refactors. I get to spend more time thoroughly thinking, reviewing, and exploring various ways to do things which I think is leading to better outcomes overall.

Day-to-day this may just be:

In my side projects I've found the speedups to be much greater - closer to 2-5x improvements. Some of this is just because I'm much more lenient on what I accept but also I think the required context is much smaller to make a change. There's no nebulous institutional knowledge or gotchas that either haven't been written down or don't fit into context. It's just a targeted build and agents are very good at that.

But at work, where the standards are higher and there's a lot of institutional context that doesn't fit in the AI's window, I think those extra 2-3 passes are useful and necessary. I do expect this to shift as agents get larger effective context windows, improve at following skills / rules, and gain forms of actual reasoning, but I don't think we're there yet. So when I hear of teams and orgs going full agentic, with agent teams doing everything, I'm a bit skeptical, because the results I've seen on real codebases of substantial size, domain complexity, and correctness requirements have been subpar and quite costly to produce.

But for targeted things they're great - we're automating more of the one-off bug triages, flaky tests, and simple request builds, which removes mechanical toil from the team. And we're still early, so I'm positive this will only improve with time - in speed, quality, mental overhead, AND cost (open source / 3rd party models will catch up eventually).

Next

That's the balance I've found useful:

I review a lot more code, but I'm seeing ~50-60% throughput improvements in return and I believe this system compounds. Better skills, better models / harnesses, and better workflows will allow humans to scale themselves even further through the vertical / horizontal scale of these machines.

If you're curious about my AI skills, I publish snapshots of my ai-dotfiles each month to my HAMY LABS Example Repo on GitHub, available to HAMINIONS Members.

If you're seeing improvements from agents (or not) I'd be curious to hear what your setup and workflow look like.
