2026-05-07
Code review is broken.
The industry-established code review process, review-then-commit, was a straightforward mechanism that allowed a relatively low-trust group of engineers to collaborate. It appears to have been initially developed for the Apache server OSS project in the 90s, corporatized by Google in the early 2000s, and popularized throughout the industry by several means, most notable of which was the GitHub PR.
It was very simple:
1. An engineer writes a change and sends it to a colleague for review.
2. The reviewer reads the diff and comments; the author revises until the reviewer approves.
3. Only then is the change committed to the main branch.
This is not Michael Fagan's defect analysis work or the ticket-like processes used for critical-systems changes in fields like aerospace. This will not catch your bugs. It will, however, communicate design changes to other engineers who maintain a mental model of the codebase, and reviewers can use the process to teach norms to contributors. It has advantages, and because there is a gate before changes reach the main branch, it does not require much trust. That makes it a great tool for scaling a company, because beyond ~10-12 engineers (the "two pizza" team, among other names), trust erodes rapidly. It is also great for scaling OSS. It put work on reviewers, but there was work on the human making the change too. An imbalance existed, but it was usually manageable.
Agents broke this. If you insert an agent into the existing process, your best possible outcome is:
1. An engineer prompts an agent, which generates the change.
2. That engineer reviews the agent's output.
3. A colleague then reviews the change, as before.
This doubles the amount of review. But companies were already review-limited. In a really well-functioning team, a code review cycle could take a day. (Between two engineers who get on well and intimately know each other's work, you could shrink this to an hour.) But across the industry, even before agents, it optimistically took days to get a change reviewed and merged.
Additionally, the whole reason engineers use agents is that they improve productivity: more total changes are generated. So we doubled the review per change and increased the number of changes. If agents, say, triple an engineer's output and each change now needs two reviews, review demand is six times what it was, against unchanged reviewer capacity. As you graft agents onto the old model, you run out of review bandwidth before you have extracted all the value you can from agents. (And anecdotally, you run out of bandwidth before you get even a fraction of the value of agents.)
But things get worse, because no-one actually augments the old processes this way.
What happens in reality is a process like this:
1. A human types a sentence or two of prompt.
2. An agent generates the change.
3. The human pokes briefly at the result, does not read the code, and sends it out for review.
This is an example of what economists call the principal-agent problem: the reviewer is the principal, the contributor is the agent, and code review only worked because the reviewer could cheaply infer effort from reading the code. Agents collapse that signal. This is what is killing OSS, and it is commonly referred to as "slop PRs". There is no incentive for the human driving the agent to actually read the code or spend time thinking about what the reviewer says.
The result is a radical imbalance. "Contributors" type a sentence or two, of the quality of a poor bug report, spend 5 minutes poking at the resulting program, and then generate serious review load for another engineer. You can do this with no understanding of the underlying project, its constraints, or the tools used to construct it. This is an unmanageable disaster. This does not even work in environments where the reviewer is paid to do the work, because they could be more productive by prompting the agent themselves.
Small high-trust teams have an easy process they can adopt:
1. An engineer prompts an agent, which writes the change.
2. The same engineer reviews the agent's output.
3. The engineer commits the change and owns the resulting deployment.
There is still a human in the loop. There is still a reviewer who did not get deeply lost in the weeds of how a problem could be solved. Most importantly, there is no principal-agent problem, because the human driving the machine takes on the responsibility for its actions by owning the deployment.
Anecdotal evidence suggests this works for small teams. With a team of nine at exe.dev we have been able to make it work. We spend a lot more time writing integration tests and e2e tests, and building agent-based workflows that analyze commits for safety, performance, or usability bugs, all to minimize risk. This is a lot of machinery that teams traditionally do not develop until they are far larger and more mature; on the other hand, it is much easier to develop thanks to agents. We also have had to be very selective about our colleagues and intentional in our communication. But we ship this way.
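To make the commit-analysis idea concrete, here is a minimal sketch in Go, under loose assumptions: the "agent" CLI and its invocation are hypothetical placeholders for whatever agent runner a team uses, and the prompt is illustrative, not our exact tooling.

    // commitcheck: run an agent over the most recent commit and flag
    // safety, performance, or usability problems before deploy.
    //
    // The "agent" command invoked below is a placeholder, not a real
    // tool; substitute whatever agent runner your team uses.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "strings"
    )

    const prompt = `You are reviewing one commit for deployment risk.
    Report only concrete safety, performance, or usability bugs.
    If the commit looks safe, reply with exactly: OK`

    func main() {
        // Capture the full diff of the commit under review.
        diff, err := exec.Command("git", "show", "HEAD").Output()
        if err != nil {
            fmt.Fprintln(os.Stderr, "git show:", err)
            os.Exit(1)
        }

        // Feed the prompt and the diff to the (placeholder) agent CLI.
        cmd := exec.Command("agent", "run")
        cmd.Stdin = strings.NewReader(prompt + "\n\n" + string(diff))
        out, err := cmd.CombinedOutput()
        if err != nil {
            fmt.Fprintln(os.Stderr, "agent:", err)
            os.Exit(1)
        }

        report := strings.TrimSpace(string(out))
        if report == "OK" {
            return // no findings; let the change proceed
        }
        fmt.Println(report) // surface the findings and fail the check
        os.Exit(1)
    }

Run from CI or a pre-deploy hook, one such check per risk category buys back some of the scrutiny a second human reviewer used to provide.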
This is not tenable in low-trust environments, i.e. large companies. You have to trust your co-workers to start a conversation about architectural changes before they do it. No-one at BigCo trusts their colleagues to make sweeping changes to a service they "own". And no-one at BigCo wants to be on the hook for a major outage without having coverage from a code review to smear the blame around. (Low trust environments are awful places.)
I am sure there are small isolated teams at big companies that have broken with standard practices and are getting real value out of agents. I am also sure there are ICs whose work lets them maximize the value of an agent without involving their colleagues. (E.g. if you work in quality, agents can help you write and execute endless large-scale experiments that never need review; just send out what works.) But the vast majority of big-company engineers cannot make changes, especially the cross-functional changes agents do so well, without review eating all the productivity gains.
As of this writing, I have not seen anyone describe a process that "scales" agent-driven development in a large company. There is, however, evidence from the past that it is possible. I would point to Microsoft in the 1990s, which did not have mandated review-before-commit practices. Some teams may have, but the company, while large, was organized as many independent teams constantly synchronized by QA processes. Proponents of the large-team processes that came before agents regard this as "old-fashioned" "cowboy" development. But it did work. It created some of Microsoft's most long-lived successful products, like the Win32 API. (And yes, we could critique a 30-year-old API endlessly, but it is still there and significantly better than some of its "replacements" that were built with code review processes.) Little appears to be written about this period of Microsoft history; if you were there, I would love to hear or read about your experiences.
Until someone develops robust processes for agent use in low-trust environments, small teams have a large force multiplier available to them that big teams do not. Ship while you can.