ai / code review
Code generation is now cheap. Code review isn't. Agents now generate most of my team's code, but a human still reviews every change. Is code review becoming the bottleneck of AI-assisted programming?
Some numbers
Faros AI tracked 10,000 developers across 1,255 teams. High-AI teams completed 21% more tasks and merged 98% more pull requests. PR review time went up 91%. Bug rate per developer went up 9%. DORA metrics (deployment frequency, lead time, change failure rate, time to recovery) remained flat.
Individual velocity went up, but organizational throughput stayed flat. Writing code got cheaper. Everything downstream got harder.
Skill atrophy, loss of understanding
If AI writes all the code and you only review it, where does the skill to review come from?
A 2026 study by Shen and Tamkin tested this: developers using AI scored 17% lower on conceptual understanding, debugging, and code reading. The largest gap was in debugging, the exact skill you need to catch what AI gets wrong.
This creates a feedback loop: the more AI writes, the less qualified humans become to review it. The people who benefit most from AI productivity are exactly the ones who need review skills to supervise it.
Accumulated loss of understanding happens when you build fast without comprehending what you built.
Less pain for small teams
For a two-person team, reviewing every change still works. We wrote the original code. We know the architecture. When the agent extends an existing pattern, review is fast because we recognize the shape.
Loss of understanding bites harder at scale: large teams where nobody wrote the original code and agents are extending agent-written code.
Additional techniques
It's not just team size. We've invested in making review cheaper by moving effort upstream.
AGENTS.md files instruct the agent before it writes code. Architecture, conventions, and domain rules are scoped to the directory the agent is working in. Good instructions mean fewer surprises in the diff.
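For illustration, a directory-scoped AGENTS.md might look like this (the paths and rules here are invented, not from a real project):

```markdown
# AGENTS.md — applies to code under this directory

## Architecture
- This package is the HTTP layer only; business logic lives in ../core.
- New endpoints go through the existing router; do not add a second one.

## Conventions
- Errors are returned as typed results, never raised across the API boundary.
- Tests live next to the code they cover.

## Domain rules
- Money amounts are integer cents; never use floats for currency.
```

The point is scope: the agent only sees the rules relevant to where it's working, so the diff it produces already fits the neighborhood.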
Tight feedback loops let the agent fix its own mistakes. Run the checks, feed failures back, let it iterate. By the time I see the PR, the agent has already addressed the class of problems that deterministic tools can catch.
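A minimal sketch of that loop in Python, with the check runner and agent call passed in as callables (the function names and the three-round budget are assumptions for illustration, not a real agent API):

```python
def feedback_loop(run_checks, ask_agent, max_rounds=3):
    """Feed deterministic check failures back to the agent until clean.

    run_checks() returns failure output as a string, or "" when clean.
    ask_agent(prompt) asks the agent to fix what the checks flagged.
    Returns True once the checks pass, False when the budget runs out.
    """
    for _ in range(max_rounds):
        failures = run_checks()
        if not failures:
            return True  # clean: ready for human review
        ask_agent(f"These checks failed; fix the code:\n{failures}")
    # Budget spent: one last check, then escalate to a human if still failing.
    return run_checks() == ""
```

The budget matters: without a cap, an agent that can't satisfy the checks will loop forever, and the failure should surface to a human instead.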
A fresh agent reviewer gives a second opinion. I start a new agent and run a saved prompt that checks the branch against main, flags critical logical errors and security issues first, verifies the changes against the project's README.md files, and asks whether it's as simple as it could be. This works whether the code was written by me or another agent.
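A saved prompt along those lines might read roughly like this (illustrative wording, not the actual prompt):

```text
Review the diff between this branch and main.
1. Flag critical logical errors and security issues first.
2. Verify the changes against the project's README.md files.
3. Is this as simple as it could be? Suggest anything that can be removed.
Order your findings by severity.
```

Starting a fresh agent matters: a reviewer with no memory of writing the code can't rationalize its own choices.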
Fast CI via cibot runs tests, linting, and security scans within seconds of push. When CI catches the mechanical errors, review can focus on intent and design.
The first two techniques are becoming table stakes for AI-assisted programming. The latter two are still a competitive advantage.
Review intent, not just code
Some people take this further. There's an argument that code review should die entirely, replaced by spec-driven development where humans review plans and acceptance criteria, not 500-line diffs.
I'm not there yet. I still read the diffs. But I notice the balance shifting. The most valuable part of my work is increasingly upstream, as a human on the loop, not in the loop: did I write a clear prompt? Did the AGENTS.md constrain the agent well enough? Could I write other command-line tools to catch more of the mechanical errors before I see the PR?
Open questions
Will I feel skill atrophy and loss of understanding? I haven't yet, but the studies suggest it's invisible because the tool compensates for it.
Will layered verification replace line-by-line review? The verification layers I already have (fast CI, a fresh agent reviewer) catch different classes of problems. In theory, you can stack enough of these so no single failure slips through every layer. In practice, I haven't seen this work at the level of trust that replaces a good human reviewer.
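The "in theory" claim is just multiplication: if the layers missed bugs independently, the chance of a bug escaping would be the product of the per-layer miss rates. A toy calculation (all rates are made up for illustration):

```python
# Hypothetical per-layer miss rates for one class of bug.
miss_rates = {"fast CI": 0.30, "agent reviewer": 0.40, "human skim": 0.50}

# Under (unrealistic) independence, a bug escapes only if
# every layer misses it.
slip_through = 1.0
for rate in miss_rates.values():
    slip_through *= rate

print(f"{slip_through:.0%}")  # 6% in this toy model
```

The catch is the independence assumption: in practice, CI and an agent reviewer tend to miss the same design-level problems, so the real escape rate sits well above the product.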
What comes next?
Today, agents generate and humans review. That works at my scale.
The pressure to kill code review is real, but it's most acute at large organizations where nobody wrote the original code and agents are extending agent-written code. I'm fortunate to be on a small team where we still know the whole system. Review is a conversation for us, not a bottleneck.
That may not last forever. But for now, the human reading the diff is still the best check we have.