ai / code review
Code generation is now cheap. Code review isn't. Agents now generate most of my team's code, but a human still reviews every change. Is code review becoming the bottleneck of AI-assisted programming?
Some numbers
Faros AI tracked 10,000 developers across 1,255 teams. High-AI teams completed 21% more tasks and merged 98% more pull requests. PR review time went up 91%. Bug rate per developer went up 9%. DORA metrics (deployment frequency, lead time, change failure rate, time to recovery) remained flat.
Individual velocity went up, but organizational throughput stayed flat. Writing code got cheaper. Everything downstream got harder.
Skill atrophy, loss of understanding
If AI writes all the code and you only review it, where does the skill to review come from?
A 2026 study by Shen and Tamkin tested this: developers using AI scored 17% lower on conceptual understanding, debugging, and code reading. The largest gap was in debugging, the exact skill you need to catch what AI gets wrong.
This creates a feedback loop: the more AI writes, the less qualified humans become to review it. The people who benefit most from AI productivity are exactly the ones who need review skills to supervise it.
Accumulated loss of understanding happens when you build fast without comprehending what you built.
Less pain for small teams
For a two-person team, reviewing every change still works. We wrote the original code. We know the architecture. When the agent extends an existing pattern, review is fast because we recognize the shape.
Loss of understanding bites harder at scale: large teams where nobody wrote the original code and agents are extending agent-written code.
Additional techniques
It's not just team size. We've invested in making review cheaper by moving effort upstream.
AGENTS.md files instruct the agent before it writes code. Architecture, conventions, and domain rules are scoped to the directory the agent is working in. Good instructions mean fewer surprises in the diff.
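For illustration, a directory-scoped AGENTS.md might look like this (the paths and rules here are invented, not from a real project):

```markdown
# AGENTS.md — applies to code under this directory

## Architecture
- This package is the HTTP layer only; business logic lives in ../core.
- New endpoints go through the existing router; do not add a second one.

## Conventions
- Errors are returned as typed results, never raised across the API boundary.
- Tests live next to the code they cover.

## Domain rules
- Money amounts are integer cents; never use floats for currency.
```

The point is scope: the agent only sees the rules relevant to where it's working, so the diff it produces already fits the neighborhood.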
Tight feedback loops let the agent fix its own mistakes. Run the checks, feed failures back, let it iterate. By the time I see the PR, the agent has already addressed the class of problems that deterministic tools can catch.
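A minimal sketch of that loop in Python, with the check runner and agent call passed in as callables (the function names and the three-round budget are assumptions for illustration, not a real agent API):

```python
def feedback_loop(run_checks, ask_agent, max_rounds=3):
    """Feed deterministic check failures back to the agent until clean.

    run_checks() returns failure output as a string, or "" when clean.
    ask_agent(prompt) asks the agent to fix what the checks flagged.
    Returns True once the checks pass, False when the budget runs out.
    """
    for _ in range(max_rounds):
        failures = run_checks()
        if not failures:
            return True  # clean: ready for human review
        ask_agent(f"These checks failed; fix the code:\n{failures}")
    # Budget spent: one last check, then escalate to a human if still failing.
    return run_checks() == ""
```

The budget matters: without a cap, an agent that can't satisfy the checks will loop forever, and the failure should surface to a human instead.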
A fresh agent reviewer gives a second opinion. I start a new agent and run a saved prompt that checks the branch against main, flags critical logical errors and security issues first, verifies the changes against the project's README.md files, and asks whether it's as simple as it could be. This works whether the code was written by me or another agent.
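A saved prompt along those lines might read roughly like this (illustrative wording, not the actual prompt):

```text
Review the diff between this branch and main.
1. Flag critical logical errors and security issues first.
2. Verify the changes against the project's README.md files.
3. Is this as simple as it could be? Suggest anything that can be removed.
Order your findings by severity.
```

Starting a fresh agent matters: a reviewer with no memory of writing the code can't rationalize its own choices.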
Fast CI via cibot runs tests, linting, and security scans within seconds of push. When CI catches the mechanical errors, review can focus on intent and design.
The first two techniques are becoming table stakes for AI-assisted programming. The latter two are still a competitive advantage.
Review intent, not just code
Some people take this further. There's an argument that code review should die entirely, replaced by spec-driven development where humans review plans and acceptance criteria, not 500-line diffs.
I'm not there yet. I still read the diffs. But I notice the balance shifting. The most valuable part of my work is increasingly upstream, as a human on the loop, not in the loop: did I write a clear prompt? Did the AGENTS.md constrain the agent well enough? Could I write other command-line tools to catch more of the mechanical errors before I see the PR?
Open questions
Will I feel skill atrophy and loss of understanding? I haven't yet, but the studies suggest it's invisible because the tool compensates for it.
Will layered verification replace line-by-line review? The verification layers I already have (fast CI, a fresh agent reviewer) catch different classes of problems. In theory, you can stack enough of these so no single failure slips through every layer. In practice, I haven't seen this work at the level of trust that replaces a good human reviewer.
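The "in theory" claim is just multiplication: if the layers missed bugs independently, the chance of a bug escaping would be the product of the per-layer miss rates. A toy calculation (all rates are made up for illustration):

```python
# Hypothetical per-layer miss rates for one class of bug.
miss_rates = {"fast CI": 0.30, "agent reviewer": 0.40, "human skim": 0.50}

# Under (unrealistic) independence, a bug escapes only if
# every layer misses it.
slip_through = 1.0
for rate in miss_rates.values():
    slip_through *= rate

print(f"{slip_through:.0%}")  # 6% in this toy model
```

The catch is the independence assumption: in practice, CI and an agent reviewer tend to miss the same design-level problems, so the real escape rate sits well above the product.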
What comes next?
Today, agents generate and humans review. That works at my scale.
The pressure to kill code review is real, but it's most acute at large organizations where nobody wrote the original code and agents are extending agent-written code. I'm fortunate to be on a small team where we still know the whole system. Review is a conversation for us, not a bottleneck.
That may not last forever. But for now, the human reading the diff is still the best check we have.