Devs produce twice as many PRs. Delivery hasn't moved.
The same picture keeps showing up. The AI dashboard is green. Adoption above 90%. Accepted code trending up. Token consumption climbing every month. Devs say they're faster.
Delivery, so far, hasn't kept pace.
At some companies, leaderboards of who burns the most tokens circulate on Slack. At Meta, engineering managers track AI consumption per team (1). Cleo allows up to $2,000/month in tokens per engineer (1). Extreme case: one developer at Anthropic hit $150,000/month on Claude Code (2). Token volume is becoming a proxy for performance. The problem isn't the budget. The problem is what we're measuring.
The "lines of code" of 2026
Jellyfish studied 7,500 developers over Q1 2026 (3). Between the median developer ($52/month in tokens) and the 90th percentile ($691/month), output doesn't follow the cost curve. Returns diminish. The highest spender on a team can have the best cost per delivery (4), because they use AI in deep sessions that resolve in fewer iterations. Or not. Raw cost, without tying it to what gets shipped, tells you nothing.
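To see why, run the arithmetic yourself. A minimal sketch, with made-up names and figures (this is not Jellyfish's data): divide token spend by shipped changes before ranking anyone.

```python
# Minimal sketch: rank developers by raw token spend, then by cost per
# shipped change. All names, fields, and figures are made up for
# illustration; they are not Jellyfish's data.
devs = [
    {"name": "A", "token_spend_usd": 52.0,  "shipped_changes": 4},
    {"name": "B", "token_spend_usd": 691.0, "shipped_changes": 60},
    {"name": "C", "token_spend_usd": 310.0, "shipped_changes": 6},
]

for d in devs:
    d["cost_per_delivery"] = d["token_spend_usd"] / d["shipped_changes"]

biggest_spender = max(devs, key=lambda d: d["token_spend_usd"])
most_efficient = min(devs, key=lambda d: d["cost_per_delivery"])

print("Biggest spender:    ", biggest_spender["name"])  # B ($691/month)
print("Best cost/delivery: ", most_efficient["name"])   # also B (~$11.50/change)
```

Same data, opposite conclusions depending on which column you sort by. That's the whole point: spend alone ranks B as the outlier to investigate; cost per delivery ranks B as the one to learn from.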
The phenomenon has a name: tokenmaxxing (5). Measuring AI productivity by token consumption. Lines of code in the 2000s, story points in the 2010s, tokens today. Each time, we measure the most visible input and mistake it for a result. We end up getting more of what we measure, and less of what we wanted (6).
Measuring adoption makes sense early on. You check that the tool is being used, that teams are getting comfortable with it. The danger is when a deployment metric becomes the permanent metric of success.
Devs are faster. The team isn't.
Faros.ai measured this across 10,000 developers in 1,255 teams (7). At the individual level, the gains are real: twice as many PRs, nearly 50% more tasks per day. At the company level, no measurable correlation with improved throughput, DORA metrics, or quality.
Individual gains don't automatically transfer to the org level. The data is still young, but the signal is worth paying attention to.
Code arrives faster, but the rest of the system hasn't kept up. Code review has become the bottleneck. PRs pile up, and review practices (time invested, automated tooling) rarely improve to match. Coding time has been compressed by AI. Review time, testing time, time to production: none of that has.
Faros qualifies its own results: adoption is recent, only two to three quarters at critical mass, and the numbers may evolve. The signal remains clear. Accelerating production without adapting the downstream pipeline creates a traffic jam further down the road. The only honest path forward is for developers to spend more time on review, and for team culture to evolve to make that possible. Automate as much as you can, yes. But going from no automation to "we don't review code anymore" is a bet I won't make. Not until you've iterated on your automated review tooling for a few months and calibrated which PRs can skip review, which need a light pass (1 reviewer), and which need the full treatment (2+ reviewers). Learning takes time. Rushing it doesn't save any.
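What does that calibration look like? A minimal sketch of review-tier routing, assuming signals and thresholds I'm inventing for illustration (diff size, critical paths, green automated checks). Your tiers should come out of months of your own data, not out of this snippet.

```python
# Minimal sketch of review-tier routing. Every signal and threshold
# here is a hypothetical placeholder, not a recommendation: calibrate
# against your own incident and review history.
from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int
    touches_critical_path: bool    # e.g. auth, payments, migrations
    automated_checks_passed: bool  # lint, tests, automated review bot

def review_tier(pr: PullRequest) -> str:
    if pr.touches_critical_path:
        return "full"   # 2+ reviewers, no exceptions
    if not pr.automated_checks_passed:
        return "full"   # red checks disqualify any shortcut
    if pr.lines_changed <= 20:
        return "skip"   # trivial and green: automation is enough
    if pr.lines_changed <= 200:
        return "light"  # 1 reviewer
    return "full"

print(review_tier(PullRequest(12, False, True)))   # skip
print(review_tier(PullRequest(150, False, True)))  # light
print(review_tier(PullRequest(150, True, True)))   # full
```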
The value is second-order
The most tangible quality improvements don't come from AI-generated code. That's hard to prove at Faros scale. But it's a pattern I keep seeing, and one worth calling out.
Take tests. AI can generate them, and it's getting better at it. But the biggest gains I observe don't come from auto-generated tests. They come from the freed-up time that lets developers write more tests themselves, and write them better.
Same logic for refactoring. Long-stalled projects become possible. For mechanical refactoring (renames, method extraction, pattern migration), AI speeds things up. And the reclaimed time lets the team tackle the more structural work. Refactoring becomes more ambitious. Large swaths of technical debt start moving, where before the team was just patching.
Documentation follows the same pattern. AI is very effective at keeping specs up to date, generating first drafts of ADRs (Architecture Decision Records), structuring postmortems. Tasks nobody did because they took too much time for too little immediate reward. Nothing spectacular. Nothing that makes it into a keynote. The kind of quiet progress that changes a team's trajectory over six months.
AI hasn't improved quality directly. It's given teams the time to do what actually improves quality. No token dashboard captures an ADR written, an ambitious refactoring, or a rising test coverage trend. Yet that's where the difference lies between a team that consumes AI and a team that extracts value from it.
Not an argument for cutting AI budgets. The tools report usage because that's their business model. Fair enough. It's up to leaders to choose better instruments to evaluate outcomes.
Three dashboards, only one that matters
We've known this since the DORA research: the tool doesn't predict delivery, and delivery doesn't predict client value. AI doesn't change that hierarchy.
The AI dashboard measures the tool: tokens consumed, adoption rate, accepted code. Most teams stop here.
The delivery dashboard measures the process: cycle time, lead time, deployment frequency, change failure rate. Better. But shipping faster isn't shipping better. I wrote about this in "Cheap to build, costly to keep": the cost of building has collapsed. The cost of ownership hasn't. This is where you first spot metrics like time-to-review degrading.
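If you want that dashboard concrete: a minimal sketch of three of its four metrics, computed from a hypothetical list of deployment records. The record shape is an assumption, not any tool's actual schema, and cycle time would need ticket timestamps on top.

```python
# Minimal sketch of delivery metrics from hypothetical deployment
# records: (first_commit_at, deployed_at, caused_incident).
from datetime import datetime
from statistics import median

deploys = [
    (datetime(2026, 3, 2, 9),  datetime(2026, 3, 3, 16),  False),
    (datetime(2026, 3, 4, 10), datetime(2026, 3, 4, 18),  False),
    (datetime(2026, 3, 9, 11), datetime(2026, 3, 12, 15), True),
]

# Lead time: first commit to production, per deployment.
lead_times = [deployed - committed for committed, deployed, _ in deploys]

# Observation window, in days, for the frequency metric.
period_days = (deploys[-1][1] - deploys[0][0]).days or 1

print("Median lead time:    ", median(lead_times))
print("Deploys per week:    ", round(len(deploys) / period_days * 7, 1))
print("Change failure rate: ", sum(incident for _, _, incident in deploys) / len(deploys))
```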
The client outcome dashboard measures value: feature adoption, retention, satisfaction, business impact. The real goal of a successful transformation. Where quarterly objectives and OKRs should focus. Deploying is no longer enough. Improving cadence isn't enough either.
Next time someone asks "what's the ROI of AI?", the answer shouldn't be an adoption rate. It should be: what have our customers gained that they didn't have six months ago?
Sources
(1) The Impact of AI on Software Engineers in 2026 — Gergely Orosz, The Pragmatic Engineer (April 2026)
(2) Tokenmaxxing Is Real: Engineers Now Burn Through $150K/Month in AI Compute — AI:PRODUCTIVITY (2026)
(3) Is Tokenmaxxing Cost Effective? New Data from Jellyfish — Jellyfish (Q1 2026)
(4) Your Most Expensive Developer Might Be Your Most Efficient — Vantage (2026)
(5) Tokenmaxxing: Why Token Consumption Isn't AI Engineering Productivity — Faros.ai (2026)
(6) Tokenmaxxing: The Costly Mistake in AI Engineering Metrics — Duncan Grazier (March 2026)
(7) The AI Productivity Paradox — Faros.ai (2026)