<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Shape and Ship</title>
        <link>https://www.shapeandship.ai</link>
        <description>Practical frameworks for CTOs, CPOs, and tech and product leaders scaling teams, delivery, and AI adoption. From a CTPO who ships with the teams she works with. Strategy, metrics, and field lessons from scale-ups of 20 to 120 people.</description>
        <lastBuildDate>Sat, 13 Jun 2026 17:56:06 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>Writizzy</generator>
        <language>en</language>
        <copyright>All rights reserved 2026, Shape and Ship</copyright>
        <item>
            <title><![CDATA["It's historical."]]></title>
            <link>https://www.shapeandship.ai/p/historique-documentation-missing</link>
            <guid>https://www.shapeandship.ai/p/historique-documentation-missing</guid>
            <pubDate>Sun, 07 Jun 2026 00:00:00 GMT</pubDate>
            <description><![CDATA["It's historical." For twenty years, that was an acceptable human cost. AI turns it into a direct production cost. Every undocumented decision becomes a daily source of errors for agents. Oral knowledge is no longer enough.]]></description>
            <content:encoded><![CDATA[<p>&quot;It&#39;s historical.&quot; I think I&#39;ve been hearing that since I started in development twenty years ago.</p>
<p>&quot;Why two databases for users?&quot;
&quot;It&#39;s historical.&quot;</p>
<p>&quot;Why does this service do the same thing as that one?&quot;
&quot;It&#39;s historical.&quot;</p>
<p>&quot;Why do we never touch this module?&quot;
&quot;It&#39;s historical.&quot;</p>
<p>For twenty years, this was a human cost you found everywhere, whatever the size of the client. You asked questions, pieced the context back together, and ended up understanding after a few weeks. Or you refactored and discovered the why after the fact, usually once a bug had surfaced ^^</p>
<p>Knowledge was passed on orally and through the people who stayed. It was slow and imperfect, but it worked as long as there were humans to compensate. On my last five or six engagements, I had to redo the onboarding almost every time. Sometimes several times over: redrawing the architecture diagrams, rewriting the domain descriptions...</p>
<p>Where it used to be only a slowdown, AI now turns this missing context into a direct production cost. Every undocumented decision, every line of reasoning left in the head of someone who has gone, becomes a potential source of error.</p>
<p>What was a human discomfort has become an operational debt.</p>
<h2>Rebuilding without understanding</h2>
<p>On a recent engagement, we had to rebuild part of a product&#39;s architecture. The whole thing had become so complex that nobody understood why it had been built that way anymore. The people who had made those choices were gone. No ADRs (architecture decision records). No trace of the reasoning. We spent weeks wondering whether this complexity hid a real constraint we couldn&#39;t see, or whether it was simply over-engineering. There was no way to settle it quickly. We ended up simplifying, but the doubt slowed down every decision.</p>
<p>I always set up ADRs when I join an organization that doesn&#39;t have any. But on this product, everything we had to rebuild had been built before that. An agent facing this architecture doesn&#39;t challenge the implicit complexity. It propagates it. It won&#39;t spontaneously ask &quot;is this intentional?&quot;. Even if you ask it to be critical, it has no basis for judging.</p>
<p>In the absence of explicit context, what exists becomes the truth. The agent would have kept building on top, assuming that if it was there, it was intentional.</p>
<h2>Data is everywhere, truth is nowhere</h2>
<p>And even when you give agents context, the data often remains fragmented across several systems. Who holds the truth? Where was the reasoning behind a structural decision written down? In the technical spec? In the ADR? In the code? And what do you do with information coming from a half-updated Notion? Documents that have been obsolete for six months? Wrong context files generated by an agent and never really reviewed?</p>
<p>In a startup I work with, the specs are in Notion, customer conversations are in the CRM, technical decisions are in Linear, and the agents only really read GitHub. Access to information is no longer the issue. What&#39;s missing is a coherent source of truth.</p>
<p>Notion has just released a CLI. Vendors are adding context files. The whole industry is now trying to plug agents into organizations&#39; memory.</p>
<p>Except that, in most of the companies where I work, when you look closely at the (many) sources of information, you realize that some of the most useful knowledge is ultimately nowhere. Personally, I solved part of the problem with a local second brain in Obsidian. My agents feed on it, and the why is far more detailed there than the what you find in the code. But that doesn&#39;t scale to an organization.</p>
<h2>Twenty minutes that never happen</h2>
<p>Nobody refuses to write. It&#39;s simply never the priority. There&#39;s the current sprint and this morning&#39;s incident. Taking twenty minutes after an architecture decision to note down the reasoning always comes second. Twenty minutes that never happen. And even when you decide to make it mandatory, it often takes several months before the habit really takes hold.</p>
<p>For a long time, this cost stayed diffuse. The newcomer spent three weeks asking questions. The senior spent an hour, sometimes a day, re-explaining the same things. People grumbled during refactors. With AI, it has gone from nice to have to must have. An agent without context produces code that has to be reviewed and corrected, sometimes thrown away. Every day. On every repo where past decisions aren&#39;t written down. The cost is there, in every PR to review, and it is proportional to the number of undocumented decisions.</p>
<h2>Ten lines are enough</h2>
<p>When you start writing, even a little, even imperfectly, things change fast. A ten-line ADR: &quot;We chose Postgres over DynamoDB because we needed cross-table transactions, and we ruled out Mongo because the team didn&#39;t have the experience.&quot; Ten lines. The agent that reads this won&#39;t suggest Mongo anymore. It won&#39;t create a second schema that contradicts the first.</p>
<p>Same with a twenty-line postmortem after an incident: &quot;Friday&#39;s deployment broke billing because the feature flag wasn&#39;t wired to the right tenant.&quot; It can even be generated by an agent if the system is designed for it. And the agent that reads it probably won&#39;t repeat the same mistake six months later.</p>
<h2>So what?</h2>
<p>The question isn&#39;t &quot;should we document?&quot;. We already know that. The question has become: how long can we afford not to, when every agent amplifies the cost of every unwritten decision? But also: what should we not document? Some information is now so easy for an agent to look up in the codebase that the cost of maintaining the docs isn&#39;t always justified.</p>
<p>We talk a lot about data governance and clean pipelines. That&#39;s a real topic. But sometimes the problem sits even further upstream: knowledge that was never written down anywhere. And no model and no tool will make up for the fact that there is nothing to read. The agent, for its part, won&#39;t stay stuck.</p>
<p>It will produce a plausible, confident, and entirely made-up justification.</p>
]]></content:encoded>
            <category>ia</category>
            <category>organisation</category>
        </item>
        <item>
            <title><![CDATA[Everyone is busy, nothing ships]]></title>
            <link>https://www.shapeandship.ai/p/everyone-is-busy-nothing-ships</link>
            <guid>https://www.shapeandship.ai/p/everyone-is-busy-nothing-ships</guid>
            <pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Everyone is busy and nothing ships. The problem isn't that people aren't productive enough. It's that they're too productive, in the wrong place.]]></description>
            <content:encoded><![CDATA[<p>Sprints close on time, velocity looks good, developers code all day long. But when you look at what&#39;s actually in production, it&#39;s thin.</p>
<p>The team is running. The product isn&#39;t moving.</p>
<h3>Perishable inventory</h3>
<p>In a team I was working with, sprints closed with 80% of story points marked &quot;done.&quot; Except &quot;done&quot; meant &quot;the PR is open.&quot;</p>
<p>Code sat in review for four days on average. Four days during which context evaporates, branches diverge, merge conflicts pile up. Four days during which the developer who wrote that code has already moved on, head deep in a completely different problem. When the review finally lands, they have to dive back into code they half-forgot to respond to comments on decisions they made last week.</p>
<p>The fastest developers opened two or three PRs per sprint and immediately moved to the next ticket. Their individual velocity was excellent. The team&#39;s was fictional. The board said 80% done. Production said otherwise.</p>
<p>A PR waiting four days is perishable inventory. Like pallets left on a loading dock.</p>
<p>Managers looked at the dashboard. The dashboard said everything was fine. The dashboard lied.</p>
<p>The default response is always the same: act on the individual. Train, equip, hire. But an organization that can&#39;t identify its constraints will turn any local productivity gain into overload somewhere else. Same pattern for twenty years: invest in the individual workstation, expect a system-level gain, wonder why nothing moves.</p>
<h3>What everyone knows and no one names</h3>
<p>The problem isn&#39;t that people aren&#39;t productive enough. It&#39;s that they&#39;re too productive, in the wrong place.</p>
<p>Every quarter, leadership committees approve training budgets, tool purchases, individual acceleration programs, without ever asking the question: where does work actually stop in our chain?</p>
<p>Flow is invisible. The individual isn&#39;t. They&#39;re a cost center, a budget line, a role with a title and objectives. You can buy a tool for an individual. You can measure their velocity. You can train them and track the ROI. Flow can&#39;t go on a purchase order. Reporting rewards individual activity, not what actually reaches production.</p>
<p>But there&#39;s a deeper reason. Organizations often end up organizing around their bottlenecks rather than resolving them. Not out of malice. Out of comfort. The senior dev who centralizes every technical decision didn&#39;t choose to become a blocker. They got promoted for their ability to handle everything. The bottleneck is a side effect of what we reward.</p>
<p>Over time, the organization adapts to the slow flow. It accommodates. It builds its processes, rituals, and expectations around it. And questioning the bottleneck means questioning the structure itself.</p>
<p>Seeing the bottleneck isn&#39;t enough to resolve it. It takes months. Because changing the flow means changing the structure. And changing the structure means touching what people consider settled.</p>
<p>Nobody writes that in the post-mortem.</p>
<h3>What a map reveals</h3>
<p>There&#39;s one simple move that changes everything: map the actual path of work. Not the process documented on Confluence, the one nobody follows. The real one, with its loops, its wait states, its handoffs that nobody formalized. It&#39;s what Steve Pereira and Andrew Davis describe in Flow Engineering with their five maps, including the current state value stream map.</p>
<p>In the team where PRs were piling up, the map showed that the bottleneck wasn&#39;t code review. It was the absence of a review ritual. Everyone reviewed when they had time, which meant never as a priority.</p>
<p>It&#39;s an uncomfortable exercise. The map shows what you&#39;d rather not see. The bottleneck is sometimes a person. A sacred process. A political decision made two years ago.</p>
<p>And sometimes, the map shows that you&#39;ve built an entire organization around avoiding a single difficult conversation.</p>
<p>As long as we&#39;d rather optimize individuals than question our own structures, we&#39;ll keep producing a lot of activity and very little movement.</p>
]]></content:encoded>
            <category>lean</category>
            <category>flow</category>
        </item>
        <item>
            <title><![CDATA[From Copilot to harness engineering]]></title>
            <link>https://www.shapeandship.ai/p/from-copilot-to-harness-engineering</link>
            <guid>https://www.shapeandship.ai/p/from-copilot-to-harness-engineering</guid>
            <pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Individual AI initiatives are everywhere. Going from there to a team that builds and maintains its own AI harness is a different problem. The gap is smaller than it looks. ]]></description>
            <content:encoded><![CDATA[<p>One dev on Cursor, another trying Claude Code, a third who plugged in Copilot. Going from there to a team that builds and maintains its own AI harness is a different problem. The gap is smaller than it looks.</p>
<h2>Top-down creates what it claims to fix</h2>
<p>&quot;But we&#39;re being told to put AI everywhere and it&#39;s not even useful.&quot; I&#39;ve heard this sentence word for word in one team. And variations in others. &quot;Use AI&quot; as an OKR is bound to fail for the same reason &quot;be agile&quot; as an OKR was bound to fail ten years ago. You&#39;re asking people to adopt something they haven&#39;t tried, don&#39;t see the value in, with a deadline.</p>
<p>Top-down is optimized for reporting, not for change. You can show adoption rates in a quarterly review. You can present a training plan to the board. It&#39;s reassuring, measurable, presentable. But it doesn&#39;t produce autonomy. It produces dependency. The team waits for the next directive instead of developing its own capacity to adapt.</p>
<h2>Three months under the radar</h2>
<p>I apply the same method on every transformation I lead, AI or otherwise. The framework stays the same, the subject changes. On the most recent one, an AI transformation, the team shifted in two days. But those two days needed three months of preparation.</p>
<p>Open call for volunteers. A group formed naturally. From there, experiments in small groups of two, three, sometimes four. Claude Code, Cursor, automated reviews, models we ran internally. Each experiment lasted about a month, with a set format: scope, KPIs, logbook, presentation at the end. Some spilled over into the next one, and that was expected.</p>
<p>In parallel, we had to build a framework with the CISO so experimentation was even possible. Without that, every initiative would have been blocked by a &quot;we haven&#39;t approved that tool.&quot;</p>
<p>In three months, around thirty experiments completed.</p>
<p>Why pairs? Because the alternative is knowledge that stays in one person&#39;s head. Experimenting alone means learning for yourself. Experimenting with someone else means having to articulate what you&#39;re doing, why it works or why it doesn&#39;t. Verbalization forces clarification. And it produces a narrative others can pick up. Someone coming back from a solo experiment says &quot;it was fine&quot; or &quot;it didn&#39;t work.&quot; A pair comes back with an analysis, trade-offs, a structured opinion. The logbook reinforces this: what&#39;s written down is transferable. What&#39;s only experienced stays anecdotal.</p>
<p>This work doesn&#39;t fit in any OKR, any dashboard. It&#39;s a three-month investment that looks like nothing from the outside. Most organizations pick the three-day training plan because it&#39;s presentable. The underground work, nobody wants to fund it.</p>
<p>Yet that&#39;s where autonomy gets built. In a team&#39;s capacity to explore on its own, to fail, and to share what it learns.</p>
<h2>The tipping point</h2>
<p>Two days, in person, in Paris. Remote teams came in for the occasion. The energy doesn&#39;t happen the same way over video. When you&#39;re together, you talk, you watch, you move from one group to another.</p>
<p>40 minutes of presentation to cover the basics: the context files that give the model knowledge of the project, and skills, reusable instructions that the team enriches over time. That&#39;s the harness. Then teams pick their subjects, form pairs or small groups, and build. Each group produces something shareable, a skill or a rule the rest of the team can use the next day. The goal: that each person leaves with the ability to evolve their own tools, not with a training they&#39;ll forget. And that committing to your own skills, and to other teams&#39; skills, becomes a habit.</p>
<p>Total freedom, not quite. When you&#39;ve seen enough transformations, you know where it&#39;s going to break. QA was going to become a bottleneck. Before the two days, the developers most comfortable with AI had received a clear steer: this one needs to be addressed first. They paired up with the QA team members and produced the most effective skills of the two days. One of them takes a product specification, pulls it apart, and exposes every gap: ambiguities, uncovered edge cases, implicit assumptions. The product manager who tested it called it &quot;staggering.&quot;</p>
<h2>When it lives without you</h2>
<p>A month later, the team had accelerated further. Around forty skills created. Regular additions and improvements. Teams commit skills to each other&#39;s repos. Automations, code rules, things nobody was talking about before the two days.</p>
<p>Today, some teams are working on automating entire pipelines: from a stated requirement to a PR with a test plan implemented, no manual intervention. Developers don&#39;t disappear from the process. They change roles. They become the guardians of the guardrails and of quality. They decide what the AI is allowed to do, and what it isn&#39;t.</p>
<p>It&#39;s not adoption that matters. It&#39;s ownership. The signal that a change has taken hold, AI or otherwise, is when the team evolves the practice without being asked. When the leader is no longer necessary for it to move forward.</p>
<p>It&#39;s never finished. The harness gets built continuously, skills evolve, guardrails get adjusted. The day you stop evolving them, you start falling behind again. There&#39;s still a path to walk before we get to a real dark factory, but that&#39;s for another article.</p>
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AI doesn't replace junior developers. But it changes how we train them.]]></title>
            <link>https://www.shapeandship.ai/p/ai-doesnt-replace-juniors-training</link>
            <guid>https://www.shapeandship.ai/p/ai-doesnt-replace-juniors-training</guid>
            <pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[AI speeds up production, not learning. Cutting junior hiring is rational today and a disaster in 5 years. The real question is whether we still know how to train them.]]></description>
            <content:encoded><![CDATA[<p>The startups I talk to don&#39;t want to grow anymore. The equation is tempting: a senior team equipped with AI promises to produce as much as a team twice its size. Junior developer hiring has collapsed, down 67% since 2022. In France, 26% of tech leaders have already cut junior positions.</p>
<p>On paper, the equation holds. In practice, nobody has measured the real productivity of a full-senior + AI team over time. What we do know is that the equation ignores one parameter: in 5 years, who will have built the skills to take over?</p>
<h2>AI speeds up production, not learning</h2>
<p>I built POCs in languages I didn&#39;t know using AI. Working prototypes, in a few hours, in technologies I&#39;d never touched. The prototype worked. My understanding of the language? Still close to zero.</p>
<p>This isn&#39;t an isolated anecdote. On a recent engagement, I worked with an engineering school intern. Technically, she delivered. But when asked to explain her own code, she couldn&#39;t. The code ran fine. She couldn&#39;t explain how.</p>
<p>Anthropic measured this in a lab setting: developers learning a new library with AI assistance scored 17% lower on comprehension than those coding without it <a href="https://www.shapeandship.ai#source-1">(1)</a>. The speed gain wasn&#39;t statistically significant. A single study, a specific context, but the signal is consistent with what we see on the ground. Debugging is the skill most affected.</p>
<p>MIT Media Lab documented a related effect they call &quot;cognitive debt&quot; <a href="https://www.shapeandship.ai#source-2">(2)</a>. When AI does the work, the brain engages less. Up to 55% reduced neural connectivity. 83% of participants couldn&#39;t quote from essays they&#39;d just written with AI. Preliminary research, not yet peer-reviewed, but the direction is clear.</p>
<p>In the teams I work with, AI speeds up production. It doesn&#39;t speed up competence. For juniors, the effect is even more pronounced. We&#39;re producing developers who can generate code but can&#39;t tell when it&#39;s wrong.</p>
<p>A counter-argument keeps coming up: soon we won&#39;t read code at all. AI will generate, test, deploy. Judgment about code will become irrelevant. Maybe. But even in that scenario, someone needs to understand what the system does, why it does it, and when it gets it wrong. Judgment will shift from code to architecture, security, production behavior. It won&#39;t disappear. And nobody learns it from a prompt.</p>
<h2>What a team loses when it stops hiring juniors</h2>
<p>There&#39;s an effect that numbers don&#39;t capture. Teams that haven&#39;t hired juniors in a long time end up turning inward. Seniors optimize what they already know instead of questioning it.</p>
<p>At Jolimoi, what I loved about juniors was that they wanted to conquer the world. They asked questions nobody was asking anymore. That energy is missing in teams that have stopped hiring.</p>
<p>But juniors don&#39;t just benefit from seniors. They benefit seniors. Mentoring forces you to formalize what you know. If you can&#39;t explain a concept to a junior, maybe you don&#39;t understand it as well as you thought. A junior&#39;s &quot;why do we do it this way?&quot; is a stress test on decisions that may just be inertia. Seniors who never mentor can plateau without noticing. And mentoring is the first step toward leadership. Without juniors, you never find out which seniors could become engineering managers.</p>
<p>In my experience, a prolonged absence of juniors correlates with a quiet stagnation. Not a universal law. But a pattern I see repeatedly.</p>
<p>The market confirms the trend. Employment for developers aged 22 to 25 has dropped 20% since 2022. In France, IT employment for under-30s fell 7.4% in a single year. Apprenticeship subsidies have been cut. And every company that freezes junior hiring is right for itself. The problem is collective. No juniors today, no mid-levels in 5 years, no seniors in 10. Who reviews AI-generated code in 2031?</p>
<h2>Mentoring remains the foundation. It needs to evolve.</h2>
<p>At Jolimoi, we didn&#39;t have the budget to hire seniors. So we hired juniors. And we trained them.</p>
<p>The mentoring was structured. Half a day per sprint, a detailed training plan from the start, feedback at the end of each cycle. I chose these juniors carefully. Some stayed a long time. I brought several of them up to senior level. And those seniors mentored the next wave. The system fed itself.</p>
<p>Then the first external seniors arrived. They were better than me on many subjects, and that unlocked the team at a level I couldn&#39;t have reached alone.</p>
<p>All of this was before AI. The structure of mentoring (plan, rhythm, feedback, handoff) remains valid. The content needs to evolve.</p>
<p>Three years ago, we trained juniors to code, debug, structure. Today, there&#39;s an additional layer: when to use AI and when to code yourself. Developing judgment WITH the tool, instead of losing it because of the tool. Recognizing that generated code compiles but doesn&#39;t hold. That discernment doesn&#39;t come from prompting. It comes from being accompanied by someone who has it.</p>
<p>Nobody has the complete playbook yet.</p>
<p>Shopify structured its hiring in 3 parts: no AI, AI optional, AI mandatory. The same logic could be applied to mentoring. Microsoft proposes a &quot;preceptor program&quot; where mentoring becomes an explicit organizational goal, not a byproduct of daily work.</p>
<p>On the learning side, a few practices are taking shape. Code first, compare with AI output after: comparison builds judgment, the reverse builds nothing. Use AI to review your own code rather than to write it. Ask AI to explain why, not to give the answer. Some teams go as far as a no-AI day per week, to force the fundamentals. None of this is proven at scale yet. In every team where I discuss it, the intuition converges: preserve the cognitive effort.</p>
<p>I don&#39;t know yet what mentoring will look like in 2 years. Leaving juniors alone with AI doesn&#39;t build engineers. Human mentoring remains the foundation, and AI makes its absence even more costly.</p>
<h2>So what?</h2>
<p>We talk a lot about what AI will replace. Not enough about what it can&#39;t build: judgment, doubt, the instinct that code compiles but won&#39;t hold, that a spec looks clean but doesn&#39;t solve the right problem. You can&#39;t prompt that. You transmit it.</p>
<p>While most companies are cutting junior positions, Shopify is doing the opposite: scaling from 100 to 1,000+ interns per year. Their bet: juniors who grew up with AI bring a perspective that seniors trained before AI don&#39;t have. As Farhan Thawar puts it: <em>&quot;These folks coming out of schools now are AI native. We wanted to bring those types of people in to reimagine what it looks like to build.&quot;</em></p>
<p>The real question isn&#39;t &quot;do we still need juniors?&quot; It&#39;s &quot;do we still know how to train them?&quot;</p>
<p>And the first answer might be the simplest: let&#39;s start hiring juniors again.</p>
<h3>Sources</h3>
<p>(1) Anthropic, <a href="https://www.anthropic.com/research/AI-assistance-coding-skills">How AI assistance impacts the formation of coding skills</a><br>(2) MIT Media Lab, <a href="https://www.media.mit.edu/publications/your-brain-on-chatgpt/">Your Brain on ChatGPT: Accumulation of Cognitive Debt</a></p>
<p>Further reading:</p>
<ul>
<li>Stack Overflow, <a href="https://stackoverflow.blog/2025/12/26/ai-vs-gen-z/">AI vs Gen Z</a></li>
<li>Shopify, <a href="https://coderpad.io/blog/hiring-developers/in-the-ai-era-shopify-is-investing-in-junior-engineers-not-cutting-them/">Investing in Junior Engineers</a></li>
<li>Microsoft / InfoQ, <a href="https://www.infoq.com/news/2026/04/junior-developer-pipeline-crisis/">AI Is Hollowing out the Junior Developer Pipeline</a></li>
<li>César Lizurey, <a href="https://cesar.lizurey.fr/tech/2026/04/09/juniors-ia-centre-formation.html">Recruter des juniors à l'ère de l'IA</a> (French data)</li>
</ul>
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Let it burn - Why the best organizations choose what they don't fix]]></title>
            <link>https://www.shapeandship.ai/p/let-it-burn-orga-team-efficiency</link>
            <guid>https://www.shapeandship.ai/p/let-it-burn-orga-team-efficiency</guid>
            <pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Three projects max. The rest burns. Why triage and iteration beat reorgs for improving team efficiency.]]></description>
            <content:encoded><![CDATA[<p>We all know this moment. You step back from your organization and list the problems. There are ten of them. Instinct says tackle them all. You launch six in parallel. Three months later, each one is 15% done, none are finished, and everyone is exhausted.</p>
<p>That&#39;s usually when someone suggests a reorg.</p>
<p>In product, the key to a good roadmap is knowing what you won&#39;t build. Same thing for organizations. Pick your battles and own what you leave untouched. Iterate every month instead of restructuring every two years.</p>
<h2>Three projects. The rest burns.</h2>
<p>A recent situation in a data team. An engineering manager who wouldn&#39;t delegate. He reviewed every deliverable and signed off on every decision. All information flowed through him. The team was slowing down, and everyone could see it.</p>
<p>The decision was to not address it now.</p>
<p>Not out of neglect. Because at the same time, two other teams were in the middle of restructuring their delivery. Stabilizing their process was the priority. Coaching that EM would take months and a level of presence I didn&#39;t have yet. Spreading across all three fronts meant treating none of them properly.</p>
<p>But &quot;not going there&quot; doesn&#39;t mean &quot;doing nothing.&quot; I set up a temporary framework to contain the effects. I monitored the signals (satisfaction, turnover, pace) to make sure the situation wasn&#39;t deteriorating. And above all, I set a review date. Not an abandonment. A time-boxed choice.</p>
<p>It&#39;s still the most uncomfortable move in the job. Seeing a problem, understanding it, and deciding now is not the time.</p>
<p>The asymmetry is brutal. When you let something burn, the team sees it and holds it against you. When you spread across six projects and nothing gets done, nobody identifies the cause. Targeted inaction is visible. Scattering is invisible. People always blame the first, never the second.</p>
<p>On the product side, same logic. In a context where delivery capacity was struggling, the temptation was to push the roadmap features to avoid losing time. The opposite of what was needed. Whatever you build, if the team can&#39;t deliver properly, the product impact will be zero. The roadmap can wait. Delivery capacity cannot. I first tested the organizational and delivery changes on smaller scopes before extending them. The ambitious roadmap burned in the meantime. It resumed when the team was ready to carry it.</p>
<p>The longer you work with a team, the more this choice matters. Starting a project without the bandwidth to see it through creates an unkept promise. And an unkept promise does more damage than silence.</p>
<p>There&#39;s a WIP limit that isn&#39;t just cognitive. It&#39;s emotional. Every organizational change touches people, their habits, their sense of security. Carrying that takes attention. And attention has a ceiling.</p>
<blockquote>
<p><em>&quot;There is nothing so useless as doing efficiently that which should not be done at all.&quot;</em> — Peter Drucker <a href="https://www.shapeandship.ai#source-1">(1)</a></p>
</blockquote>
<h2>Understanding what&#39;s actually burning</h2>
<p>Letting it burn doesn&#39;t mean looking away. It means knowing exactly what&#39;s burning and why. And it means coming back to check, regularly.</p>
<p>Triage demands diagnosis. And diagnosis, like debugging, starts with separating the symptom from the root cause.</p>
<p>A tool I use consistently: the knowledge map. You ask the teams, all teams, not just engineering, to map out who knows what. Not the managers. The teams themselves, because otherwise blind spots stay blind spots. Human SPOFs surface quickly. The dev who&#39;s the only one who understands an entire chunk of the architecture, the PM who alone grasps the business domain.</p>
<p>In a company of about forty people where I was supporting the CTO, this mapping revealed the real issue. The CTO was delivering a huge amount. He&#39;d been there from the start, technically strong, involved. But on the knowledge map, his name showed up in almost every box. Technical decisions, architectural knowledge. Every trade-off went through him. The system around him had never evolved to distribute that load. The symptom reported by the team: &quot;we&#39;re slow.&quot; The root cause wasn&#39;t this CTO. It was the organization that had never built autonomy around him.</p>
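<p>The knowledge map can start as a very small script over survey answers. The sketch below is an illustration, not the tool used in that engagement: the names, topics, and the <code>answers</code> structure are all hypothetical. It inverts the answers into a topic-to-people map and lists the topics that only one person can explain.</p>

```python
from collections import defaultdict

# Hypothetical survey answers: each person lists the topics they can explain.
answers = {
    "cto": ["billing", "auth", "deploy pipeline", "data model"],
    "dev_a": ["auth"],
    "dev_b": ["deploy pipeline"],
    "pm": ["pricing rules"],
}


def build_knowledge_map(answers):
    """Invert person -> topics into topic -> sorted list of people."""
    topic_owners = defaultdict(set)
    for person, topics in answers.items():
        for topic in topics:
            topic_owners[topic].add(person)
    return {topic: sorted(people) for topic, people in topic_owners.items()}


def single_points_of_failure(knowledge_map):
    """Group the topics known by exactly one person under that person."""
    spofs = defaultdict(list)
    for topic, people in knowledge_map.items():
        if len(people) == 1:
            spofs[people[0]].append(topic)
    return {person: sorted(topics) for person, topics in spofs.items()}


knowledge_map = build_knowledge_map(answers)
spofs = single_points_of_failure(knowledge_map)
for person, topics in sorted(spofs.items()):
    print(person, "is the only one who knows:", ", ".join(topics))
```

<p>A name that appears in almost every row of <code>knowledge_map</code>, or that carries most of the single-owner topics, is the pattern described above.</p>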
<p>Different context, same pattern. Sprints consistently slipping. The reflex would be to blame estimation or dev velocity. Digging in, I found that teams were rotating between products every two weeks. Nobody had time to master a scope before switching. The problem was the rotation.</p>
<p>Ron Westrum showed that the highest-performing organizations share one trait: when something goes wrong, the error opens the investigation instead of closing it <a href="https://www.shapeandship.ai#source-2">(2)</a>. You look for the bug in the system. Not someone to blame.</p>
<p>People describe what they experience very well. What they describe is rarely the root cause. To find it, you have to go see them. At every level, not just managers. Skip-levels aren&#39;t a luxury reserved for large companies. Beyond the initial diagnosis, they let you challenge your own choices: the project you decided not to take on six weeks ago, is that still the right call?</p>
<p>Teresa Torres formalized this idea for product with continuous discovery <a href="https://www.shapeandship.ai#source-3">(3)</a>: you don&#39;t interview your users once to build a roadmap, you interview them on an ongoing basis to keep adjusting. The same discipline applies to organizations. You don&#39;t run an audit then execute a plan. You listen continuously and reassess what you&#39;ve chosen to let burn.</p>
<h2>Iterate instead of restructure</h2>
<p>A reorg promises to fix everything at once. It takes six to nine months to produce anything. According to McKinsey, 80% fail to deliver the expected value; 60% reduce productivity <a href="https://www.shapeandship.ai#source-4">(4)</a>.</p>
<p>Iteration works differently. One change. Measured. Adjusted. Then the next.</p>
<p>In a recent context, the sequence looked like this. The first project was bringing dev and product together around a shared understanding of how we work together. Why that one first? Because as long as tech and product didn&#39;t share the same understanding of what we were delivering and why, nothing else could move forward. It was grinding at every sprint, not on execution but on direction. Once that framework was set and stabilized, I tackled how we selected which opportunities to pursue. Then delivery itself. The first weeks were uncomfortable. Stabilization. Observation. Then a new cycle: retrospectives were redesigned so continuous improvement came from the teams, not from above. The documentation and specification process was reworked. Some technical issues could finally be addressed, because the bandwidth was there.</p>
<p>Each change had its moment. None arrived before the previous one was digested.</p>
<p>You don&#39;t ship a feature without a spec. You don&#39;t change an organization without the same discipline. Each change was accompanied by a short doc: the identified root cause, the proposed solutions with their risks, and the success indicators.</p>
<p>A reorg would have tried to do everything at once with a six-month plan. Iteration produced visible results within weeks, and each stabilized result freed up bandwidth for the next <a href="https://www.shapeandship.ai#source-5">(5)</a>.</p>
<p>And the fires we&#39;d let burn? You come back to them. When the current projects are stabilized, you rotate the org backlog and pick up the next one.</p>
<p>Triage makes iteration possible. And iteration ends up treating everything, project after project.</p>
<h2>So what?</h2>
<p>Letting it burn isn&#39;t comfortable. People will hold it against you. You&#39;ll doubt yourself. Some evenings, you&#39;ll wonder if you made the right call.</p>
<p>There&#39;s a warning signal not to miss. When a fire you chose to leave starts triggering departures, burnout, or an irreversible loss of trust, the triage has failed. Letting it burn has a limit, and that limit is when the fire spreads beyond what you can recover from.</p>
<p>But the question isn&#39;t &quot;is everything okay?&quot; The question is &quot;is what we chose to work on actually moving forward?&quot; If the answer is yes, the triage is working. If the answer is no, it&#39;s time to reassess.</p>
<p>The perfect organization doesn&#39;t exist. The one that knows what it&#39;s letting burn, why, and for how long, moves forward. The others run reorgs.</p>
<p>I recently discussed these topics on the <a href="https://skd.so/SZuyGu?subid=LK">Développeur Experience</a> podcast from <a href="https://www.linkedin.com/in/donatien-d/">Donatien</a>. We talked about team efficiency, decision-making, and navigating uncertainty as a tech leader.</p>
<h3>Sources</h3>
<ol>
<li>Peter Drucker, <em>The Effective Executive</em> (1967)  </li>
<li>Ron Westrum, <a href="https://qualitysafety.bmj.com/content/13/suppl_2/ii22">A Typology of Organisational Cultures</a>  </li>
<li>Teresa Torres, <a href="https://www.producttalk.org/2021/05/continuous-discovery-habits/">Continuous Discovery Habits</a>  </li>
<li>McKinsey, <a href="https://www.mckinsey.com/capabilities/people-and-organizational-performance/our-insights/decision-making-in-the-age-of-urgency">Good Decisions Don't Have to Be Slow Ones</a>  </li>
<li>Will Larson, <a href="https://lethain.com/how-to-evolve-eng-org/">How to evolve an engineering organization</a></li>
</ol>
<p>Further reading:</p>
<ul>
<li>NOBL, <a href="https://nobl.io/changemaker/the-role-of-reorgs-in-organizational-change/">The Role of Reorgs in Organizational Change</a></li>
</ul>
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Devs produce twice as many PRs. Delivery hasn't moved.]]></title>
            <link>https://www.shapeandship.ai/p/devs-produce-twice-as-many-prs-delivery-hasn-t-moved</link>
            <guid>https://www.shapeandship.ai/p/devs-produce-twice-as-many-prs-delivery-hasn-t-moved</guid>
            <pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[ Devs produce twice as many PRs with AI. Delivery hasn't kept pace. The real ROI isn't in the tokens burned, it's in what teams do with the time AI frees up.]]></description>
            <content:encoded><![CDATA[<p>The same picture keeps showing up. The AI dashboard is green. Adoption above 90%. Accepted code trending up. Token consumption climbing every month. Devs say they&#39;re faster.</p>
<p>Delivery, so far, hasn&#39;t kept pace.</p>
<p>At some companies, leaderboards of who burns the most tokens circulate on Slack. At Meta, engineering managers track AI consumption per team (1). Cleo allows up to $2,000/month in tokens per engineer (1). Extreme case: one developer at Anthropic hit $150,000/month on Claude Code (2). Token volume is becoming a proxy for performance. The problem isn&#39;t the budget. The problem is what we&#39;re measuring.</p>
<h2>The &quot;lines of code&quot; of 2026</h2>
<p>Jellyfish studied 7,500 developers over Q1 2026 (3). Between the median developer ($52/month in tokens) and the 90th percentile ($691/month), output doesn&#39;t follow the cost curve. Returns diminish. The highest spender on a team can have the best cost per delivery (4), because they use AI in deep sessions that resolve in fewer iterations. Or not. Raw cost, without tying it to what gets shipped, tells you nothing.</p>
<p>The phenomenon has a name: tokenmaxxing (5). Measuring AI productivity by token consumption. Lines of code in the 2000s, story points in the 2010s, tokens today. Each time, we measure the most visible input and mistake it for a result. We end up getting more of what we measure, and less of what we wanted (6).</p>
<p>Measuring adoption makes sense early on. You check that the tool is being used, that teams are getting comfortable with it. The danger is when a deployment metric becomes the permanent metric of success.</p>
<h2>Devs are faster. The team isn&#39;t.</h2>
<p><a href="http://Faros.ai">Faros.ai</a> measured this across 10,000 developers in 1,255 teams (7). At the individual level, the gains are real: twice as many PRs, nearly 50% more tasks per day. At the company level, no measurable correlation with improved throughput, DORA metrics, or quality.</p>
<p>Individual gains don&#39;t automatically transfer to the org level. The data is still young, but the signal is worth paying attention to.</p>
<p>Code arrives faster, but the rest of the system hasn&#39;t kept up. Code review has become a bottleneck. PRs pile up, sometimes without developers actually improving their review practices (time invested, automated tooling). Coding time has been compressed by AI. Review time, testing time, time to production. None of that has.</p>
<p>Faros qualifies its own results: adoption is recent, 2 to 3 quarters of critical mass, and the numbers may evolve. The signal remains clear. Accelerating production without adapting the downstream pipeline creates a traffic jam further down the road. The only honest path forward is that developers need to spend more time on review, and the team culture needs to evolve to make that possible. Automate as much as you can, yes. But going from no automation to &quot;we don&#39;t review code anymore&quot; is a bet I won&#39;t make. Not until you&#39;ve iterated on your automated review tooling for a few months and calibrated which PRs can skip review, which need a light pass (1 reviewer), and which need the full treatment (2+ reviewers). Learning takes time. Rushing it doesn&#39;t save any.</p>
<h2>The value is second-order</h2>
<p>The most tangible quality improvements don&#39;t come from AI-generated code. Hard to prove at Faros scale. But a pattern I keep seeing, and worth calling out.</p>
<p>Take tests. AI can generate them, and it&#39;s getting better at it. But the biggest gains I observe don&#39;t come from auto-generated tests. They come from the freed-up time that lets developers write more tests themselves, and write them better.</p>
<p>Same logic for refactoring. Long-stalled projects become possible. For mechanical refactoring (renames, method extraction, pattern migration), AI speeds things up. And the reclaimed time lets the team tackle the more structural work. Refactoring becomes more ambitious. Large swaths of technical debt start moving, where before the team was just patching.</p>
<p>Documentation follows the same pattern. AI is very effective at keeping specs up to date, generating first drafts of ADRs (Architecture Decision Records), structuring postmortems. Tasks nobody did because they took too much time for too little immediate reward. Nothing spectacular. Nothing that makes it into a keynote. The kind of quiet progress that changes a team&#39;s trajectory over six months.</p>
<p>AI hasn&#39;t improved quality directly. It&#39;s given teams the time to do what actually improves quality. No token dashboard captures an ADR written, an ambitious refactoring, or a rising test coverage trend. Yet that&#39;s where the difference lies between a team that consumes AI and a team that extracts value from it.</p>
<p>Not an argument for cutting AI budgets. The tools report usage because that&#39;s their business model. Fair enough. It&#39;s up to leaders to choose better instruments to evaluate outcomes.</p>
<h2>Three dashboards, only one that matters</h2>
<p>We&#39;ve known since the DORA metrics: the tool doesn&#39;t predict delivery, and delivery doesn&#39;t predict client value. AI doesn&#39;t change that hierarchy.</p>
<p>The <strong>AI dashboard</strong> measures the tool: tokens consumed, adoption rate, accepted code. Nothing really useful there. </p>
<p>The <strong>delivery dashboard</strong> measures the process: cycle time, lead time, deployment frequency, change failure rate, tokens consumed per feature delivered. Better. But shipping faster isn&#39;t shipping better. I wrote about this in <a href="https://www.shapeandship.ai/p/cheap-to-build-costly-to-keep">Cheap to build, costly to keep</a>: the cost of building has collapsed. The cost of ownership hasn&#39;t. This is where you quickly spot the degradation of metrics like time-to-review.</p>
<p>The <strong>client outcome dashboard</strong> measures value: feature adoption, retention, satisfaction, business impact. The real goal of a successful transformation. Where quarterly objectives and OKRs should focus. Deploying is no longer enough. Improving cadence isn&#39;t enough either. </p>
<p>Next time someone asks &quot;what&#39;s the ROI of AI?&quot;, the answer shouldn&#39;t be an adoption rate. It should be: what have our customers gained that they didn&#39;t have six months ago?</p>
<h2>Sources</h2>
<p>(1) <a href="https://newsletter.pragmaticengineer.com/p/the-impact-of-ai-on-software-engineers-2026">The Impact of AI on Software Engineers in 2026 — Gergely Orosz, The Pragmatic Engineer (April 2026)</a><br>(2) <a href="https://aiproductivity.ai/news/tokenmaxxing-engineers-burn-150k-month-ai-compute/">Tokenmaxxing Is Real: Engineers Now Burn Through $150K/Month in AI Compute — AI:PRODUCTIVITY (2026)</a><br>(3) <a href="https://jellyfish.co/blog/is-tokenmaxxing-cost-effective-new-data-from-jellyfish-explains/">Is Tokenmaxxing Cost Effective? New Data from Jellyfish — Jellyfish (Q1 2026)</a><br>(4) <a href="https://www.vantage.sh/blog/agentic-coding-efficiency">Your Most Expensive Developer Might Be Your Most Efficient — Vantage (2026)</a><br>(5) <a href="https://www.faros.ai/blog/tokenmaxxing">Tokenmaxxing: Why Token Consumption Isn't AI Engineering Productivity — Faros.ai (2026)</a><br>(6) <a href="https://itsmeduncan.com/2026/03/24/tokenmaxxing-is-lines-of-code-thinking-for-the-agentic-era/">Tokenmaxxing: The Costly Mistake in AI Engineering Metrics — Duncan Grazier (March 2026)</a><br>(7) <a href="https://www.faros.ai/blog/ai-software-engineering">The AI Productivity Paradox — Faros.ai (2026)</a></p>
]]></content:encoded>
            <category>code review</category>
            <category>claude code</category>
            <category>dora</category>
            <category>ai</category>
        </item>
        <item>
            <title><![CDATA[Feature Overdose]]></title>
            <link>https://www.shapeandship.ai/p/feature-overdose</link>
            <guid>https://www.shapeandship.ai/p/feature-overdose</guid>
            <pubDate>Mon, 20 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A team ships 11 features in a quarter. The board is happy. Delivery is green. When someone checks the usage data, 3 are actually adopted. The other 8 exist in the product, in the documentation, in the support scope. Not in user habits.]]></description>
            <content:encoded><![CDATA[<p>A team ships 11 features in a quarter. The board is happy. Delivery is green. When someone checks the usage data, 3 are actually adopted. The other 8 exist in the product, in the documentation, in the support scope. Not in user habits.</p>
<p>Not an outlier. A pattern. I&#39;ve seen it in a lot of startups/scaleups I&#39;ve worked with on the product side.</p>
<p>In a <a href="https://www.shapeandship.ai/p/cheap-to-build-costly-to-keep">previous article</a>, I wrote about the invisible technical cost of AI-generated code. Comprehension debt: the code that works but nobody understands. There&#39;s a second cost, on the product side this time. Just as invisible.</p>
<p>AI has accelerated our ability to build. The ability to think through what we build hasn&#39;t kept up. Not just what to remove, but what not to build in the first place.</p>
<h2>Three weeks of dev was also three weeks to think</h2>
<p>Pendo measured the median feature adoption rate across thousands of SaaS products (1): <strong>6.4%</strong>. Out of 100 features a team designs, builds and ships, roughly 6 drive 80% of usage. Even top-10% products don&#39;t exceed 15.6%.</p>
<p>These numbers predate the AI acceleration. When we shipped at human pace, 94% of features were already not adopted. What happens when you ship 2 to 3 times faster?</p>
<p>Three weeks of development was also three weeks during which the PM refined the scope. The designer iterated on mockups. Stakeholders lived with the idea and sometimes realized it wasn&#39;t the right one. The build cycle doubled as a maturation cycle. AI compressed the build time. It compressed the thinking time with it. Not out of negligence. The cycle just accelerated beyond our capacity to absorb.</p>
<p>I call this <strong>feature overdose</strong>. Each feature, taken individually, is a good idea. It&#39;s the accumulation that kills the product.</p>
<p>Every unused feature carries a double cost. On the technical side: maintenance, complexity, code to evolve. The Systemic Additivity Index from the previous article measures exactly that. On the product side: interface surface, documentation, support, onboarding, cognitive load for the user.</p>
<p>There&#39;s also a human cost that dashboards don&#39;t capture. In the startups I&#39;ve worked with on the product side, the warning sign is never a metric. It&#39;s the new hire who takes three months to understand the product instead of three weeks. It&#39;s the customer who uses 10% of the features and pays for 100%. Nobody&#39;s failing. The product has outgrown the people who carry it, and the people who use it.</p>
<h2>Adding costs 2 days. Removing is a decision.</h2>
<p>AI has pushed the cost of building a feature toward zero. The cost of removing one hasn&#39;t moved.</p>
<p>Removing a feature means saying no to someone. The enterprise client who requested it. The PM who spent three months on it. Sales already showed it in a demo. The problem isn&#39;t technical. It&#39;s political. That&#39;s why nobody does it.</p>
<p>It&#39;s not just a lack of courage. It&#39;s a documented cognitive bias. In 2021, Adams, Converse, Hales and Klotz published a study in <em>Nature</em> across eight distinct experiments (2). The most telling protocol: participants must stabilize a Lego structure to support a weight. The optimal solution is to remove a piece. Removal is free. Each added piece costs $0.10. Result: <strong>59% of participants add anyway</strong>. When explicitly reminded that removal is free, 61% subtract, and earn 10% more. The researchers conclude that people don&#39;t reject subtraction. They don&#39;t see it. It doesn&#39;t exist in their solution space.</p>
<p>In product management, the additive bias is structural. Roadmaps list what we&#39;ll build, never what we&#39;ll remove. OKRs measure what we shipped, not what we simplified. AI makes the asymmetry worse: when building cost three weeks, refusing to build was easy to justify. &quot;We don&#39;t have the bandwidth.&quot; When building costs two days, every refusal seems irrational. &quot;It&#39;s only two days, might as well do it.&quot; The last natural guardrail of prioritization, dev time, is gone. Nothing has replaced it.</p>
<p>Something needs to. The replacement isn&#39;t a process. It&#39;s the product vision. When dev time no longer filters, vision has to. If a feature doesn&#39;t serve the vision, it doesn&#39;t get built, even if it only takes two days. &quot;We could build it&quot; is not a reason. &quot;It serves where we&#39;re going&quot; is. Everything else is noise, no matter how cheap it is to ship.</p>
<p>Feature overdose has two sources. Building what shouldn&#39;t have been built. And not removing what was. The first one is harder to see because it looks like productivity. The team shipped fast, the board is happy, the backlog is shrinking. But shipping fast in the wrong direction doesn&#39;t clear the backlog. It fills the product.</p>
<p>As with code, you can measure the asymmetry. I propose the <strong>Net Feature Ratio</strong>: the number of features added divided by the number of features removed per quarter, plus one so that a quarter with no removals doesn&#39;t divide by zero. It&#39;s the product-side mirror of the SAI (Systemic Additivity Index) from the previous article, the insertion/deletion ratio on the code side.</p>
<pre><code class="language-javascript">Net Feature Ratio = features added / (features removed + 1)
</code></pre>
<p>Below 3, the team builds and prunes: for every 3 features added, at least 1 is removed or consolidated. That&#39;s a sustainable pace. Above 10, quarter after quarter, the product bloats without counterbalance. The interface surface grows, user cognitive load rises, but nothing comes out.</p>
<p>By feature, I mean a user-facing capability: a screen, a workflow, an export mode. Not a bugfix or a performance improvement. What matters is what the user sees and has to understand.</p>
<p>Not an industry standard. It&#39;s a diagnostic tool I&#39;m starting to use, like the SAI in the previous article. It takes on meaning over time. A product launch quarter will naturally have a high ratio. The signal is the trend.</p>
<p><strong>How to measure it</strong>: a spreadsheet with two columns. Features added, features removed (or consolidated, or hidden). Updated each release. If the right column has been empty for three quarters, that&#39;s the signal.</p>
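<p>The same spreadsheet fits in a few lines of code. This is a minimal sketch: the quarterly numbers are invented, and the label for the zone between 3 and 10 is my own placeholder, since only the two outer thresholds are defined above.</p>

```python
def net_feature_ratio(added, removed):
    """Net Feature Ratio = features added / (features removed + 1)."""
    return added / (removed + 1)


def classify(ratio):
    """Below 3 is sustainable, above 10 is bloating (thresholds from the text).

    The middle label is an assumption made for this sketch.
    """
    if ratio < 3:
        return "sustainable"
    if ratio > 10:
        return "bloating"
    return "watch the trend"


# Hypothetical log: (quarter, features added, features removed or consolidated).
quarters = [("Q1", 11, 0), ("Q2", 9, 1), ("Q3", 6, 2), ("Q4", 12, 0)]

for name, added, removed in quarters:
    ratio = net_feature_ratio(added, removed)
    print(f"{name}: NFR = {ratio:.1f} ({classify(ratio)})")
```

<p>A single quarter says little. What matters is whether the sequence of values drifts upward over several quarters.</p>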
<h2>What velocity dashboards don&#39;t show</h2>
<p>The NFR measures the asymmetry at the roadmap level. But product bloat has other signals, often earlier ones.</p>
<p><strong>Time-to-value for new users.</strong> The time between signup and the moment the user gets value from the product. If this grows release after release, while the product is supposed to be improving, it means each added feature is complicating the path to value instead of shortening it.</p>
<p><strong>The onboarding coverage ratio.</strong> If onboarding covers only 20% of the product, it&#39;s an implicit admission: 80% of the surface isn&#39;t deemed important enough to show a new user. But it still exists. In the interface, in the documentation, in support.</p>
<p><strong>The number of features removed per quarter.</strong> Not a ratio. The raw count. In most organizations, this number is zero. Not since last quarter. Since forever. When I ask this question in the startups I work with, the silence says everything. Pendo frames feature removal as an &quot;innovation event&quot; (3), an act that frees engineering resources for higher-impact work. But for most teams, it&#39;s not even a mental category.</p>
<p><strong>The percentage of shipped features tied to a strategic objective.</strong> Look at what you shipped last quarter. How much of it directly serves the product vision? If 40% doesn&#39;t connect to any strategic axis, that&#39;s noise that made it through the filter. The feature may be useful. It may even be adopted. But if it doesn&#39;t serve where the product is going, it&#39;s dilution.</p>
<p>These signals don&#39;t require new tooling. Time-to-value is in your onboarding analytics. The coverage ratio is in your documentation. The removal count is in your roadmap. Or rather, in its absence. The strategic alignment ratio is in your roadmap too, if you&#39;re honest about it.</p>
<hr>
<blockquote>
<p><em>&quot;People don&#39;t reject subtraction. They just don&#39;t think of it.&quot;</em><br>— Adams, Converse, Hales &amp; Klotz, Nature (2021)</p>
</blockquote>
<p>Feature overdose doesn&#39;t come from a bad decision. It comes from a hundred good decisions that nobody counted. Adding is now trivial. Removing is just as hard as it ever was. The gap widens with every sprint.</p>
<p>Four signals to watch:</p>
<ul>
<li><strong>The Net Feature Ratio</strong> on your roadmap. Above 10 quarter after quarter, the product is bloating.</li>
<li><strong>Time-to-value</strong> for new users. If it&#39;s growing, complexity is winning.</li>
<li><strong>The number of features removed</strong> per quarter. If it&#39;s been zero for a year, the question isn&#39;t even being asked.</li>
<li><strong>Strategic alignment</strong> of shipped features. If a significant share doesn&#39;t serve the vision, you&#39;re building fast in the wrong direction.</li>
</ul>
<p>The missing piece isn&#39;t a tool. It&#39;s a habit. Two questions that should be part of every sprint: &quot;what do we remove?&quot; and &quot;should we even build this?&quot; Teams that ask both free up clarity, bandwidth, and absorption capacity for the features that actually matter.</p>
<p>The <a href="https://www.shapeandship.ai/p/cheap-to-build-costly-to-keep">previous article</a> covered the cost on the code side. This one covers the cost on the product side. Together, they form the true price of a feature in the age of AI.</p>
<hr>
<h2>Sources</h2>
<p>(1) <a href="https://www.pendo.io/pendo-blog/feature-adoption-benchmarking/">Feature Adoption Benchmarking — Pendo (2025)</a><br>(2) <a href="https://www.nature.com/articles/s41586-021-03380-y">People systematically overlook subtractive changes — Adams, Converse, Hales &amp; Klotz, Nature (2021)</a><br>(3) <a href="https://www.pendo.io/pendo-blog/how-to-effectively-remove-and-retire-a-feature-from-your-product/">How to Effectively Remove and Retire a Feature — Pendo</a><br>(4) <a href="https://www.shapeandship.ai/p/cheap-to-build-costly-to-keep">Cheap to build, costly to keep — ShapeAndShip (Apr 2026)</a></p>
]]></content:encoded>
            <category>feature</category>
            <category>product</category>
            <category>ai</category>
        </item>
        <item>
            <title><![CDATA[Cheap to build, costly to keep]]></title>
            <link>https://www.shapeandship.ai/p/cheap-to-build-costly-to-keep</link>
            <guid>https://www.shapeandship.ai/p/cheap-to-build-costly-to-keep</guid>
            <pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[AI has broken the cost of writing code. It hasn't touched the cost of owning it. How to measure what your velocity dashboards won't show you.]]></description>
            <content:encoded><![CDATA[<p>Over 40% of committed code is now AI-assisted (1). AI has broken the cost of writing code. It hasn&#39;t touched the cost of owning it.</p>
<p>Velocity metrics look great. We ship more, faster. Everything is green. That&#39;s worth questioning.</p>
<h2>There are two prices</h2>
<p>AI has collapsed the price of writing code to near zero. Features that used to take a week now ship in two days, a major lever for any scale-up.</p>
<p>But writing code was never the most expensive part. What costs is maintenance and evolution.</p>
<p>The first large-scale empirical studies converge on the same finding: AI-generated code introduces 1.7x more issues than human code. Without guardrails, that compounds: maintenance costs reach 4x traditional levels by the second year (1). These numbers come from an Ox Security report relayed by InfoQ. Ox sells application security tooling, so they have skin in the game. But other studies point in the same direction (4)(6), and the pattern matches what I observe in the field.</p>
<p>Why? Because AI is additive by default. Take a common case: an endpoint returns a badly formatted date. An experienced developer would trace back to the parser and fix the format at the source. AI adds a <code>.toISOString()</code> in the controller, a <code>sanitizeDate()</code> wrapper in the service, and a test that validates the workaround. The bug is &quot;fixed.&quot; Three layers of code added, zero lines removed. The root cause is still there.</p>
<p>A junior developer would make the same mistake. The difference is scale. AI produces this kind of palliative on every PR, in every module, without anyone systematically pushing back. What used to be a coaching moment becomes a systemic codebase problem.</p>
<p>GitClear&#39;s longitudinal study across millions of lines of code (6) quantifies the shift: the share of changed lines associated with refactoring dropped from 25% in 2021 to under 10% in 2024. In the same period, code duplication rose from 8.3% to 12.3%. AI amplifies the pattern: it generates new code rather than restructuring what exists. (On greenfield projects, high addition rates are expected. The signal matters on code in maintenance, over time.)</p>
<p>Another signal worth tracking: <strong>code churn</strong>, the percentage of code rewritten within two weeks of its creation. The same study (6) measured churn rising 84% between 2020 and 2024, from 3.1% to 5.7%. The period also covers post-COVID shifts and the Great Resignation, so AI adoption isn&#39;t the only factor. But the correlation is strong enough to warrant monitoring. A feature that ships in two days but gets rewritten the following sprint is not a velocity gain. It&#39;s a deferred cost wearing a speed label.</p>
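<p>For a rough internal reading of churn, the definition above can be applied to whatever change history you can export. The sketch below assumes a hypothetical list of records (lines written, date written, date rewritten or <code>None</code>). It is not GitClear&#39;s methodology, only the two-week definition applied literally.</p>

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=14)  # "within two weeks of its creation"

# Hypothetical records: (lines written, date written, date rewritten or None).
changes = [
    (120, date(2026, 3, 2), date(2026, 3, 9)),   # rewritten after 7 days
    (300, date(2026, 3, 3), None),               # never rewritten
    (80, date(2026, 3, 5), date(2026, 4, 20)),   # rewritten, but after the window
    (500, date(2026, 3, 10), None),              # never rewritten
]


def churn_rate(changes, window=CHURN_WINDOW):
    """Share of written lines that were rewritten within the window."""
    total = sum(lines for lines, _, _ in changes)
    if total == 0:
        return 0.0
    churned = sum(
        lines
        for lines, written, rewritten in changes
        if rewritten is not None and rewritten - written <= window
    )
    return churned / total


print(f"Churn: {churn_rate(changes):.1%}")
```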
<p>These are the indicators that separate teams where AI builds from teams where AI just adds.</p>
<h2>The code works. Nobody understands why.</h2>
<p>Addy Osmani named this phenomenon: comprehension debt (3). The code runs. Tests pass. Syntax is flawless. But no one on the team can explain how it works. The team merged code it didn&#39;t write, didn&#39;t truly read, and couldn&#39;t reproduce without AI.</p>
<p>This has nothing to do with classic technical debt. There&#39;s nothing to refactor. The code is clean. It&#39;s just that nobody carries it in their head.</p>
<p>When a production incident hits, resolution time increases. The team discovers the implementation in real time, under pressure. An empirical study of 304,000 commits confirms it (4): developers place excessive trust in AI-generated code and merge it without thorough validation. Issues frequently go unfixed.</p>
<p>This is where <strong>bus factor</strong> becomes a leading indicator. If AI writes code that only AI can explain, the bus factor of affected modules tends toward zero. Not because a single person holds the knowledge, but because no one does. In a post-mortem, this surfaces as &quot;nobody knew this code existed,&quot; which is worse than &quot;only one person knew.&quot;</p>
<p>One way to detect it: track the MTTR Drift (2), the deviation of Mean Time To Recovery from its pre-AI baseline:</p>
<pre><code class="language-javascript">MTTR Drift = (MTTR post-AI - MTTR pre-AI) / MTTR pre-AI
</code></pre>
<p>The proposed thresholds (2) are as follows. Between -10% and +10%, the team has internalized the generated code. Above +30%, it&#39;s a signal worth investigating. MTTR is a noisy indicator. A single major incident can skew a quarter, so measure it as a rolling median over at least three months. The drift can have other causes (turnover, infrastructure changes), but if it correlates with AI adoption and nothing else explains it, comprehension debt is a serious hypothesis.</p>
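<p>A minimal sketch of that calculation, using medians as suggested. The incident durations are invented, each list is assumed to cover at least three months, and the label for values outside the two stated bands is a placeholder of mine.</p>

```python
from statistics import median


def mttr_drift(pre_ai_minutes, post_ai_minutes):
    """(MTTR post-AI - MTTR pre-AI) / MTTR pre-AI, computed on medians."""
    baseline = median(pre_ai_minutes)
    current = median(post_ai_minutes)
    return (current - baseline) / baseline


def interpret(drift):
    """Apply the two bands quoted in the text; anything else gets a neutral label."""
    if -0.10 <= drift <= 0.10:
        return "internalized"
    if drift > 0.30:
        return "investigate"
    return "no threshold given"


# Hypothetical recovery times in minutes, one value per incident.
pre_ai = [40, 55, 60, 45, 50]
post_ai = [70, 65, 400, 60, 75]  # one major incident at 400 minutes

drift = mttr_drift(pre_ai, post_ai)
print(f"MTTR drift: {drift:+.0%} ({interpret(drift)})")
```

<p>With a mean instead of a median, the single 400-minute incident would dominate the post-AI figure. The median keeps the reading on the typical incident.</p>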
<p>If the drift is real, the intervention point is clear. Code review is the last moment where the team can take ownership of code it didn&#39;t write. Automating convention and pattern checks (via hooks or review bots) frees human attention for business logic and architecture decisions. A hook that blocks PRs over 400 lines unless the author provides a section-by-section breakdown helps keep reviews at a human scale.</p>
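<p>The 400-line rule can be sketched as a plain function that a CI job or review bot would call. Everything around it is assumed: how the changed-line count and the PR description are fetched depends on your platform, and the <code>## Breakdown</code> heading is a hypothetical convention for the section-by-section explanation.</p>

```python
MAX_LINES_WITHOUT_BREAKDOWN = 400
BREAKDOWN_MARKER = "## Breakdown"  # hypothetical heading the author must add


def review_gate(changed_lines, description):
    """Return (allowed, message) for a PR of the given size and description."""
    if changed_lines <= MAX_LINES_WITHOUT_BREAKDOWN:
        return True, "ok"
    if BREAKDOWN_MARKER.lower() in description.lower():
        return True, "large PR accepted: breakdown provided"
    return False, (
        f"PR changes {changed_lines} lines "
        f"(limit {MAX_LINES_WITHOUT_BREAKDOWN}). "
        f"Add a '{BREAKDOWN_MARKER}' section or split the PR."
    )


allowed, message = review_gate(950, "Refactor the date parsing")
print(allowed, message)
```

<p>The check only verifies that a breakdown exists. Whether it is any good remains a judgment for the reviewer.</p>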
<h2>Measuring real impact, not velocity</h2>
<blockquote>
<p><em>&quot;The faster you go, the further ahead you need to look.&quot;</em><br>— Todd Gagne, The Barrels Paradox</p>
</blockquote>
<p>Lines of code generated, number of PRs, completion speed: these are production metrics. They measure volume. They say nothing about the cost of what we produce.</p>
<p>The obvious starting point is DORA metrics before and after AI adoption. If deployment frequency goes up but change failure rate does too, we&#39;re not going faster. We&#39;re breaking more often. One study measured a 30% increase in change failure rate within 90 days of AI adoption (1). Part of that may reflect the learning curve of new tooling rather than a structural problem.</p>
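<p>The before/after comparison is simple arithmetic. Here is a sketch, assuming two periods of equal length and a count of deployments that caused a failure in each. The <code>Period</code> type and the sample counts are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class Period:
    deployments: int
    failed_deployments: int

    @property
    def change_failure_rate(self) -> float:
        return self.failed_deployments / self.deployments


def relative_change(before: float, after: float) -> float:
    return (after - before) / before


def compare(before: Period, after: Period) -> dict:
    """Relative change in frequency and failure rate between two equal-length periods."""
    return {
        "deployment_frequency": relative_change(
            before.deployments, after.deployments
        ),
        "change_failure_rate": relative_change(
            before.change_failure_rate, after.change_failure_rate
        ),
        "extra_failed_deployments": (
            after.failed_deployments - before.failed_deployments
        ),
    }


# Hypothetical quarters: 100 deployments with 10 failures, then 150 with 21.
result = compare(Period(100, 10), Period(150, 21))
```

<p>In this made-up example, deployment frequency rises by 50% while the change failure rate goes from 10% to 14%, which means 11 more failed deployments to absorb in the same period.</p>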
<p>Three questions to ask: does the AI produce code we keep? Can the review pipeline absorb the volume? Does the architecture hold?</p>
<p>On the review pipeline specifically: <strong>review cycle time</strong> on AI-assisted PRs versus manual ones is a revealing metric. A study of over 8,000 AI-agent PRs (7) shows that 35% are never merged, either closed or left to rot. Merge rates vary from 42% to 82% depending on the tool. If a third of AI-generated PRs never land, the velocity gain measured at commit time is absorbed downstream. The bottleneck moves from writing to reviewing.</p>
<p>In his paper (2), Nadarajah applies this reasoning to two scenarios with the same tool and the same team, over a one-month cycle. Without guardrails: net impact of -4,200. The team spends its time cleaning up. With the right safety nets: net impact of +5,780. Same tool, opposite outcome.</p>
<h2>The guardrails make the difference</h2>
<p>The numbers tell part of the story. There&#39;s also a human cost that metrics don&#39;t capture. Seniors spend their days reviewing code they didn&#39;t write and understand less and less. Experienced developers lose touch with their own codebase. In the teams I work with, that&#39;s often the first warning sign: not a metric going off, but a tech lead saying &quot;I don&#39;t recognize the code anymore.&quot;</p>
<p>The answer is to give AI the right context. With one of my clients, we formalized architecture conventions in tool-readable specs that the AI loads before generating code, set up pre-commit hooks to block known anti-patterns, and invested in building shared skills across the team so that everyone can evaluate what the AI produces. The hooks catch issues before they reach review; the shared understanding catches everything else.</p>
<p>It&#39;s too early to measure the impact. The setup is recent. But the logic holds: give AI the context to produce aligned code from the start, rather than fixing it after the fact. I&#39;ll detail the full pipeline (specs, hooks, task templates, review automation) in a follow-up article.</p>
<p>AI also reduces certain types of bugs, improves test coverage on boilerplate, and enables small teams to deliver what used to take months. The risks described here are real, but they exist alongside genuine gains.</p>
<h2>What to watch</h2>
<p>If you do one thing Monday morning: pull the insertion/deletion ratio on your three most active repos for the last quarter. If it exceeds 10:1, start asking which PRs contribute the most.</p>
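<p>A rough way to pull that ratio, assuming git is installed and the repositories are cloned locally. The sketch sums the added and deleted columns of <code>git log --numstat</code>. The function names and the repository path in the usage comment are placeholders.</p>

```python
import subprocess


def insertion_deletion_ratio(numstat_text: str):
    """Sum added/deleted lines from `git log --numstat` output.

    Returns None when no deletions were found (ratio undefined).
    """
    added = deleted = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        # Skip blank separator lines and binary files, reported as "-".
        if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        added += int(parts[0])
        deleted += int(parts[1])
    if deleted == 0:
        return None
    return added / deleted


def numstat_for_repo(path: str, since: str = "3 months ago") -> str:
    """Run git in the given repository and return the raw numstat output."""
    completed = subprocess.run(
        ["git", "-C", path, "log", f"--since={since}", "--numstat", "--format="],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# Usage, with a placeholder path:
# ratio = insertion_deletion_ratio(numstat_for_repo("path/to/repo"))
```

<p>Generated files, lockfiles, and vendored code inflate insertions, so it is worth excluding them before reading too much into the number.</p>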
<p>Signals to track over time:</p>
<ul>
<li><strong>Code churn</strong> within 14 days. Extractable from git, no vendor dependency. If AI code is rewritten more than human code, the speed is illusory. (Tagging AI vs. human commits requires a convention: commit message tags or Copilot metadata.)</li>
<li><strong>Refactoring ratio</strong> on your repos. If it drops below 10% of changed lines, the codebase is bloating. The 10% threshold comes from GitClear&#39;s methodology (6). Your mileage may vary with different tooling.</li>
<li><strong>MTTR Drift</strong> (rolling median, 3+ months). Above +30%, the team understands less of what it ships. Threshold from (2), not an industry standard.</li>
<li><strong>Review cycle time</strong>, AI PRs vs. manual. If AI PRs take longer to merge, the bottleneck moved. Normalize by PR size so that larger PRs don&#39;t skew the comparison.</li>
<li><strong>Change failure rate</strong> before and after AI. If it&#39;s going up after the adoption curve has stabilized, we&#39;re breaking faster than we&#39;re building.</li>
</ul>
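<p>As an example for the review cycle time signal, here is a sketch that compares AI-assisted and manual PRs after normalizing by size. It assumes PR data has already been exported from the hosting platform and that AI-assisted PRs are identified by a team convention. The <code>PullRequest</code> fields are hypothetical, and dividing by changed lines is a deliberately crude normalization.</p>

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PullRequest:
    opened: datetime
    merged: datetime
    changed_lines: int
    ai_assisted: bool  # assumption: set from a commit tag or PR label


def hours_per_100_lines(pr: PullRequest) -> float:
    """Review cycle time in hours, scaled to 100 changed lines."""
    hours = (pr.merged - pr.opened).total_seconds() / 3600
    return hours * 100 / max(pr.changed_lines, 1)


def median_cycle_by_origin(prs) -> dict:
    """Median normalized cycle time for AI-assisted and manual PRs."""
    groups = {"ai": [], "manual": []}
    for pr in prs:
        key = "ai" if pr.ai_assisted else "manual"
        groups[key].append(hours_per_100_lines(pr))
    return {key: median(values) for key, values in groups.items() if values}


# Hypothetical merged PRs.
prs = [
    PullRequest(datetime(2026, 1, 5, 9), datetime(2026, 1, 6, 9), 400, True),
    PullRequest(datetime(2026, 1, 5, 9), datetime(2026, 1, 5, 13), 100, False),
]
cycle = median_cycle_by_origin(prs)
```

<p>Only merged PRs have a cycle time. PRs that are closed or abandoned need to be counted separately, otherwise the comparison hides the unmerged share discussed earlier.</p>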
<p>Today&#39;s acceleration becomes tomorrow&#39;s bottleneck when nobody looks beyond the velocity dashboards. That&#39;s a leadership choice.</p>
<hr>
<h2>Sources</h2>
<p>(1) <a href="https://www.infoq.com/news/2025/11/ai-code-technical-debt/">AI-Generated Code Creates New Wave of Technical Debt — InfoQ / Ox Security (Nov 2025)</a><br>(2) <a href="https://www.shapeandship.aiEtude_IA_Metrics.pdf">The Velocity Mirage: The Agentic Impact Framework — Mag-Stellon Nadarajah (March 2026)</a><br>(3) <a href="https://medium.com/@addyosmani/comprehension-debt-the-hidden-cost-of-ai-generated-code-285a25dac57e">Comprehension Debt: The Hidden Cost of AI-Generated Code — Addy Osmani (March 2026)</a><br>(4) <a href="https://arxiv.org/abs/2603.28592">Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild — arXiv (March 2026)</a><br>(5) <a href="https://wildfirelabs.substack.com/p/the-barrels-paradox-why-ai-makes">The Barrels Paradox: Why AI Makes Leadership More Human, Not Less — Todd Gagne (Feb 2025)</a><br>(6) <a href="https://www.gitclear.com/ai_assistant_code_quality_2025_research">AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones — GitClear (2025)</a><br>(7) <a href="https://arxiv.org/html/2602.00164">Why AI Agent-Involved Pull Requests Remain Unmerged — arXiv (Feb 2026)</a></p>
]]></content:encoded>
            <category>code review</category>
            <category>technical debt</category>
            <category>claude code</category>
            <category>dora</category>
            <category>ai</category>
            <category>code quality</category>
            <category>leadership</category>
        </item>
    </channel>
</rss>