From Copilot to harness engineering

One dev on Cursor, another trying Claude Code, a third who plugged in Copilot. Going from there to a team that builds and maintains its own AI harness is a different problem. The gap is smaller than it looks.

Top-down creates what it claims to fix

"But we're being told to put AI everywhere and it's not even useful." I've heard this sentence word for word in one team. And variations in others. "Use AI" as an OKR is bound to fail for the same reason "be agile" as an OKR was bound to fail ten years ago. You're asking people to adopt something they haven't tried, don't see the value in, with a deadline.

Top-down is optimized for reporting, not for change. You can show adoption rates in a quarterly review. You can present a training plan to the board. It's reassuring, measurable, presentable. But it doesn't produce autonomy. It produces dependency. The team waits for the next directive instead of developing its own capacity to adapt.

Three months under the radar

I apply the same method on every transformation I lead, AI or otherwise. The framework stays the same; only the subject changes. On the most recent one, an AI transformation, the team shifted in two days. But those two days needed three months of preparation.

Open call for volunteers. A group formed naturally. From there, experiments in small groups of two, three, sometimes four. Claude Code, Cursor, automated reviews, models we ran internally. Each experiment lasted about a month, with a set format: scope, KPIs, logbook, presentation at the end. Some spilled over into the next one, and that was expected.

In parallel, we had to build a framework with the CISO so experimentation was even possible. Without that, every initiative would have been blocked by a "we haven't approved that tool."

In three months, around thirty experiments completed.

Why pairs? Because the alternative is knowledge that stays in one person's head. Experimenting alone means learning for yourself. Experimenting with someone else means having to articulate what you're doing, why it works or why it doesn't. Verbalization forces clarification. And it produces a narrative others can pick up. Someone coming back from a solo experiment says "it was fine" or "it didn't work." A pair comes back with an analysis, trade-offs, a structured opinion. The logbook reinforces this: what's written down is transferable. What's only experienced stays anecdotal.

This work doesn't fit in any OKR or on any dashboard. It's a three-month investment that looks like nothing from the outside. Most organizations pick the three-day training plan because it's presentable. Nobody wants to fund the underground work.

Yet that's where autonomy gets built. In a team's capacity to explore on its own, to fail, and to share what it learns.

The tipping point

Two days, in person, in Paris. Remote teams came in for the occasion. The energy isn't the same over video. When you're together, you talk, you watch, you move from one group to another.

Forty minutes of presentation to cover the basics: the context files that give the model knowledge of the project, and skills, reusable instructions that the team enriches over time. That's the harness. Then teams pick their subjects, form pairs or small groups, and build. Each group produces something shareable, a skill or a rule the rest of the team can use the next day. The goal is for each person to leave with the ability to evolve their own tools, not with training they'll forget. And for committing changes to your own skills, and to other teams' skills, to become a habit.
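To make "context files plus skills" concrete, here is a minimal sketch in Python. It is not the team's actual tooling: the `Skill` and `Harness` classes, the `owner` field, and the prompt layout are all hypothetical, chosen only to show how the two pieces combine into what the model sees.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A reusable instruction the team can version and improve over time."""
    name: str
    instructions: str
    owner: str = "unknown"  # hypothetical field: which team maintains the skill


@dataclass
class Harness:
    """Hypothetical container: project context plus a registry of skills."""
    context_files: dict[str, str] = field(default_factory=dict)  # path -> content
    skills: dict[str, Skill] = field(default_factory=dict)

    def add_skill(self, skill: Skill) -> None:
        # Re-adding a skill under the same name replaces it,
        # which is one simple way a skill gets "enriched".
        self.skills[skill.name] = skill

    def build_prompt(self, skill_name: str, task: str) -> str:
        """Assemble the model input: project context, then the skill, then the task."""
        skill = self.skills[skill_name]
        context = "\n\n".join(
            f"# {path}\n{content}"
            for path, content in sorted(self.context_files.items())
        )
        return (
            f"{context}\n\n"
            f"## Skill: {skill.name}\n{skill.instructions}\n\n"
            f"## Task\n{task}"
        )
```

In practice the context and skills would live as files in a repository, so that improving a skill is an ordinary commit that other teams can review and reuse.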

Total freedom? Not quite. When you've seen enough transformations, you know where it's going to break. QA was going to become a bottleneck. Before the two days, the developers most comfortable with AI had received a clear steer: this one needs to be addressed first. They paired up with the QA team members and produced the most effective skills of the two days. One of them takes a product specification, pulls it apart, and exposes every gap: ambiguities, uncovered edge cases, implicit assumptions. The product manager who tested it called it "staggering."

When it lives without you

A month later, the team had accelerated further. Around forty skills created. Regular additions and improvements. Teams commit skills to each other's repos. Automations, code rules, things nobody was talking about before the two days.

Today, some teams are working on automating entire pipelines: from a stated requirement to a PR with a test plan implemented, no manual intervention. Developers don't disappear from the process. They change roles. They become the guardians of the guardrails and of quality. They decide what the AI is allowed to do, and what it isn't.
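One way to picture "deciding what the AI is allowed to do" is a small policy check in Python that runs before an automated change is accepted. This is a sketch under assumptions, not the teams' real setup: the `Guardrails` class, the three outcomes, and the path patterns in the usage example are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Guardrails:
    """Hypothetical policy: which files an agent may change on its own."""
    allowed_paths: tuple[str, ...]
    forbidden_paths: tuple[str, ...] = ()
    requires_human_review: tuple[str, ...] = ()

    def check(self, path: str) -> str:
        """Return "deny", "review", or "allow" for a file the agent wants to change."""
        # Forbidden patterns win over everything else.
        if any(fnmatchcase(path, p) for p in self.forbidden_paths):
            return "deny"
        # Sensitive areas are allowed only with a human in the loop.
        if any(fnmatchcase(path, p) for p in self.requires_human_review):
            return "review"
        if any(fnmatchcase(path, p) for p in self.allowed_paths):
            return "allow"
        # Default-deny: anything not explicitly allowed is refused.
        return "deny"


# Usage example with made-up paths.
policy = Guardrails(
    allowed_paths=("src/*", "tests/*"),
    forbidden_paths=("infra/*",),
    requires_human_review=("src/payments/*",),
)
print(policy.check("src/app/main.py"))
print(policy.check("src/payments/charge.py"))
```

The default-deny choice is the point of the sketch: developers widen the allowed set deliberately, rather than discovering after the fact what the pipeline touched.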

It's not adoption that matters. It's ownership. The signal that a change has taken hold, AI or otherwise, is when the team evolves the practice without being asked. When the leader is no longer necessary for it to move forward.

It's never finished. The harness gets built continuously, skills evolve, guardrails get adjusted. The day you stop evolving them, you start falling behind again. There's still a path to walk before we get to a real dark factory, but that's for another article.
