Why your AI training isn't showing up in your engineering velocity

Jonny Davis

Strategic Account Executive, Andela

Jun 25, 2026

10 min

A study that recently rocked the engineering world found a disconnect between how efficient senior engineers believe they are with AI tools and how long their work actually takes.

The METR study, which followed 16 open-source programmers who completed 246 tasks, found that these developers believed that using AI tools helped them perform 20% faster. However, they actually worked 19% slower. That’s concerning because engineering teams have bought coding agent licences across the board over the last 18 months.
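To make that gap concrete, here is a minimal sketch of the arithmetic. It assumes both percentages describe task completion time relative to working without AI tools, and it uses a hypothetical 60-minute task as the baseline; neither the baseline nor the variable names come from the study itself.

```python
# Assumption: both percentages describe task completion time relative to
# working without AI tools. The 60-minute baseline is purely illustrative.
BASELINE_MINUTES = 60.0

# What developers believed: AI made them 20% faster.
perceived_minutes = BASELINE_MINUTES * (1 - 0.20)

# What was measured: they were 19% slower.
actual_minutes = BASELINE_MINUTES * (1 + 0.19)

gap_minutes = actual_minutes - perceived_minutes
gap_ratio = actual_minutes / perceived_minutes

print(f"perceived: {perceived_minutes:.1f} min")
print(f"actual:    {actual_minutes:.1f} min")
print(f"gap:       {gap_minutes:.1f} min ({gap_ratio:.2f}x the perceived time)")
```

Under those assumptions, a task the developer believes took about 48 minutes actually took about 71, so the real duration is close to one and a half times the perceived one.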

Over that time, uptake has been uneven, to say the least. And where there’s strong adoption, the velocity gains often fall short of what the spend implied, frustrating technical teams and CFOs alike and mirroring the findings of the METR study.

The promised ROI: Tools raise output.

The reality: A 2024 field study of more than 4,800 developers at Microsoft, Accenture, and a Fortune 100 company found a 26% increase in tasks completed with GitHub Copilot. The gain fell mostly to junior developers, largely because the tool stands in for capability that junior developers have yet to build. Senior developers, meanwhile, gained less and were less likely to accept the tool's suggestions. Engineers who already had years of exercising judgment under their belts employed the tool the least.

The study revealed that organizations with mature practices and high DORA scores saw the same deterioration as everyone else. The gain is counted at the point of authorship. The cost lands downstream, where it is not set against the throughput it offsets. That muddies the waters from both an engineering and an operations standpoint.

A separate analysis of 211 million lines of code found duplicated code blocks rose eightfold in 2024. Notably, efforts to refactor fell from a quarter of changed lines to under a tenth.

Precedent to lean on

Even before AI, people used Google differently. How good someone was at finding answers always traced back to how well they had learned to use Google to reach relevant results. Similarly, prompt engineering has to be taught and trained. It can't be picked up in a vacuum. Junior engineers tend to be more AI-native and elastic, two characteristics that improve the quality of prompt engineering. But skill level doesn’t always translate to prompting quality.

In other words, two people at the same skill level can use the same tool and get completely different results. The results depend significantly on how well they prompt it.

A complicating factor: High-quality outputs carry a cost, which shows up after the work is merged. For context, Faros AI tracked two years of telemetry from 22,000 developers across more than 4,000 teams. Task throughput per developer rose by a third. Incidents per pull request rose 242.7%, and time in code review rose 199.6%, as senior engineers worked through AI-generated code that reads as idiomatic but carries structural failures beneath the surface.
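A rough way to see how these figures compound is to turn each percentage increase into a multiplier. The sketch below is a back-of-the-envelope illustration, not part of the Faros AI analysis. It assumes, purely for illustration, that pull request volume grows in line with task throughput; that mapping and the function name are hypothetical.

```python
def to_multiplier(pct_increase: float) -> float:
    """Convert a percentage increase into a multiplier (e.g. 100% -> 2.0x)."""
    return 1.0 + pct_increase / 100.0


# Figures quoted in the text.
throughput_multiplier = 1.0 + 1.0 / 3.0            # throughput rose by a third
incidents_per_pr_multiplier = to_multiplier(242.7)  # incidents per pull request
review_time_multiplier = to_multiplier(199.6)       # time in code review

# Assumption (not from the source): pull request volume scales with
# task throughput, so total incidents = PR volume x incidents per PR.
total_incidents_multiplier = throughput_multiplier * incidents_per_pr_multiplier

print(f"incidents per PR: {incidents_per_pr_multiplier:.2f}x")
print(f"review time:      {review_time_multiplier:.2f}x")
print(f"total incidents:  {total_incidents_multiplier:.2f}x (under the assumption above)")
```

Read this way, a 242.7% rise means incidents per pull request more than tripled, and if pull request volume did track throughput, total incidents would have grown faster still. That is the downstream cost the throughput figure alone does not show.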

Three levels

Level one is standardization: the coding agent is on every machine.

Level two is interface competence: the team knows how the product works, through vendor programs, documentation, peer learning. Engineers learn the interface and the capabilities. This takes weeks and repeats with every tool upgrade.

Level three is taste: the judgment to know when to trust the output and when to override it; how to structure a workflow so poor suggestions stay contained; how to frame a problem so the tool's reasoning holds up; how to evaluate under production pressure.

Most organisations reach level two.

What level three does

Engineers who are properly trained on a tool rarely complain about hallucinations. In fact, they know how to use it to avoid them. That suggests the barrier to reaching level three is likely less of a "model problem" and more of a skill gap.

An engineer with taste frames the task before reaching for the tool, reads the output for structural failure rather than surface correctness, and knows when to discard a suggestion. This is the work the senior engineers in the telemetry were doing by hand, after the code was merged. Level three moves it upstream, before the cost is incurred.

What it takes to build

In our experience, it takes three things to build team-wide adoption that delivers the outcomes and velocity needed to make investments in AI worthwhile.

Sustained practice on codebases with real consequences. Production systems where poor judgment creates visible problems: slow reviews, bugs that surface under load, rework. The engineer sees the result when an evaluation was wrong, and adjusts.

Feedback that examines the judgment call alongside the syntax. Architectural decisions explained and defended. Over weeks and months, this turns isolated decisions into recognisable principles.

Time alongside engineers who already hold the judgment, on live decisions about what code should exist, how a system should be structured, when something is ready to ship. The developing engineer sees the reasoning, the trade-offs, the consequences.

Why it holds and why we hold our position

Training leads to outputs that improve productivity and velocity. When someone learns how to get real outcomes from a tool, their desire to use it skyrockets. Tool-specific knowledge holds for months as the products move. Taste carries across tool cycles. An engineer with taste onboards to a new tool in days, because the evaluation framework holds. The investment compounds, rather than resetting with each upgrade.

At Andela, we believe that durable velocity, measured net of rework and incidents, comes from level three. We also live that reality and work to train our team members to reach that level. Externally, our programs take engineers who already use the tools and centre them on the judgment that decides whether the output is worth keeping. We see the work change: cleaner structure, fewer problems caught late, decisions the engineer can explain and defend.
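One way to picture "velocity measured net of rework and incidents" is the sketch below. It is a hypothetical illustration, not Andela's actual metric: the class, the field names, and the sample numbers are all made up, and it assumes reworked tasks and incident-causing tasks do not overlap.

```python
from dataclasses import dataclass


@dataclass
class SprintStats:
    """Hypothetical per-sprint counts; field names are illustrative only."""
    tasks_completed: int
    tasks_reworked: int
    tasks_causing_incidents: int


def net_velocity(stats: SprintStats) -> int:
    """Completed tasks minus those that needed rework or caused an incident.

    Assumes the rework and incident counts do not overlap.
    """
    return stats.tasks_completed - stats.tasks_reworked - stats.tasks_causing_incidents


# Made-up numbers: raw throughput rises by a third, net velocity does not.
before_ai = SprintStats(tasks_completed=30, tasks_reworked=3, tasks_causing_incidents=1)
after_ai = SprintStats(tasks_completed=40, tasks_reworked=9, tasks_causing_incidents=6)

print(net_velocity(before_ai))
print(net_velocity(after_ai))
```

With these invented inputs, raw output climbs from 30 to 40 tasks while the net figure slips from 26 to 25, which is the pattern a throughput-only dashboard would hide.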

We help teams go beyond surface-level adoption and casual training to full-scale adoption with every engineer delivering outcomes that travel through all three levels. Find out more here.
