> Research Report · April 2026

The Hidden Cost of AI-Generated Code

Why Faster Code Writing Is Making Software Worse

$2.41T: Cost of poor software quality (CISQ estimate, U.S.)

-7.2%: Delivery stability per 25% AI adoption (DORA 2024)

15-25%: Of AI-generated code contains security vulnerabilities (multiple studies)

7.5%: CS grad unemployment, higher than fine arts grads (Stack Overflow, 2025)

We’ve solved the wrong problem. And the bill is coming due.

Article By · written with AI assistance

Maksym Klymyshyn

Software Supply Chain Enthusiast | Engineering Leader

Alex Shulika

Director of Strategic Projects | CEO Office | ex-McKinsey

Maxence Ducher

Graphic & Motion Designer | Expert Production Marketing

What Are We Actually Talking About?

In February 2025, Andrej Karpathy dropped a tweet with a term that defined a massive shift already underway in software engineering: “vibe coding”^[12]. The term describes the practice of stating what you want in plain English and letting an LLM generate the code while you “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” The idea of focusing on creative thinking while a machine does all the actual work sounded fantastic. Yet not even two years later, it’s becoming clear that this wonder-tool may be creating the largest accumulation of technical debt in software history.

This isn’t an anti-AI post. AI tools are amazing for quick explorations or for rubber-ducking ideas at 2 AM when no human colleague would tolerate your half-baked questions. But there’s a big difference between using AI as a thinking partner and outsourcing your engineering judgment to a probabilistic text predictor. That difference - the gap between tool and crutch - is what this piece is about. Because the data is in. And it’s not telling the story the hype cycle promised.

How Did We Get Here?

The promise of vibe coding was simple: you give a sharp prompt, AI writes code faster than any human can, features ship sooner, customers get happier, and companies spend less. The promise was so seductive that AI-generated code started spreading like wildfire. And though Karpathy initially used the term in the context of "throwaway weekend projects", vibe coding soon became a common approach for any sort of project, among individual developers and massive corporations alike.

GitHub CEO Thomas Dohmke stated that Copilot writes an average of 46% of code in files where it's enabled. At Meta's LlamaCon in April 2025, Satya Nadella told Zuckerberg that 20-30% of code in Microsoft's repos is AI-written. During Google's earnings call in October 2024, Sundar Pichai first said more than 25% of new code at Google was AI-generated; by April 2025, that number had risen to over 30%. Y Combinator's Winter 2025 batch tells perhaps the most dramatic story: 25% of startups had codebases that were 95% AI-generated, hitting revenue milestones with teams so small they appeared to be rewriting the rules of startup economics.

The venture capital world embraced the revolution. The narrative was irresistible: smaller teams, faster iteration, lower burn rates, the democratization of software creation. Every founder who couldn't tell a for-loop from a foreach was suddenly a "technical founder." Every PM who'd been told "that's a two-sprint feature" was suddenly hearing "give me twenty minutes."

But the narrative had a blind spot the size of a galaxy: it optimized entirely for the creation of code and ignored everything that happens after you hit deploy. Software needs to be maintained, debugged, secured, documented, and understood by the humans who are responsible for it at 3 AM when it breaks.

Figure: AI-Generated Code at Major Tech Companies - % of code written or generated by AI (2024-2025)

∞: Cheaper code = vastly more code, not less (Jevons Paradox)

50%: Lose 10+ hours/week to non-coding overhead (Atlassian 2025)

16%: Of a developer’s week is spent coding (Atlassian 2025)

99%: Report time savings from AI tools (Atlassian 2025)

The Paradox of Productive Friction

Here's the thing everybody understands but nobody wants to talk about: the difficulty of writing software was a feature, not a bug. When implementation was expensive and slow, teams were forced to make hard decisions. They debated trade-offs. They killed mediocre ideas early because they couldn't afford to build all of them. The cost of engineering acted as a natural filter - a forcing function for product strategy that made leaders ask "is this actually worth building?" before they asked "can we build it?"

With AI-generated code being just one prompt away, that filter is gone now. When the cost of "trying it" drops to near zero, discipline collapses. I've watched teams ship half-baked features not because anyone believed in them, but because the AI made it so easy that saying "let's just see" felt costless. It wasn't. Every one of those experiments became a line item in the maintenance backlog.

As Atlassian's internal data shows, almost all developers (99%) now report time savings from AI tools, with 68% saving more than 10 hours a week. Yet engineers spend only 16% of their week coding^[6], so coding is not the main friction point for developers. They are losing valuable time to non-coding tasks instead: 50% report losing 10+ hours per week, and 90% lose 6+ hours, largely due to organizational inefficiencies in coordination, debugging, testing, deployment, and documentation - areas where AI doesn't eliminate work but multiplies the number of cycles teams have to run.

This is the Jevons Paradox playing out in real time. In 1865, William Stanley Jevons observed that more efficient steam engines didn't reduce coal consumption - they increased it, because cheaper energy unlocked entirely new categories of demand. The same dynamic is now consuming software. As one analysis applying Jevons to AI coding explains, when code becomes cheaper to produce, we don't produce less of it. We produce vastly more. The surface area of software expands - more internal tools, more automations, more microservices, more "AI-powered" features bolted onto products that didn't need them - multiplying complexity at every layer.
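The Jevons dynamic can be made concrete with a constant-elasticity demand curve. The sketch below is illustrative only: the function names, the elasticity values, and the 50% cost drop are assumptions chosen for the example, not figures from any of the cited studies.

```python
def quantity_demanded(unit_cost: float, elasticity: float, scale: float = 1.0) -> float:
    """Constant-elasticity demand: Q = scale * unit_cost ** (-elasticity)."""
    return scale * unit_cost ** (-elasticity)


def total_spend(unit_cost: float, elasticity: float, scale: float = 1.0) -> float:
    """Total resources consumed: unit cost times quantity demanded."""
    return unit_cost * quantity_demanded(unit_cost, elasticity, scale)


# Hypothetical scenario: the cost of producing a unit of code halves.
before, after = 1.0, 0.5

# Elastic demand (elasticity > 1): cheaper code leads to MORE total spend.
elastic_ratio = total_spend(after, elasticity=1.5) / total_spend(before, elasticity=1.5)

# Inelastic demand (elasticity < 1): cheaper code leads to less total spend.
inelastic_ratio = total_spend(after, elasticity=0.5) / total_spend(before, elasticity=0.5)

# How much more code gets produced in the elastic case.
volume_ratio = quantity_demanded(after, 1.5) / quantity_demanded(before, 1.5)

print(f"elastic spend ratio:   {elastic_ratio:.2f}")
print(f"inelastic spend ratio: {inelastic_ratio:.2f}")
print(f"elastic volume ratio:  {volume_ratio:.2f}")
```

Under these assumed numbers, halving the unit cost nearly triples the volume of code and raises total spend by roughly 41%. Whether demand for software is actually that elastic is an empirical question the model does not answer.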

One developer describes the same effect firsthand: "I built more in two months with agents than in the previous year. I used almost none of it... agents amplify clarity. If you know exactly what you want, they're a force multiplier. If you don't, they're an expensive way to generate garbage you'll never use."

The METR Study: You Think You’re Faster. You’re Not.

In July 2025, METR published a randomized controlled trial with 16 experienced developers working on real issues in their own repositories - large open-source projects averaging over one million lines of code and 22,000+ stars. The result was startling: developers took 19% longer to complete tasks when using AI tools than when working without them.^[1] But here's the truly unsettling part. Before the study, developers estimated AI would speed them up by 24%. After using AI and actually being slowed down, they still believed it had made them 20% faster. The gap between perception and reality was 39 percentage points.
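The 39-point figure is simple arithmetic on the numbers quoted above. A minimal sketch follows, with a hypothetical helper name. It follows the article's simplification of treating "19% longer" and "20% faster" as directly comparable percentage points, even though strictly one measures time and the other speed.

```python
def perception_gap(perceived_speedup_pct: float, measured_slowdown_pct: float) -> float:
    """Percentage-point gap between a believed speedup and a measured slowdown."""
    return perceived_speedup_pct + measured_slowdown_pct


MEASURED_SLOWDOWN = 19.0  # tasks took 19% longer with AI

forecast_gap = perception_gap(24.0, MEASURED_SLOWDOWN)  # estimate before the study
post_hoc_gap = perception_gap(20.0, MEASURED_SLOWDOWN)  # belief after the study

print(f"gap vs. pre-study forecast: {forecast_gap:.0f} points")
print(f"gap vs. post-study belief:  {post_hoc_gap:.0f} points")
```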

Let that sink in. Experienced developers - people who've spent years building the instincts to evaluate their own performance - were confidently wrong about whether their tools were helping or hurting. Notably, developers in the study accepted fewer than 44% of AI generations and spent 9% of their time reviewing and cleaning AI outputs.^[1]

Figure: The Perception–Reality Gap (METR RCT · 16 developers · 246 tasks · repos averaging 1M+ LoC)

Though METR's follow-up study shows a productivity gain (~18% for the same subset of developers and ~4% for a new cohort) rather than the 19% loss seen in the original, it tells another interesting story: an increased share of developers said they would not want to do 50% of their work without AI, even though the study paid them $50/hour to work on tasks of their own choosing (to be fair, the first study paid $150/hour). Moreover, 30% to 50% of developers chose not to submit some tasks because they did not want to do them without AI.

Though the scope of the study was small, the results are quite telling: AI tools are not indispensable, but the dependency runs deep.^[2]

Figure: Code Quality Metrics Over Time (2020–2024) - Refactored vs. Copy/Pasted as % of changed lines

GitClear: 211 Million Lines of Evidence

GitClear’s 2025 research analyzed 211 million changed lines^[3] from Google, Microsoft, Meta, and enterprise repos. Refactoring plummeted from 25% in 2021 to under 10% in 2024. Copy/pasted code rose from 8.3% to 12.3%. 2024 was the first year cloned lines exceeded refactored lines. Duplicate blocks (5+ lines) increased eightfold^[3]. Code churn - new code revised within two weeks - climbed from 3.1% to 5.7%, indicating premature or low-quality commits reaching production.

When a bug is found in duplicated code, developers must find and fix every copy - and in a large codebase, some copies will inevitably be missed. AI tools make it easy to insert new blocks by pressing tab, but they rarely propose reusing an existing function, because their context window is limited to roughly 10 files.
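To make the "find every copy" problem concrete, here is a minimal sketch of detecting duplicated blocks of five or more lines with a sliding window. It is an illustration only and does not reproduce GitClear's methodology. The function name, the whitespace-only normalization, and the sample lines are assumptions.

```python
from collections import defaultdict


def find_duplicate_blocks(lines: list[str], window: int = 5) -> dict[tuple[str, ...], list[int]]:
    """Map each block of `window` consecutive lines that occurs more than once
    to the list of 0-based line indexes where it starts."""
    normalized = [line.strip() for line in lines]  # ignore indentation differences
    starts = defaultdict(list)
    for i in range(len(normalized) - window + 1):
        block = tuple(normalized[i:i + window])
        if not any(block):  # skip windows made only of blank lines
            continue
        starts[block].append(i)
    return {block: where for block, where in starts.items() if len(where) > 1}


# Hypothetical file in which the same five-line block was pasted twice.
pasted = ["a = 1", "b = 2", "c = a + b", "print(c)", "return c"]
source = pasted + ["x = 0"] + pasted

duplicates = find_duplicate_blocks(source)
for block, where in duplicates.items():
    print(f"{len(block)}-line block repeated at lines {where}")
```

Exact-match windows like this miss copies that were lightly edited after pasting, which is part of why some copies get overlooked when a bug fix has to be applied everywhere.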

DORA Reports: The Stability Problem

Google's 2024 DORA report - based on responses from over 39,000 professionals^[4] - confirmed the pattern. A 25% increase in AI adoption was associated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability. As one DORA analyst observed, the fact that individuals feel more productive with AI while the system has less throughput and stability feels quite contradictory.

The 2025 DORA report found that while AI's relationship with throughput had improved, the instability problem persisted. As the researchers wrote: "AI accelerates software development, but that acceleration can expose weaknesses downstream. Without robust control systems, an increase in change volume leads to instability." The RedMonk analysis of the 2025 report put it even more bluntly: "AI adoption not only fails to fix instability, it is currently associated with increasing instability."

Translation: we're shipping faster and breaking more things. And the breakage is not random - it's structural.^[4]^[5]

-7.2%: Delivery stability per 25% AI adoption increase (2024 DORA)

-1.5%: Throughput per 25% AI adoption increase (2024 DORA)

75%: Report individual productivity gains from AI (Productivity)

39%: Report little or no trust in AI-generated code (Trust)

DORA introduced the “Vacuum Hypothesis”^[5]: AI enables developers to complete work faster, but instead of dedicating reclaimed time to high-value activities, it gets absorbed by lower-value tasks.

The instability likely stems from AI enabling larger, less manageable change batches and from developers over-relying on AI during code review. The 2025 report emphasized that the value of AI is unlocked not by the tools themselves, but by surrounding practices - strong automated testing, mature version control, and fast feedback loops.

Stack Overflow: Rising Usage, Falling Trust

The paradox of the moment is captured perfectly in Stack Overflow's 2025 Developer Survey of 49,000+ developers across 177 countries. Usage of AI tools rose to 84%, with 51% of professional developers relying on them daily. But trust plummeted: only 33% of developers trust AI-generated output, while 46% actively distrust it - a sharp reversal from 2023-2024, when positive sentiment exceeded 70%.

Two-thirds of developers reported struggling with AI solutions that are "almost right, but not quite" - close enough to look correct, far enough off to waste hours of debugging. And 45% said debugging AI-generated code takes longer than writing it themselves.

When asked about a future where AI handles most coding tasks, 75% said they'd still want to ask a human for help - and the top reason was simply "when I don't trust AI's answers." Only 3% of developers said they "highly trust" AI output.

As Stack Overflow's own analysis of the trust gap observed, the typical technology adoption curve shows the opposite relationship - familiarity usually breeds confidence. But the more developers use AI, the less they trust it.

66%: “Almost right, but not quite” (Frustration)

45%: Takes longer than writing it (Debugging)

75%: Still ask a human (Human Trust)

3%: “Highly trust” AI output (High Trust)

Figure: AI Adoption (% using or planning to use AI tools), Positive Sentiment (% with favorable stance on AI), Trust in Accuracy (% who trust AI-generated output)

The Talent Pipeline Is Breaking

While organizations celebrate velocity metrics, we are systematically destroying the pipeline that produces competent engineers. A Stanford study analyzing ADP payroll data found employment for devs aged 22-25 declined ~20% from its late 2022 peak - precisely when ChatGPT launched. Erik Brynjolfsson called it the "fastest, broadest change" he had ever observed. Meanwhile, 54% of engineering leaders plan to hire fewer juniors, and CS graduates face 7.5% unemployment - higher than fine arts degree holders.

Here's why this matters beyond the immediate human cost: junior developers don't just fill seats. They are the R&D pipeline for your organization's future engineering leadership. They're the people who, over years of making mistakes, receiving feedback, and learning the "why" behind the code, develop the judgment needed to architect complex systems, diagnose subtle production failures, and make the kind of decisions that AI fundamentally cannot.

By cutting junior roles for short-term savings, companies are creating what can only be described as a generational talent gap. The mid-level and senior engineers needed in 2027 and 2028 - people with 3 to 5 years of hard-won debugging experience - simply won't exist in sufficient numbers. They were never hired.

~20%: Employment decline for devs aged 22-25 (Stanford / ADP)

54%: Leaders plan to hire fewer juniors (LeadDev 2025)

-25%: Entry-level hiring decrease in 2024 (year-over-year)

7.5%: CS grad unemployment (Stack Overflow)

This pattern has played out before in another high-stakes industry. As cockpit automation advanced through the 1990s and 2000s, the aviation industry discovered that pilots who relied heavily on autopilot systems were losing the manual flying skills needed to handle emergencies. A 2013 FAA study found that 60% of safety incidents reviewed involved manual handling errors linked to automation overuse. The Royal Aeronautical Society warned that "many recent airline accidents have shown a common cause: the inability of pilots to cope with situations requiring manual control."

The industry response was mandated Upset Prevention and Recovery Training (UPRT) - forcing pilots to practice flying without automation. The FAA explicitly recommended that pilots "periodically use their manual skills for the majority of flights." Software engineering has no equivalent mandate. If we don't build one, we risk the same outcome: a workforce that functions well under normal conditions but cannot handle the inevitable moments when the automation fails and human judgment is required.

The Breakdown of the Product-Engineering Contract

AI is quietly destroying the implicit contract between product and engineering. For decades, the relationship worked like this: Product decides what to build. Engineering explains how hard it is. The negotiation between these two forces - between ambition and constraint - is what produced focused, well-scoped products. When a PM heard "that'll take three sprints," it forced them to think hard about whether the feature was worth three sprints. It forced prioritization. Now the PM hears "give me an hour," and the discipline evaporates.

If your team is producing 10x more code, you're also creating 10x more ownership and liability. Apiiro documented a 10-fold increase in security findings per month at Fortune 50 enterprises. Research shows 15-25% of AI-generated code contains security vulnerabilities, with missing input sanitization the most common flaw.

10×: Increase in security findings/month at Fortune 50 (Apiiro)

15–25%: Of AI-generated code contains security vulnerabilities (research)

When “Efficiency” Gets Expensive

CISQ estimated that poor software quality costs the U.S. $2.41 trillion^[9], with $1.52T in accumulated technical debt - before the AI explosion. DX’s analysis^[13] puts enterprise AI tool deployments at ~$66,000+/year for 100 developers. For a 10-person team, direct tool licenses run roughly $66K and API costs add $36K, but one developer who tracked everything estimated $120K in hidden costs - more than the roughly $102K of visible spend combined.

As MIT Technology Review reported^[10], a developer who spent six weeks testing AI tools found a median 21% slowdown. He then searched for macro evidence of AI-driven productivity gains and found flat lines everywhere: “Where’s the hockey stick on any of these graphs?”

- MIT Technology Review, December 2025

Figure: Total Cost of Ownership: AI vs. Traditional Development

Traditional Development: Writing 40% · Review 25% · Debug 20% · Maintenance 15%

AI-Assisted Development: Writing 15% · Review 30% · Debug+Fix 30% · Maintenance 25%

Net cost change: +15-25% Total Cost of Ownership
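The shares in the chart are proportions of each scenario's own total, so they hide the change in absolute cost per phase. The rough sketch below converts them, assuming the unlabeled 15% slice is writing and normalizing the traditional total to 100 units. The dictionary names and that normalization are illustrative choices, not part of the original analysis.

```python
# Phase shares (percent of each scenario's total) as read from the chart.
TRADITIONAL = {"writing": 40, "review": 25, "debug": 20, "maintenance": 15}
AI_ASSISTED = {"writing": 15, "review": 30, "debug": 30, "maintenance": 25}


def absolute_costs(shares: dict[str, int], total: float) -> dict[str, float]:
    """Convert percentage shares into absolute cost units for a given total."""
    return {phase: total * pct / 100 for phase, pct in shares.items()}


baseline = absolute_costs(TRADITIONAL, total=100.0)  # traditional total = 100 units
low = absolute_costs(AI_ASSISTED, total=115.0)       # +15% total cost of ownership
high = absolute_costs(AI_ASSISTED, total=125.0)      # +25% total cost of ownership

for phase in TRADITIONAL:
    print(f"{phase:12s} {baseline[phase]:6.2f} -> {low[phase]:6.2f} to {high[phase]:6.2f}")
```

Read this way, writing drops to well under half of its baseline cost, while review, debugging, and maintenance each cost more in absolute terms than before.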

What We Should Actually Be Doing

If the problem is clear, the solution requires genuine organizational courage - because it means pushing back against the dominant narrative that more AI equals more progress.

RESTORE the Idea Filter

INVEST in Code Quality Infrastructure

PROTECT the Junior Developer Pipeline

MEASURE What Matters

TREAT AI as a Power Tool

The Real Bottleneck Was Never Code

Code was never the bottleneck. Frameworks like Rails and Laravel already let competent developers ship MVPs in weeks. The hard part was understanding what users need, earning their trust, navigating organizational complexity, and building systems that work at 3 AM. None of those challenges are solved by generating code faster. The future is about building the infrastructure - the tools, metrics, practices, and organizational structures - that allow humans to maintain comprehension as systems grow. Because the bill always comes due. And the interest rate on technical debt is compounding.

vibe coded for 6 months. my codebase is a disaster.

“the app works. users are happy. revenue is coming in.( that's actually the only good part)

but i just tried to onboard a dev to help me and he opened the repo and went quiet for like 2 minutes. then said "what is this."

6 months of cursor and lovable and bolt. every feature worked when i shipped it. but nobody was thinking about structure. the AI just kept adding. new file here, duplicate function there, 3 different ways to handle the same thing across the codebase.

tried to refactor it myself last week. gave up after 2 hours. the thing is so tangled that touching one part breaks something completely unrelated.

the generation was fast. the cleanup is a nightmare.

is there even a way out of this or do i just rewrite everything from scratch?”

-REDDIT/vibecoding