Research Report · April 2026
The Hidden Cost of AI-Generated Code
Why Faster Code Writing Is Making Software Worse
$2.41 trillion: cost of poor software quality in the U.S. (CISQ estimate)
7.2% decrease in delivery stability per 25% increase in AI adoption (DORA 2024)
15–25% of AI-generated code contains security vulnerabilities (multiple studies)
7.5% CS graduate unemployment, higher than for fine arts graduates (Stack Overflow, 2025)
What Are We Actually Talking About?
In February 2025, Andrej Karpathy coined a term in a tweet that captured a massive shift already underway in software engineering: “vibe coding,” the practice of describing what you want in plain English and letting an LLM generate the code while you “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” Focusing on creative thinking while a machine does all the actual work sounded fantastic. Yet barely a year later, it’s becoming clear that this wonder-tool may be creating the largest accumulation of technical debt in software history.
This isn’t an anti-AI post. AI tools are amazing for quick explorations or for rubber-ducking ideas at 2 AM when no human colleague would tolerate your half-baked questions. But there’s a big difference between using AI as a thinking partner and outsourcing your engineering judgment to a probabilistic text predictor. That difference — the gap between tool and crutch — is what this piece is about. Because the data is in. And it’s not telling the story the hype cycle promised.
How Did We Get Here?
The promise of vibe-coding was simple: sharp prompt in, features shipped out, everyone happy. It was so seductive that AI-generated code started spreading like wildfire. GitHub CEO Thomas Dohmke stated that Copilot writes an average of 46% of code in files where it’s enabled. At Meta’s LlamaCon in April 2025, Satya Nadella told Zuckerberg that 20–30% of code in Microsoft’s repos is AI-written. During Google’s earnings call in October 2024, Sundar Pichai first said more than 25% of new code at Google was AI-generated; by April 2025, that number had risen to over 30%. Y Combinator’s Winter 2025 batch tells perhaps the most dramatic story: 25% of startups had codebases that were 95% AI-generated, hitting revenue milestones with teams so small they appeared to be rewriting the rules of startup economics.
The venture capital world embraced the revolution. The narrative was irresistible: smaller teams, faster iteration, lower burn rates, the democratization of software creation. Every founder who couldn’t tell a for-loop from a foreach was suddenly a “technical founder.” Every PM who’d been told “that’s a two-sprint feature” was suddenly hearing “give me twenty minutes.”
Figure: AI-generated code at major tech companies (% of code written or generated by AI, 2024–2025)
Jevons paradox: cheaper code means vastly more code, not less
Developers losing 10+ hours/week to non-coding overhead (Atlassian 2025)
16% of a developer’s week is spent coding (Atlassian 2025)
Developers reporting time savings from AI tools (Atlassian 2025)
The Paradox of Productive Friction
Here’s the thing everybody understands but nobody wants to talk about: the difficulty of writing software was a feature, not a bug. When implementation was expensive and slow, teams were forced to make hard decisions. They debated trade-offs. They killed mediocre ideas early because they couldn’t afford to build all of them. The cost of engineering acted as a natural filter — a forcing function for product strategy that made leaders ask “is this actually worth building?” before they asked “can we build it?”
With AI-generated code just one prompt away, that filter is gone. This is the Jevons Paradox playing out in real time. In 1865, William Stanley Jevons observed that more efficient steam engines didn’t reduce coal consumption; they increased it. The same dynamic now drives software: more internal tools, more automations, more microservices, more “AI-powered” features bolted onto products that didn’t need them. Meanwhile, Atlassian’s data shows that engineers spend only 16% of their week coding, so coding was never the main friction point.
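The Jevons mechanism can be made concrete with a textbook constant-elasticity demand curve. This is a sketch under stated assumptions, not a model of any real market: it assumes demand for features scales as Q = k · P^(-ε), where P is the cost of building one feature and ε is a hypothetical price elasticity. Total spend is P · Q = k · P^(1-ε), so whenever ε > 1, a drop in unit cost raises total spend. Whether software demand is actually that elastic is an empirical question the sketch does not answer.

```python
def total_spend(unit_cost: float, elasticity: float, k: float = 100.0) -> float:
    """Total spend P * Q under constant-elasticity demand Q = k * P**(-elasticity).

    k and elasticity are illustrative assumptions, not measured values.
    """
    quantity = k * unit_cost ** (-elasticity)  # features demanded at this unit cost
    return unit_cost * quantity


def jevons_ratio(old_cost: float, new_cost: float, elasticity: float) -> float:
    """Ratio of total spend after a cost drop to total spend before it (k cancels out)."""
    return total_spend(new_cost, elasticity) / total_spend(old_cost, elasticity)


# Hypothetical scenario: AI cuts the cost of building a feature by 80%.
for eps in (0.5, 1.0, 1.5):
    ratio = jevons_ratio(old_cost=1.0, new_cost=0.2, elasticity=eps)
    print(f"elasticity={eps}: total spend x{ratio:.2f}")
# prints x0.45, x1.00, x2.24 for elasticities 0.5, 1.0, 1.5
```

With inelastic demand (ε < 1) cheaper code shrinks total spend; past ε = 1, the efficiency gain is more than eaten by the extra volume, which is the paradox in one line of algebra.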
The METR Study: You Think You’re Faster. You’re Not.
In July 2025, METR published a randomized controlled trial with 16 experienced developers working on real issues in their own repositories. Developers using AI tools took 19% longer to complete tasks. Before the study, they estimated AI would speed them up by 24%. After being slowed down, they still believed it had made them 20% faster. The gap between perception and reality was 39 percentage points. Developers accepted fewer than 44% of AI generations, and 9% of their total time was spent reviewing and cleaning AI outputs.
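The 39-point figure is the distance between what developers believed after the study and what was measured. A minimal sketch of that arithmetic, assuming a sign convention where speedups are positive and slowdowns negative; it follows this report’s framing and treats “19% longer” as a -19% change, although strictly those are not identical quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedEstimate:
    """Speed change in percent: positive = faster, negative = slower (assumed convention)."""
    forecast: float   # developers' estimate before the study
    perceived: float  # developers' estimate after the study
    measured: float   # effect actually observed


def perception_gap(est: SpeedEstimate) -> float:
    """Percentage-point gap between perceived and measured speed change."""
    return est.perceived - est.measured


def forecast_gap(est: SpeedEstimate) -> float:
    """Percentage-point gap between the pre-study forecast and the measured effect."""
    return est.forecast - est.measured


# Figures reported for the METR RCT: +24% forecast, +20% perceived, 19% slower measured.
metr = SpeedEstimate(forecast=24.0, perceived=20.0, measured=-19.0)
print(f"Perception gap: {perception_gap(metr):.0f} percentage points")  # 39
print(f"Forecast gap:   {forecast_gap(metr):.0f} percentage points")    # 43
```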
Figure: The perception–reality gap (METR RCT; 16 developers, 246 tasks, repos averaging 1M+ LoC)
Figure: Code quality metrics over time, 2020–2024 (copy/pasted code and churn rising sharply after the Copilot launch)
GitClear: 211 Million Lines of Evidence
GitClear’s 2025 research analyzed 211 million changed lines from Google, Microsoft, Meta, and enterprise repos. Refactoring plummeted from 25% in 2021 to under 10% in 2024. Copy/pasted code rose from 8.3% to 12.3%. 2024 was the first year cloned lines exceeded refactored lines. Duplicate blocks (5+ lines) increased eightfold. Code churn — new code revised within two weeks — climbed from 3.1% to 5.7%, indicating premature or low-quality commits reaching production.
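Churn, as defined above, is the share of new lines revised within two weeks of being written. A minimal sketch of that ratio over a list of line records; the `LineRecord` type and its fields are hypothetical illustrations, not GitClear’s schema or methodology, and a real tool would derive these timestamps from version-control history.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

CHURN_WINDOW = timedelta(days=14)  # "revised within two weeks"


@dataclass(frozen=True)
class LineRecord:
    """Hypothetical record of one authored line and when (if ever) it was next changed."""
    authored_at: datetime
    revised_at: Optional[datetime] = None  # None if the line was never revised


def churn_rate(lines: list[LineRecord], window: timedelta = CHURN_WINDOW) -> float:
    """Fraction of authored lines revised or deleted within `window` of authoring."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.revised_at is not None and line.revised_at - line.authored_at <= window
    )
    return churned / len(lines)


t0 = datetime(2024, 3, 1)
sample = [
    LineRecord(t0, t0 + timedelta(days=3)),   # churned: revised after 3 days
    LineRecord(t0, t0 + timedelta(days=30)),  # revised, but outside the window
    LineRecord(t0),                           # never revised
    LineRecord(t0),
]
print(f"churn: {churn_rate(sample):.1%}")  # 25.0%
```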
DORA Reports: The Stability Problem
Google’s 2024 DORA report, drawing on responses from more than 39,000 professionals, revealed that while 75.9% relied on AI for at least one daily task and 75% reported productivity gains, system-level metrics told the opposite story. A 25% increase in AI adoption was associated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability. At the same time, 39% of respondents reported little or no trust in AI-generated code. The 2025 RedMonk analysis concluded: “AI adoption not only fails to fix instability, it is currently associated with increasing instability.”
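DORA’s delivery metrics include deployment frequency (a throughput measure) and change failure rate (a stability measure). A minimal sketch of how a team might compute both from its own release log, for example before and after an AI tool rollout, to see whether individual speedups show up at the system level. The `Deployment` record is hypothetical, and DORA’s annual report derives its numbers from survey responses rather than from code like this.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deployment:
    """Hypothetical deployment record from a team's own release log."""
    day: date
    caused_failure: bool  # did this change need a hotfix, rollback, or patch?


def deployments_per_week(deploys: list[Deployment], weeks: float) -> float:
    """Deployment frequency over the observed period (throughput)."""
    return len(deploys) / weeks


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Share of deployments that caused a failure in production (stability)."""
    if not deploys:
        return 0.0
    return sum(d.caused_failure for d in deploys) / len(deploys)


log = [
    Deployment(date(2024, 6, 3), False),
    Deployment(date(2024, 6, 4), True),
    Deployment(date(2024, 6, 6), False),
    Deployment(date(2024, 6, 11), False),
]
print(f"{deployments_per_week(log, weeks=2):.1f} deploys/week")  # 2.0
print(f"change failure rate: {change_failure_rate(log):.0%}")    # 25%
```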
7.2% decrease in delivery stability per 25% increase in AI adoption (2024 DORA)
1.5% decrease in throughput per 25% increase in AI adoption (2024 DORA)
75% report individual productivity gains from AI
39% report little or no trust in AI-generated code
Stack Overflow: Rising Usage, Falling Trust
Stack Overflow’s 2025 Developer Survey, with more than 49,000 respondents, reveals a striking paradox. The share of developers using or planning to use AI tools rose to 84%, with 51% relying on AI daily. But trust plummeted: only 33% trust AI output, while 46% actively distrust it. Positive sentiment dropped from 72% to 60% in a single year. Only 3% said they “highly trust” AI output. Two-thirds struggle with AI solutions that are “almost right, but not quite,” and 45% say debugging AI code takes longer than writing it themselves. When asked about a future where AI handles most coding, 75% said they’d still want to ask a human.
Two-thirds struggle with solutions that are “almost right, but not quite”
45% say debugging AI code takes longer than writing it
75% would still ask a human
3% “highly trust” AI output
Figure: Stack Overflow survey trends in AI adoption (% using or planning to use AI tools), positive sentiment (% with a favorable stance on AI), and trust in accuracy (% who trust AI-generated output)
The Talent Pipeline Is Breaking
While organizations celebrate velocity metrics, we are systematically destroying the pipeline that produces competent engineers. A Stanford study analyzing ADP payroll data found employment for devs aged 22–25 declined ~20% from its late 2022 peak — precisely when ChatGPT launched. Erik Brynjolfsson called it the “fastest, broadest change” he had ever observed. Meanwhile, 54% of engineering leaders plan to hire fewer juniors, and CS graduates face 7.5% unemployment — higher than fine arts degree holders.
~20% employment decline for developers aged 22–25 (Stanford / ADP)
54% of engineering leaders plan to hire fewer juniors (LeadDev 2025)
Entry-level hiring decrease in 2024, year over year
7.5% CS graduate unemployment (Stack Overflow)
10x increase in security findings per month at Fortune 50 enterprises (Apiiro)
15–25% of AI-generated code contains security vulnerabilities (research)
The Breakdown of the Product–Engineering Contract
AI is quietly destroying the implicit contract between product and engineering. For decades, the arrangement was simple: Product decided what to build, and Engineering explained how hard it would be. That negotiation between ambition and constraint produced focused products. Now the PM hears “give me an hour,” and the discipline evaporates. 10x more code means 10x more liability. Apiiro documented a 10-fold increase in security findings per month at Fortune 50 enterprises. Research shows 15–25% of AI-generated code contains security vulnerabilities, with missing input sanitization the most common flaw.
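Missing input sanitization often looks like the first function below: user input spliced straight into a query string. A minimal sketch using Python’s standard-library `sqlite3` module; the `users` table and both function names are hypothetical, and parameter binding is shown as one standard fix, not the only defense.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Vulnerable: user input is formatted into the SQL text (SQL injection).
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Safe: the driver binds `name` as a parameter, so it is never parsed as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # [(1, 'alice'), (2, 'bob')]: every row leaks
print(find_user_safe(conn, payload))    # []
```

Both versions pass a happy-path test with a normal name, which is exactly why this class of flaw slips through review of code that “works.”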
When “Efficiency” Gets Expensive
CISQ estimated that poor software quality costs the U.S. $2.41 trillion and put accumulated technical debt at $1.52T, and that was before the AI explosion. DX’s analysis puts enterprise AI tool deployments at ~$66,000+/year for 100 developers. For a 10-person team, direct tool licenses run roughly $66K and API costs add $36K, about $102K in visible spend. One developer who tracked everything estimated a further $120K in hidden costs, more than doubling the total to roughly $222K.
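The arithmetic behind the 10-person-team comparison, using only the figures quoted in this section. It shows that hidden costs exceed the visible spend, so the total lands at more than twice the visible line items; the inputs are one team’s estimate, not a benchmark.

```python
# Annual figures for a 10-person team, as quoted in this section (USD).
LICENSES = 66_000
API_COSTS = 36_000
HIDDEN = 120_000  # one developer's own estimate of hidden costs
TEAM_SIZE = 10

visible = LICENSES + API_COSTS
total = visible + HIDDEN

print(f"visible spend:   ${visible:,}")              # $102,000
print(f"total cost:      ${total:,}")                # $222,000
print(f"total / visible: {total / visible:.2f}x")    # 2.18x
print(f"per developer:   ${total // TEAM_SIZE:,}")   # $22,200
```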
As MIT Technology Review reported: a developer who spent six weeks testing AI tools found a median 21% slowdown. He then searched for macro evidence of AI-driven productivity gains. He found flat lines everywhere. Where’s the hockey stick on any of these graphs?
— MIT Technology Review, December 2025
Figure: Total cost of ownership, AI vs. traditional development
What We Should Actually Be Doing
If the problem is clear, the solution requires genuine organizational courage — because it means pushing back against the dominant narrative that more AI equals more progress.
Product teams need more rigorous discovery, not less. The question to ask is: “Should we build this, and can we maintain it?”
Measure architectural coherence, dependency health, duplication, security posture, and debt accumulation in real time.
Pair juniors with AI tools and senior mentors. The developers who thrive in 2027 will know when the AI is wrong, why, and how to fix it.
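Duplication is one of the easier items on that list to track automatically. A minimal sketch that flags repeated blocks of five or more lines (the same threshold as GitClear’s duplicate-block count cited earlier) by hashing sliding windows of normalized lines. The function name and the whitespace-only normalization are assumptions for illustration; production clone detectors usually compare tokens or syntax trees instead.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 5  # minimum block length treated as a duplicate


def duplicate_blocks(files: dict[str, str], size: int = BLOCK_SIZE) -> dict[str, list[tuple[str, int]]]:
    """Map a hash of each repeated `size`-line window to its (file, start line) locations."""
    seen: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, text in files.items():
        # Normalize: strip surrounding whitespace and skip blank lines, keeping line numbers.
        lines = [(i + 1, ln.strip()) for i, ln in enumerate(text.splitlines()) if ln.strip()]
        for start in range(len(lines) - size + 1):
            window = "\n".join(code for _, code in lines[start:start + size])
            digest = hashlib.sha256(window.encode()).hexdigest()
            seen[digest].append((path, lines[start][0]))
    # Keep only windows that occur more than once.
    return {h: locs for h, locs in seen.items() if len(locs) > 1}


snippet = "a = 1\nb = 2\nc = a + b\nprint(c)\nreturn c\n"
files = {"x.py": snippet, "y.py": "# header\n" + snippet}
dups = duplicate_blocks(files)
print(len(dups))  # 1
for locs in dups.values():
    print(locs)   # [('x.py', 1), ('y.py', 2)]
```

Tracked per commit, a rising count of duplicated windows is an early, cheap signal of the copy/paste trend described in the GitClear data.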
The Real Bottleneck Was Never Code
Code was never the bottleneck. Frameworks like Rails and Laravel already let competent developers ship MVPs in weeks. The hard part was understanding what users need, earning their trust, navigating organizational complexity, and building systems that work at 3 AM. None of those challenges are solved by generating code faster. The future is about building the infrastructure (the tools, metrics, practices, and organizational structures) that lets humans maintain comprehension as systems grow. Because the bill always comes due. And the interest on technical debt compounds.
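One way to make “code was never the bottleneck” concrete is Amdahl’s law, borrowed here from parallel computing: if coding is a fraction f of the work and only coding gets s times faster, the whole week speeds up by 1 / ((1 - f) + f / s). The sketch takes Atlassian’s figure of roughly 16% of the week spent coding as f and assumes, for illustration, that nothing else in the week changes.

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the coding fraction gets faster."""
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)


CODING_SHARE = 0.16  # share of a developer's week spent coding (Atlassian 2025)

for s in (2, 10, 100):
    print(f"coding {s}x faster -> week {overall_speedup(CODING_SHARE, s):.3f}x faster")
# 1.087x, 1.168x, 1.188x

ceiling = 1.0 / (1.0 - CODING_SHARE)  # limit as coding time approaches zero
print(f"upper bound: {ceiling:.3f}x")  # 1.190x
```

Under this simplification, even instantaneous code generation caps the gain at about 1.19x, because the other 84% of the week is untouched.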
“I just want to say that I am giving up on creating anything anymore. I was trying to create my little project, but every time there are more and more errors… I am working on it for about 3 months, I do not have any experience with coding and was doing everything through AI. But every time I want to change a little thing, I kill 4 days debugging other things that go south.”
— Frustrated vibe coder, quoted by Gary Marcus
Sources
This report draws on peer-reviewed studies, randomized controlled trials, and large-scale surveys — not vendor case studies.
Oleksandr Shulika
Director of Strategic Projects | CEO Office | ex-McKinsey
Alex has 15+ years of experience in strategy, product, and business development. During his time at McKinsey & Co., as well as at tech and industrial companies (e.g., UMG, AEG Power Solutions), he served as a liaison between software development teams and business units, focusing on translating business requirements into clear product development roadmaps and feature requests.