> Research Report · April 2026
The Hidden Cost of AI-Generated Code
Why Faster Code Writing Is Making Software Worse
[Key figures: cost of poor software quality (CISQ estimate, U.S.); delivery stability per 25% AI adoption (DORA 2024); share of AI-generated code containing security vulnerabilities (multiple studies); CS grad unemployment, higher than fine arts grads (Stack Overflow, 2025)]
What Are We Actually Talking About?
In February 2025, Andrej Karpathy posted a tweet that gave a name to a massive shift already underway in software engineering: “vibe coding”[12], the practice of describing what you want in plain English and letting an LLM generate the code while you “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” The idea of focusing on creative thinking while a machine does all the actual work sounded fantastic. Not even two years later, it’s becoming clear that this wonder-tool may be creating the largest accumulation of technical debt in software history.
This isn’t an anti-AI post. AI tools are amazing for quick explorations or for rubber-ducking ideas at 2 AM when no human colleague would tolerate your half-baked questions. But there’s a big difference between using AI as a thinking partner and outsourcing your engineering judgment to a probabilistic text predictor. That difference - the gap between tool and crutch - is what this piece is about. Because the data is in. And it’s not telling the story the hype cycle promised.
How Did We Get Here?
The promise of vibe coding was simple: you give a sharp prompt, AI writes code faster than any human can, features ship sooner, customers get happier, and companies spend less. That promise was so seductive that AI-generated code spread like wildfire. Although Karpathy initially used the term in the context of "throwaway weekend projects", vibe coding became a common approach for projects of every kind, among individual developers and massive corporations alike.
GitHub CEO Thomas Dohmke stated that Copilot writes an average of 46% of code in files where it's enabled. At Meta's LlamaCon in April 2025, Satya Nadella told Zuckerberg that 20-30% of code in Microsoft's repos is AI-written. During Google's earnings call in October 2024, Sundar Pichai first said more than 25% of new code at Google was AI-generated; by April 2025, that number had risen to over 30%. Y Combinator's Winter 2025 batch tells perhaps the most dramatic story: 25% of startups had codebases that were 95% AI-generated, hitting revenue milestones with teams so small they appeared to be rewriting the rules of startup economics.
The venture capital world embraced the revolution. The narrative was irresistible: smaller teams, faster iteration, lower burn rates, the democratization of software creation. Every founder who couldn't tell a for-loop from a foreach was suddenly a "technical founder." Every PM who'd been told "that's a two-sprint feature" was suddenly hearing "give me twenty minutes."
But the narrative had a blind spot the size of a galaxy: it optimized entirely for the creation of code and ignored everything that happens after you hit deploy. Software needs to be maintained, debugged, secured, documented, and understood by the humans who are responsible for it at 3 AM when it breaks.
[Figure: AI-Generated Code at Major Tech Companies - % of code written or generated by AI (2024-2025)]
[Key figures: cheaper code = vastly more code, not less (Jevons Paradox); lose 10+ hours/week to non-coding overhead (Atlassian 2025); share of a developer’s week spent coding (Atlassian 2025); report time savings from AI tools (Atlassian 2025)]
The Paradox of Productive Friction
Here's the thing everybody understands but nobody wants to talk about: the difficulty of writing software was a feature, not a bug. When implementation was expensive and slow, teams were forced to make hard decisions. They debated trade-offs. They killed mediocre ideas early because they couldn't afford to build all of them. The cost of engineering acted as a natural filter - a forcing function for product strategy that made leaders ask "is this actually worth building?" before they asked "can we build it?"
With AI-generated code being just one prompt away, that filter is gone now. When the cost of "trying it" drops to near zero, discipline collapses. I've watched teams ship half-baked features not because anyone believed in them, but because the AI made it so easy that saying "let's just see" felt costless. It wasn't. Every one of those experiments became a line item in the maintenance backlog.
According to Atlassian's internal data, almost all developers (99%) now report time savings from AI tools, with 68% saving more than 10 hours a week. Yet engineers spend only 16% of their week coding[6], so coding is not the main friction point for developers. They lose valuable time to non-coding tasks instead: 50% report losing 10+ hours per week, and 90% lose 6 or more, largely due to organizational inefficiencies in coordination, debugging, testing, deployment, and documentation. In these areas AI doesn't eliminate work; it multiplies the number of cycles teams have to run.
This is the Jevons Paradox playing out in real time. In 1865, William Stanley Jevons observed that more efficient steam engines didn't reduce coal consumption - they increased it, because cheaper energy unlocked entirely new categories of demand. The same dynamic is now consuming software. As one analysis applying Jevons to AI coding explains, when code becomes cheaper to produce, we don't produce less of it. We produce vastly more. The surface area of software expands - more internal tools, more automations, more microservices, more "AI-powered" features bolted onto products that didn't need them - multiplying complexity at every layer.
One developer's account echoes this: "I built more in two months with agents than in the previous year. I used almost none of it... agents amplify clarity. If you know exactly what you want, they're a force multiplier. If you don't, they're an expensive way to generate garbage you'll never use."
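The 16% coding share quoted above puts a ceiling on what faster code generation alone can buy. A minimal sketch of that ceiling follows, using Amdahl's law. The source does not invoke this law; it is brought in here only as an illustration, and the function name and the assumption that the non-coding 84% of the week stays unchanged are hypothetical simplifications.

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl's law: whole-week speedup when only the coding share gets faster.

    coding_share   -- fraction of the week spent coding (0.16 per the figure cited above)
    coding_speedup -- how many times faster coding becomes (illustrative input)
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Non-coding time is assumed unchanged; only the coding slice shrinks.
    return 1.0 / ((1.0 - coding_share) + coding_share / coding_speedup)


if __name__ == "__main__":
    for factor in (2.0, 10.0, float("inf")):
        print(f"coding {factor}x faster -> week {overall_speedup(0.16, factor):.2f}x faster")
```

Under these assumptions, even infinitely fast coding makes the whole week only about 1.19x faster, because the other 84% of the time is untouched.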
The METR Study: You Think You’re Faster. You’re Not.
In July 2025, METR published a randomized controlled trial with 16 experienced developers working on real issues in their own repositories - large open-source projects averaging over one million lines of code and 22,000+ stars. The result was startling: developers using AI tools took 19% longer to complete tasks than those working without them.[1] But here's the truly unsettling part. Before the study, developers estimated AI would speed them up by 24%. After using AI and actually being slowed down, they still believed it had made them 20% faster. The gap between perception and reality was 39 percentage points.
Let that sink in. Experienced developers - people who've spent years building the instincts to evaluate their own performance - were confidently wrong about whether their tools were helping or hurting. Developers in the study accepted fewer than 44% of AI generations, and 9% of their time was spent reviewing and cleaning AI outputs.[1]
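The 39-point figure above is simple arithmetic once a sign convention is fixed. The sketch below assumes every number is expressed as a percentage change in task completion time, with negative meaning faster. That convention and the names used are illustrative choices, not METR's own notation.

```python
def perception_gap(perceived_change_pct: float, measured_change_pct: float) -> float:
    """Gap, in percentage points, between the measured and the perceived change in task time."""
    return measured_change_pct - perceived_change_pct


# Figures quoted in the text, as changes in completion time (negative = faster).
FORECAST_CHANGE = -24.0   # developers expected a 24% speedup beforehand
PERCEIVED_CHANGE = -20.0  # developers believed they were 20% faster afterwards
MEASURED_CHANGE = 19.0    # tasks actually took 19% longer

gap = perception_gap(PERCEIVED_CHANGE, MEASURED_CHANGE)

if __name__ == "__main__":
    print(f"perception-reality gap: {gap:.0f} percentage points")
```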
[Figure: The Perception–Reality Gap. METR RCT · 16 developers · 246 tasks · repos averaging 1M+ LoC]
GitClear: 211 Million Lines of Evidence
GitClear’s 2025 research analyzed 211 million changed lines[3] from Google, Microsoft, Meta, and enterprise repos. Refactoring plummeted from 25% in 2021 to under 10% in 2024. Copy/pasted code rose from 8.3% to 12.3%. 2024 was the first year cloned lines exceeded refactored lines. Duplicate blocks (5+ lines) increased eightfold[3]. Code churn - new code revised within two weeks - climbed from 3.1% to 5.7%, indicating premature or low-quality commits reaching production.
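To make the "duplicate blocks (5+ lines)" metric concrete, here is a minimal sketch of one way to flag repeated five-line blocks across files. It is not GitClear's methodology, which the text does not describe. The function name, the whitespace normalization, and the fixed window size are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW = 5  # block size in lines, mirroring the "5+ lines" threshold quoted above


def find_duplicate_blocks(files: dict[str, str], window: int = WINDOW) -> dict:
    """Map each repeated block of `window` non-blank lines to its (file, start_line) locations."""
    seen = defaultdict(list)
    for name, text in files.items():
        # Strip whitespace and skip blank lines so formatting differences do not hide clones.
        lines = [(i + 1, ln.strip()) for i, ln in enumerate(text.splitlines()) if ln.strip()]
        # Slide a fixed-size window over the file and record where each block occurs.
        for start in range(len(lines) - window + 1):
            block = tuple(ln for _, ln in lines[start:start + window])
            seen[block].append((name, lines[start][0]))
    # Keep only blocks that occur more than once.
    return {block: locs for block, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    snippet = "x = 1\ny = 2\nz = x + y\nprint(z)\nreturn z\n"
    repo = {"a.py": snippet, "b.py": "import os\n" + snippet}
    for block, locations in find_duplicate_blocks(repo).items():
        print(locations)
```

A production tool would also need to handle renamed variables and overlapping windows. This version catches only exact copies after whitespace stripping.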
DORA Reports: The Stability Problem
Google's 2024 DORA report - based on responses from over 39,000 professionals[4] - confirmed the pattern. A 25% increase in AI adoption was associated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability. As one DORA analyst observed, the fact that individuals feel more productive with AI while the system has less throughput and stability feels quite contradictory.
The 2025 DORA report found that while AI's relationship with throughput had improved, the instability problem persisted. As the researchers wrote: "AI accelerates software development, but that acceleration can expose weaknesses downstream. Without robust control systems, an increase in change volume leads to instability." The RedMonk analysis of the 2025 report put it even more bluntly: "AI adoption not only fails to fix instability, it is currently associated with increasing instability."
Translation: we're shipping faster and breaking more things. And the breakage is not random - it's structural.[4][5]
[Key figures: delivery stability per 25% AI adoption increase (2024 DORA); throughput per 25% AI adoption increase (2024 DORA); report individual productivity gains from AI (productivity); report little or no trust in AI-generated code (trust)]
Stack Overflow: Rising Usage, Falling Trust
The paradox of the moment is captured perfectly in Stack Overflow's 2025 Developer Survey of 49,000+ developers across 177 countries. Usage of AI tools rose to 84%, with 51% of professional developers relying on them daily. But trust plummeted: only 33% of developers trust AI-generated output, while 46% actively distrust it - a sharp reversal from 2023-2024, when positive sentiment exceeded 70%.
Two-thirds of developers reported struggling with AI solutions that are "almost right, but not quite" - close enough to look correct, far enough off to waste hours of debugging. And 45% said debugging AI-generated code takes longer than writing it themselves.
When asked about a future where AI handles most coding tasks, 75% said they'd still want to ask a human for help - and the top reason was simply "when I don't trust AI's answers." Only 3% of developers said they "highly trust" AI output.
As Stack Overflow's own analysis of the trust gap observed, the typical technology adoption curve shows the opposite relationship - familiarity usually breeds confidence. But the more developers use AI, the less they trust it.
[Key figures: struggle with AI solutions that are “almost right, but not quite”; say debugging AI-generated code takes longer than writing it; would still ask a human; “highly trust” AI output]
[Figure: AI Adoption (% using or planning to use AI tools); Positive Sentiment (% with favorable stance on AI); Trust in Accuracy (% who trust AI-generated output)]
The Talent Pipeline Is Breaking
While organizations celebrate velocity metrics, we are systematically destroying the pipeline that produces competent engineers. A Stanford study analyzing ADP payroll data found employment for devs aged 22-25 declined ~20% from its late 2022 peak - precisely when ChatGPT launched. Erik Brynjolfsson called it the "fastest, broadest change" he had ever observed. Meanwhile, 54% of engineering leaders plan to hire fewer juniors, and CS graduates face 7.5% unemployment - higher than fine arts degree holders.
Here's why this matters beyond the immediate human cost: junior developers don't just fill seats. They are the R&D pipeline for your organization's future engineering leadership. They're the people who, over years of making mistakes, receiving feedback, and learning the "why" behind the code, develop the judgment needed to architect complex systems, diagnose subtle production failures, and make the kind of decisions that AI fundamentally cannot.
By cutting junior roles for short-term savings, companies are creating what can only be described as a generational talent gap. The mid-level and senior engineers needed in 2027 and 2028 - people with 3 to 5 years of hard-won debugging experience - simply won't exist in sufficient numbers. They were never hired.
[Key figures: employment decline for devs aged 22-25 (Stanford / ADP); leaders plan to hire fewer juniors (LeadDev 2025); entry-level hiring decrease in 2024 (year-over-year); CS grad unemployment (Stack Overflow)]
The Breakdown of the Product-Engineering Contract
AI is quietly destroying the implicit contract between product and engineering. For decades, the relationship worked like this: Product decides what to build. Engineering explains how hard it is. The negotiation between these two forces - between ambition and constraint - is what produced focused, well-scoped products. When a PM heard "that'll take three sprints," it forced them to think hard about whether the feature was worth three sprints. It forced prioritization. Now the PM hears "give me an hour," and the discipline evaporates.
If your team is producing 10x more code, you're also creating 10x more ownership and liability. Apiiro documented a 10-fold increase in security findings per month at Fortune 50 enterprises. Research shows 15-25% of AI-generated code contains security vulnerabilities, with missing input sanitization the most common flaw.
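For readers who have not seen this flaw up close, here is a minimal sketch of one common form of missing input sanitization: SQL built by string interpolation, next to a parameterized version. The table, function names, and payload are hypothetical and chosen for illustration. They are not taken from the studies cited above.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable: user input is pasted straight into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized query: the driver treats the input as data, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (username,)).fetchall()


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "nobody' OR '1'='1"  # classic injection string
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)


if __name__ == "__main__":
    leaked, safe = demo()
    print("unsafe query returned:", leaked)
    print("safe query returned:", safe)
```

The unsafe version returns every row for a user that does not exist, while the parameterized version returns nothing. Both functions pass a casual "it works" check with ordinary input, which is why this class of flaw can slip through quick review.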
[Key figures: increase in security findings per month at Fortune 50 (Apiiro); share of AI-generated code containing security vulnerabilities (research)]
When “Efficiency” Gets Expensive
CISQ estimated that poor software quality costs the U.S. $2.41 trillion[9], with $1.52T in accumulated technical debt - and that was before the AI explosion. DX’s analysis[13] puts enterprise AI tool deployments at ~$66,000+/year for 100 developers. For a 10-person team, direct tool licenses run roughly $66K and API costs add $36K, but one developer who tracked everything estimated $120K in hidden costs - more than the roughly $102K of visible spend combined.
As MIT Technology Review reported in December 2025[10], a developer who spent six weeks testing AI tools found a median 21% slowdown. He then searched for macro evidence of AI-driven productivity gains and found flat lines everywhere: “Where’s the hockey stick on any of these graphs?”
[Figure: Total Cost of Ownership: AI vs. Traditional Development]
What We Should Actually Be Doing
If the problem is clear, the solution requires genuine organizational courage - because it means pushing back against the dominant narrative that more AI equals more progress.
The Real Bottleneck Was Never Code
Code was never the bottleneck. Frameworks like Rails and Laravel already let competent developers ship MVPs in weeks. The hard part was understanding what users need, earning their trust, navigating organizational complexity, and building systems that work at 3 AM. None of those challenges are solved by generating code faster. The future is about building the infrastructure - the tools, metrics, practices, and organizational structures - that allow humans to maintain comprehension as systems grow. Because the bill always comes due. And the interest on technical debt compounds.
vibe coded for 6 months. my codebase is a disaster.
“the app works. users are happy. revenue is coming in. (that's actually the only good part)
but i just tried to onboard a dev to help me and he opened the repo and went quiet for like 2 minutes. then said "what is this."
6 months of cursor and lovable and bolt. every feature worked when i shipped it. but nobody was thinking about structure.
the AI just kept adding. new file here, duplicate function there, 3 different ways to handle the same thing across the codebase.
tried to refactor it myself last week. gave up after 2 hours. the thing is so tangled that touching one part breaks something completely unrelated.
the generation was fast. the cleanup is a nightmare.
is there even a way out of this or do i just rewrite everything from scratch?”