AI Now Writes Over 80% of Code: What Doubling Capability Every 4 Months Really Means

Introduction: This Isn't the Future — It's the Present

Not long ago, Anthropic published a landmark article titled "When AI Starts Building Itself." It wasn't speculating about the future; it was documenting something already under way: AI has begun writing code for AI and taking over tasks on its own. And it isn't playing a supporting role. It is genuinely taking the reins.

Anthropic was founded in 2021 by siblings Dario Amodei and Daniela Amodei, both former OpenAI executives. The company's core mission is AI safety research. It introduced the Responsible Scaling Policy, which commits it to pause capability scaling when models reach specific danger thresholds until safety measures catch up. In the current three-way AI race between OpenAI, Google DeepMind, and Anthropic, this is Anthropic's differentiator: it pushes the boundaries of capability while staying clear-eyed about the risks. The article is a concentrated expression of that stance.

The data points and real-world cases disclosed in the article are enough to make every practitioner reassess where they stand.

Capability Surge: The Terrifying Pace of Doubling Every Four Months

The most intuitive way to gauge AI capability is to look at how long a task it can independently complete. Here's the growth curve:

March 2024: Claude could handle roughly 4 minutes' worth of human work
One year later: That jumped to 1.5 hours
2025 to date: It can now tackle tasks at the 12-hour level

Notice the rhythm: the length of tasks it can handle roughly doubles every four months. Exponential growth of this kind isn't new in computing. Moore's Law describes transistor counts on integrated circuits doubling approximately every two years; AI's curve follows a similar logic but is far steeper. The forces behind it include continuous optimization of model architectures (such as Transformer variants), larger and higher-quality training data, advances in inference-time compute techniques, and better alignment methods such as Reinforcement Learning from Human Feedback (RLHF).

Notably, what's being measured here isn't a traditional benchmark score but the duration of tasks the model can complete independently. That metric is much closer to real-world productivity, because it reflects planning ability, error recovery, and long-context understanding all at once.

Extrapolating at this rate, by next year it could be taking over work that would take humans several weeks to finish.
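The arithmetic behind the doubling claim is easy to check. The short Python sketch below takes the article's figures at face value and adds two assumptions of its own: that "one year later" means exactly 12 months, and that a work week is 40 hours. The function names are illustrative and do not come from any library.

```python
import math

# Task-length figures quoted in the article, in minutes of human work.
START_MINUTES = 4.0        # March 2024
YEAR_LATER_MINUTES = 90.0  # about one year later (1.5 hours); "12 months" is an assumption


def implied_doubling_months(start: float, end: float, months: float) -> float:
    """Doubling time, in months, implied by growing from `start` to `end` over `months`."""
    return months / math.log2(end / start)


def extrapolate(current: float, months_ahead: float, doubling_months: float = 4.0) -> float:
    """Project a quantity forward assuming a constant doubling time."""
    return current * 2 ** (months_ahead / doubling_months)


# About 2.7 months: somewhat faster than the headline "every four months".
print(round(implied_doubling_months(START_MINUTES, YEAR_LATER_MINUTES, 12), 1))

# From 12-hour tasks, 12 more months of 4-month doublings is 2**3 = 8x:
# 96 hours of human work, or 2.4 forty-hour work weeks (40-hour week is an assumption).
hours_next_year = extrapolate(12.0, 12)
print(hours_next_year, hours_next_year / 40)

# Over the same two years, Moore's-law doubling (every 24 months) gives 2x,
# while a 4-month doubling time gives 2**6 = 64x.
print(extrapolate(1.0, 24, 24), extrapolate(1.0, 24, 4))
```

Taken literally, the first two data points (4 minutes to 90 minutes in a year) imply a doubling time closer to 2.7 months, so "every four months" is, if anything, a conservative reading. The projection itself simply assumes the trend continues, which is far from guaranteed.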

Anthropic's internal data is even more striking: by May 2025, over 80% of the code in its codebase was written by Claude, up from single digits just over a year earlier. The result? Each engineer now ships eight times as much code per day as two years ago. Engineers have shifted from "writing code" to "reviewing code."

This data reflects an entirely new software development paradigm — "AI-native development." In this model, the engineer's role transforms from "code producer" to "code reviewer and architecture decision-maker." This is fundamentally different from traditional code generation tools (like early code templates or low-code platforms): Claude isn't filling in templates — it's writing logically complete code after understanding the requirements. This also raises an industry-wide question: when the majority of code is written by AI, code maintainability, security auditing, and technical debt management will face entirely new challenges — after all, reviewing AI-written code and reviewing human-written code may require completely different skills and mental models.

Not Just Fast — Genuinely Capable: Three Real-World Cases

Numbers alone might not convey the full picture. The article documents three real events that happened inside Anthropic.

Case 1: Bug Fixing — Four Person-Years of Work, Done in One Shot

In April of this year, Claude delivered over 800 fixes in a single pass, cutting the rate of one class of AI data API errors by a factor of one thousand. The engineer overseeing the work estimated that doing it by hand would have taken at least four person-years.

Case 2: Firefighting — A Three-Day Investigation Done in Two Hours

During a production incident, an engineer simply gave Claude a description of the problem and access to the servers, then left the rest to it. Claude worked through each possibility methodically, eventually pinpointed an extremely obscure debug flag, reproduced the issue, confirmed the diagnosis, and delivered a fix, all within two hours. Work like this typically takes humans two to three days.

Case 3: Independent Research — AI Cracks 97% of an Open Research Problem

This is the most jaw-dropping case. Anthropic let a group of AI agents tackle an open-ended research problem on their own. Two human researchers spent a week on it and solved only 23% of the problem. Claude-based agents burned through 800 hours of compute time, at a cost of $18,000, and cracked 97%, with every single experiment designed by the AI itself.

This case involves one of today's most active directions in AI: AI agents. Unlike traditional single-turn question-answering systems, agents can plan autonomously, call tools, interact with an environment, and iterate based on feedback. They can break a complex research problem into subtasks, design experimental protocols, run the experiments, analyze the results, and adjust their next steps accordingly. The "800 hours of compute" also points to an important trend: the cost structure of AI research is shifting from labor-intensive to compute-intensive. Using massively parallel agents to "brute-force search" a solution space is already showing promise in fields like drug discovery and materials science, and Anthropic's case suggests it can work in AI research itself.
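Stripped of the language model, the plan, act, observe, adjust loop described above can be sketched in a few lines. In this minimal Python sketch, a fixed rule (test the midpoint of the remaining range) stands in for the model's planning, and a toy `experiment` function with a hidden threshold stands in for the environment; `ToyAgent`, `experiment`, and the threshold of 37 are all hypothetical. It illustrates the control flow only, not how Anthropic's agents actually work.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyAgent:
    """Hypothetical agent searching for the smallest setting at which an experiment passes.

    Assumes the outcome is monotonic (fails below some threshold, passes at or above it)
    and that the experiment passes at `high`.
    """
    low: int
    high: int
    log: List[Tuple[int, bool]] = field(default_factory=list)

    def plan(self) -> int:
        # Design the next experiment: test the midpoint of the remaining hypothesis range.
        return (self.low + self.high) // 2

    def update(self, setting: int, passed: bool) -> None:
        # Record the result and narrow the hypothesis range accordingly.
        self.log.append((setting, passed))
        if passed:
            self.high = setting
        else:
            self.low = setting + 1

    def run(self, tool: Callable[[int], bool], budget: int) -> int:
        """Plan, act, observe, adjust until the range collapses or the budget runs out.

        Returns the lower end of the remaining range, which equals the threshold
        once the range has collapsed.
        """
        for _ in range(budget):
            if self.low >= self.high:
                break
            setting = self.plan()
            self.update(setting, tool(setting))
        return self.low


# Toy environment (hypothetical): the experiment passes once the setting reaches 37.
def experiment(setting: int) -> bool:
    return setting >= 37


agent = ToyAgent(low=0, high=1000)
print(agent.run(experiment, budget=20), len(agent.log))
```

In a real agent, `plan` would be a model call that proposes the next experiment and `update` would be the model's own reading of the results; running many such loops in parallel is what turns compute hours into a broad search of the solution space.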

Judgment: Humanity's Last Moat Is Being Closed

You might think: no matter how powerful AI gets, don't humans still need to set the direction? Surely judgment is our moat.

The article describes a test that sent chills down my spine. The researchers identified moments in their real work when they had taken a wrong turn, captured snapshots of those moments, and asked the AI: "If it were you, what would you do next?" Then they had another AI blindly judge whose advice was better, the AI's or the human researcher's.

This design can be thought of as a form of "counterfactual evaluation": rather than testing AI on idealized benchmarks, it challenges the model in real decision-making situations full of uncertainty. Using another AI as a blind judge borrows the anonymity of academic peer review and is meant to remove potential human bias, such as a tendency to favor human judgment.
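The article does not describe its evaluation harness, but the core of a blind pairwise comparison is simple. The Python sketch below assumes each case pairs a decision-point context with the AI's suggestion and the researcher's actual next step; `blind_win_rate`, `length_judge`, and the sample cases are hypothetical. Because model judges are known to be sensitive to the order in which answers appear, the sketch randomizes the order per case and hides which answer came from whom.

```python
import random
from typing import Callable, List, Tuple

# A judge receives a context and two anonymous answers (shown as A and B)
# and returns "A" or "B". In practice this would be another model.
Judge = Callable[[str, str, str], str]


def blind_win_rate(cases: List[Tuple[str, str, str]], judge: Judge, seed: int = 0) -> float:
    """Return the fraction of cases in which the judge prefers the AI's suggestion.

    Each case is (context, ai_suggestion, human_next_step). The judge never sees
    which answer came from the AI, and the display order is randomized per case
    to reduce position bias.
    """
    if not cases:
        raise ValueError("need at least one case")
    rng = random.Random(seed)
    ai_wins = 0
    for context, ai_answer, human_answer in cases:
        ai_first = rng.random() < 0.5
        first, second = (ai_answer, human_answer) if ai_first else (human_answer, ai_answer)
        verdict = judge(context, first, second)
        if verdict not in ("A", "B"):
            raise ValueError(f"judge must answer 'A' or 'B', got {verdict!r}")
        # The AI wins if the judge picked whichever slot the AI answer occupied.
        if (verdict == "A") == ai_first:
            ai_wins += 1
    return ai_wins / len(cases)


# Stub judge for illustration only: it simply prefers the longer answer.
def length_judge(context: str, a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"


cases = [
    ("loss plateaued", "lower the learning rate and rerun the sweep", "keep training"),
    ("eval regressed", "revert", "bisect the last three data changes"),
]
print(blind_win_rate(cases, length_judge))
```

In a real setup the judge would be a model prompted to pick the better next step. A win rate near 50%, like the November 2024 figure, means the judge shows no consistent preference for either source.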

The results:

November 2024: AI won 51% of the time (essentially a tie)
April 2025: AI's win rate surged to 64%

The jump from 51% to 64% took just five months, which suggests AI is rapidly improving at the "metacognitive" level: knowing when to change direction and when to stay the course. This capability was previously considered uniquely human, because it relies on intuition, experience, and a feel for uncertainty.

AI is catching up, bit by bit, even on the "which way should we go" judgment that most resembles human intuition. After execution, judgment may no longer be a safe zone either.

Three Possible Futures: The Window Is Closing

The article outlines three possible trajectories:

Scenario One: Growth slows down. The curve gradually flattens, and existing capabilities diffuse across industries. The world changes, but humans get the most time to adapt.

Scenario Two: Sustained acceleration. AI dramatically boosts efficiency, but humans still hold the steering wheel. A 100-person company could produce the output of 10,000 or even 100,000 people, and organizational structures would be fundamentally reshaped.

Scenario Three: AI builds AI. AI begins designing and improving the next generation of AI on its own, and humans step back into the role of supervisors and reviewers. This is the slightly terrifying idea known as recursive self-improvement.

Recursive self-improvement is one of the most central topics in AI safety and the core mechanism behind the "intelligence explosion" hypothesis. The idea was first articulated by mathematician I.J. Good in 1965: if an ultraintelligent machine could design machines smarter than itself, an intelligence explosion would follow, leaving human intelligence far behind. In today's context, it means AI not only writes application code but also improves its own training process, optimizes its own model architecture, and even designs better strategies for filtering training data. This is fundamentally different from ordinary AI-assisted programming: the latter is AI helping humans write code, while the former is AI helping AI become more powerful. If that loop starts and runs faster than humans can oversee it, control could shift irreversibly in a very short time.

Anthropic CEO Dario Amodei himself has said that the latter two scenarios worry him most, because they move too fast — the window for society to prepare is too small.

What's Left for Humans?

There's a line in the article worth reading over and over:

Execution-level work like writing code and running experiments — the human labor cost has approached zero. What humans still have going for them is judgment — choosing which problems are worth tackling, and knowing when to cut losses on a dead-end path.

But what really hit me was one engineer's confession:

"On the good days, I keep feeling like nothing I do matters anymore — it's all been automated, and it's done faster and better than I could. But then there are those days when everything is falling apart… and that's the moment I realize I'm not even sure what I've actually been doing all along."

This confession reveals a deep anxiety: once execution is taken over, if you've never seriously thought about what you're actually doing and why, what you lose isn't just a job — it's the sense of meaning itself. This "meaning crisis" isn't unique to the AI era — artisans during the Industrial Revolution and assembly line workers during the automation wave experienced similar identity shocks. But this time is different: what's being replaced is no longer physical labor or repetitive cognitive work, but high-skill work once thought to require creativity and professional judgment. This forces us to rethink a fundamental question: when "doing things" is no longer an exclusively human capability, where exactly does human value anchor itself?

Final Thoughts: The Window Is Still Open — But It Won't Stay Open Forever

So the real question isn't "Will AI replace me?" but rather:

Are you seriously honing your judgment?
Do you truly understand what the work you're doing actually is?
Are you being pushed along by your tools, or using them to realize your own intentions?

At the end of the article, Anthropic expressed a hope: to give the world an option to hit the brakes, so that safety research can keep pace with technology's breakneck sprint. This fits Anthropic's longstanding commitment to responsible scaling and its broader safety work. The company also developed Constitutional AI, a method in which a model is trained using its own critiques and revisions against an explicit set of written principles, rather than relying entirely on human feedback for alignment. Anthropic wants to bring policymakers, researchers, and ordinary people like you and me together for a serious conversation about all this.

Because the window is still open — but it won't stay open forever.

At a pace of doubling every four months, the time we have to think and prepare may be far less than we imagine.