What Happens to Developer Jobs When AI Writes 80% of the Code? A Deep Dive into Anthropic's Landmark Report

Anthropic reveals 80% AI-written code, sparking urgent questions about the future of software development jobs.
Anthropic's landmark report discloses that over 80% of its merged code is now written by Claude, with per-engineer output up 8x since 2024. This deep dive analyzes the implications: human review becoming a bottleneck, "taste" as a potentially fragile moat, why AI bubble fears are premature based on financial metrics, the rise of Loop Engineering, and the looming possibility of recursive self-improvement — all pointing to a transformation faster than most are prepared for.
Anthropic recently published a landmark essay titled When AI Begins to Build Itself, disclosing a wealth of internal data that reveals the profound impact AI is having on software development and knowledge work as a whole. As one of the world's highest-valued AI startups (with a valuation of $900 billion and Q1 revenue surpassing OpenAI's), Anthropic's report is not just an exercise in self-examination — it's a serious warning to the entire industry.
Anthropic was founded in 2021 by siblings Dario Amodei and Daniela Amodei, with Dario previously serving as VP of Research at OpenAI. The company's core philosophy is "AI safety first." Its founding team includes several key researchers who worked on GPT-2 and GPT-3 development but departed due to disagreements over OpenAI's safety direction. Anthropic introduced "Constitutional AI," a training methodology that has AI critique and revise its own outputs based on a set of predefined principles, reducing reliance on human annotation. The company has attracted massive investments from tech giants like Amazon and Google, with Amazon alone investing a cumulative $8 billion. It's precisely this technical pedigree and financial muscle that gives this report its extraordinary weight.
AI Is Already Writing the Vast Majority of Code
The most staggering data point in the report: as of May 2026, over 80% of merged code in Anthropic's codebase was written by their AI product, Claude. Before 2025, that figure was under 10%. Keep in mind that Anthropic employs some of Silicon Valley's top engineers — even their daily work has been deeply permeated by AI.
It's important to understand what "merged code" truly means. In modern software development, code merging is a rigorous process: developers write code on independent branches, and only after passing code review and automated testing can it be merged into the main branch. So 80% of merged code being AI-written doesn't just mean AI is producing large volumes of code — it means the quality of AI-written code has reached a standard that passes review by top-tier engineers. That's the truly staggering part.
Even more noteworthy is the magnitude of the efficiency leap — in Q2 2026, a typical engineer merged 8 times the amount of code per day compared to 2024. Because of this, Anthropic has stopped using lines of code as a metric for employee contribution, since the explosion in output comes purely from AI augmentation, not individual skill improvement.

This signals a fundamental shift: code writing is transitioning from a core human skill to a baseline AI capability, and the value anchor for engineers must be redefined.
Is Taste the Last Moat — or the Next Fortress to Fall?
Anthropic's report notes that humans currently retain a comparative advantage in "research taste and judgment." Borrowing Edison's famous quote — genius is 1% inspiration and 99% perspiration — that 99% perspiration is being increasingly automated.
Over the past year, "taste" has become a buzzword in AI coding and AI startup circles. In software development, taste refers to the ability to make upstream decisions: requirements review, solution evaluation, and technology selection. In content creation, creative ability and aesthetic judgment have become far more important than traditional post-production editing skills.

However, at the end of the report, Anthropic itself concedes: "What we call research taste in various domains may turn out to be just another AI capability." This sentence deserves careful reflection. When the next generation of AI products already demonstrates questioning ability and aesthetic sensibility surpassing 99% of humans, the "last fortress" we believed in may be far more fragile than we imagined.
This also raises a deeper question about education: if AI's aesthetic and creative capabilities continue to evolve, how exactly should we educate the next generation? The previous consensus was "knowledge doesn't matter — asking good questions and having good taste do." But if even these are surpassed by AI, then a solid foundation of fundamental knowledge may actually become critical again — because without a deep knowledge base, a person simply cannot develop profound insights into what makes a good question.
Human Review Becomes the New Bottleneck
The Anthropic report highlights a rather ironic scenario: when most human team members completely stop writing code and shift to review-only roles, if their code review speed can't keep up with Claude's code generation speed, human review actually becomes the new bottleneck in AI-driven development.
Some might wonder whether "reviewing AI's work" could become a new career path. But that road isn't as promising as it sounds. As early as April this year, Anthropic was already researching "whether a weaker model can reliably supervise a stronger model." This problem is known in AI safety as "Scalable Oversight" and is one of the core challenges in current alignment research. Logically speaking, if a human's capability level is 5 and a top AI's is 500, there must be a space in between where weaker AI models can serve a supervisory role.
Regardless, the people who can fully participate in AI review in the future will inevitably be a small minority with extremely deep domain knowledge. This is not a mass-market employment outlet.
Real-World Performance of Claude 3.5 and Mathos
In the social sciences, compared to Chinese domestic models, Claude 3.5 produces one or two additional deeply insightful observations per thousand words of output, with a sense of critical thinking and wisdom that goes beyond basic information synthesis (though domestic models still lead in prose quality and textual atmosphere).
In programming applications, Claude 3.5 is a "tier-breaking, dominant force" in architectural design, ultra-long code tasks, and complex debugging — a fairly consensus view within the industry.

One stunning example: Cheetah Mobile CEO Fu Sheng, on the day of the model's release, used simple natural language prompts to have AI build a complete Command & Conquer: Red Alert 2-style RTS game overnight — including terrain rendering, fog of war, AI opponent strategy, and even multiplayer support. While the graphics looked more like Red Alert 1 and the units and terrain were significantly simplified, the core gameplay loop of resource gathering, army building, and base rushing was fully playable. The AI even redesigned all buildings and units to avoid copyright issues. The significance of this case lies in the fact that RTS games are widely considered one of the most complex game genres, involving real-time pathfinding, resource management systems, AI decision trees, network synchronization, and other highly challenging technical modules that previously required professional teams months or even years to complete.
Meanwhile, Anthropic's most powerful model, Mathos, has yet to be released to consumers, and its true capabilities remain an open question.
What Stage Is the AI Bubble At?
The debate about an AI bubble has never stopped, but looking at multiple dimensions, we are far from the late stages of a bubble.
Model capabilities are still growing rapidly. The mainstream view expressed by multiple top engineers on podcasts is that the large model Scaling Law can conservatively maintain strong momentum for at least another year. Scaling Law was first systematically articulated by OpenAI in a 2020 paper, with the core finding that large language model performance follows a power-law relationship with model parameters, training data volume, and compute — as long as you keep increasing any of these three factors, model capability improves predictably. In recent years, researchers have also discovered "inference-time scaling" — investing more compute during the inference phase (e.g., letting the model "think longer") can significantly improve output quality. This is the core idea behind OpenAI's o1 series and Claude's thinking mode. From pre-training compute scaling to inference-phase breakthroughs, the S-curve of growth has not yet reached its plateau.
Financial metrics are far healthier than the dot-com bubble. Anthropic's annualized recurring revenue (ARR) was just $9 billion at the end of last year; by May of this year, it had reached $47 billion. ARR is the most critical financial metric for SaaS and subscription businesses, and this kind of 400%+ growth in roughly five months is virtually unprecedented in enterprise software history — for comparison, Salesforce took nearly 20 years to reach a similar revenue scale. While the five major U.S. cloud providers are ramping up AI compute capital expenditure, their overall debt ratio is only 40%, far below the 124% level during the 2000 dot-com bubble. The 2000 dot-com bubble was one of the most infamous speculative manias in tech history, when countless internet companies received sky-high valuations with no profits or even revenue, and the NASDAQ index plunged 78% from its peak. Forward P/E ratios for core AI stocks stand at 24x, well below the 54x at the peak of the dot-com bubble. Today, all of the top 10 U.S. companies by market cap exceed $1 trillion, none are traditional companies, and all have very strong profitability.

Penetration rates are still in the early stages. Anthropic's primary customers currently cover only three categories: Silicon Valley tech companies, top U.S. financial firms, and biopharmaceutical companies. Penetration across all industries has barely begun. There are 1.3 billion office white-collar workers globally, with 450 million enterprise accounts paying for the Office suite, yet most industries have not yet embedded AI into their workflows. This means AI's commercial ceiling is far from being reached, and current revenue growth is driven more by early adopters than by competition over a saturated market.
Loop Engineering: From Turn-by-Turn Commands to Autonomous Closed Loops
A hot new concept in Silicon Valley recently is "Loop Engineering." The concept is similar to "loop builds" in card games — early on, you need to calculate precisely and play cards manually, but once the loop system is assembled, it runs autonomously.
The ideal Loop in the future of AI: no more manually writing prompts, no more back-and-forth where you say something and AI responds, and no more human checkpoints after each phase. The ultimate goal humans are pursuing is to design an autonomously running closed-loop system, then let AI iteratively develop itself.
Anthropic's prediction is: given sufficient compute, an AI system will eventually emerge that can fully autonomously design and develop its own next generation — this is what's known as "Recursive Self-Improvement." This concept was first described by mathematician I.J. Good in his 1965 "intelligence explosion" hypothesis: if an AI system is intelligent enough to understand and improve its own design, the improved version will be even better at further improving itself, creating a positive feedback loop. This process could lead to exponential capability growth far beyond human comprehension and control. This is precisely why companies like Anthropic view AI safety research as a core mission — once recursive self-improvement kicks in, humans may have only an extremely narrow window to ensure the system's behavior aligns with human values. And it may arrive far faster than most organizations expect or are prepared for.
If models continue evolving at their current pace for another year and a half, AI may not be a "god," but it will be a "demigod"-level entity. And at this growth rate sustained for three to five years, the arrival of AGI may not be as far-fetched as it sounds.
Final Thoughts
This lengthy essay from Anthropic is less a technical report than a warning letter to the entire industry. When AI begins to build itself, when 80% of code is written by machines, and when "taste" may turn out to be just another AI capability, what we need to re-examine goes beyond career planning — it extends to educational philosophy, business models, and even our fundamental understanding of human value.
The only certainty is this: the pace of this transformation will, in all likelihood, outstrip our readiness.
Related articles

DeepSeek Forms Harness Team: AI Coding Competition Enters the Second Half
DeepSeek forms a dedicated Harness team to rival Claude Code. Analysis of the four-layer architecture, three core advantages, and 40x cost edge driving AI competition from model wars to engineering deployment.

DeepSeek V4 Pro In-Depth Review: Performance Rivaling GPT-5.5 at 1/12 the Cost
Comprehensive review of DeepSeek V4 Pro across coding, reasoning, and Agent benchmarks. Compare pricing vs GPT 5.5 and Claude Opus, plus hands-on coding demo with Pi Agent.

Vibe Coding in Practice: Building a Global Product from Scratch with an AI Workflow
A battle-tested AI development workflow covering Claude Code Plan Mode, documentation management, version control, and Cloudflare deployment for building global products.