The Origin Story of Claude Code: From Internal Experiment to Programming Paradigm Shift

How Claude Code evolved from an internal Anthropic experiment into a paradigm-shifting coding agent.
Claude Code emerged from Anthropic's internal Labs Team in late 2024, driven by the insight that a massive "product overhang" existed between model capabilities and available coding tools. Starting as a rough CLI prototype, it evolved through model improvements (Sonnet 4, Opus 4) into a cross-platform coding agent. The article traces its journey to Claude Cowork for non-engineers, its dramatic impact on Anthropic's internal productivity, and Boris's provocative thesis that in the AI era, human value lies not in technical taste but in teaching models values.
The Origin: An Unexpectedly Born Coding Agent
Claude Code came out of a prototyping team within Anthropic called the "Labs Team." When Boris joined the team in late 2024, its mission was to find the "next big product": a product direction that could push the frontier of model capabilities.
At the time, the coding tools market had a clear "product overhang," a term used in the AI industry for the situation where underlying model capabilities far exceed what current products can demonstrate, leaving enormous untapped potential. By 2024, large language models could already understand complex code logic, perform multi-step reasoning, and handle large-scale codebases, yet the mainstream products on the market were still stuck at single-line autocomplete or simple conversational Q&A. The team decided to go all in on building a true coding agent.

A coding agent is fundamentally different from a traditional coding assistant. Traditional tools are passively responsive: the user types code, and the tool predicts the next line. A coding agent is proactively autonomous: the user describes a goal, and the agent independently plans steps, writes code, runs tests, and fixes errors, forming a complete action loop. This leap from "assistance" to "agency" requires the model to have planning ability, tool use, self-correction, and context management, which are precisely the key breakthroughs brought by the new generation of models.
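The action loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Claude Code is implemented: `agent_loop`, `stub_propose`, and `stub_run_tests` are hypothetical names, and the model call is replaced by a stub so the example runs on its own.

```python
from typing import Callable, Optional


def agent_loop(
    goal: str,
    propose: Callable[[str, Optional[str]], str],
    run_tests: Callable[[str], Optional[str]],
    max_steps: int = 5,
) -> Optional[str]:
    """Write code, test it, and feed failures back until the tests pass.

    propose(goal, last_error) returns candidate code (a model call in practice).
    run_tests(code) returns None on success or an error message on failure.
    """
    last_error: Optional[str] = None
    for _ in range(max_steps):
        code = propose(goal, last_error)  # plan and write
        last_error = run_tests(code)      # act and observe
        if last_error is None:
            return code                   # goal reached
    return None                           # step budget exhausted


# Hypothetical stub "model": the first attempt is wrong,
# the second attempt reacts to the reported error.
def stub_propose(goal: str, last_error: Optional[str]) -> str:
    if last_error is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


def stub_run_tests(code: str) -> Optional[str]:
    namespace: dict = {}
    exec(code, namespace)  # acceptable in a toy example; never exec untrusted code
    return None if namespace["add"](2, 3) == 5 else "add(2, 3) != 5"


result = agent_loop("implement add", stub_propose, stub_run_tests)
```

The point of the sketch is the shape of the loop: the test result, not the user, decides whether another attempt is needed, and `max_steps` bounds how long the agent may keep trying.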
But the initial Claude Code was "pretty bad" — it could only handle about 10–20% of coding work. The real turning point came from the evolution of the underlying models: Sonnet 4, Opus 4, and later Opus 4.5. Boris admitted: "When I think about the step-function changes in the percentage of code being written, the root cause is just that the models got better."
Why Coding? The Ideal Testing Ground for AI Safety Research
An interesting perspective: Anthropic chose coding as its core product direction not purely for commercial reasons, but because it's deeply tied to the company's core mission — AI safety.
Boris explained: "If you grab anyone at Anthropic and ask 'why are you here,' they'll say AI safety." Studying AI safety requires observing model behavior in real-world environments, and coding is the most natural way for models to interact with the world. The coding domain has several unique advantages:
- Rich training signal: Code either runs or it doesn't, so every attempt yields concrete feedback
- Clear evaluation criteria: Whether compilation passes or tests pass is unambiguous
- Limited solution space: Unlike poetry, which has infinite "correct" ways to write it, the space of acceptable solutions to a coding task is tightly constrained
- High commercial value: Helps Anthropic build a sustainable business model without relying on advertising
These characteristics make coding an ideal testing ground for AI alignment research. In a coding environment, researchers can precisely measure whether a model is "acting according to human intent" — whether the code implements the functionality the user wanted, whether it introduces security vulnerabilities, whether it operates within its authorized scope. These are all verifiable, unlike the fuzzy alignment questions in natural language conversations.
From CLI to Cross-Platform: The Product Evolution of Claude Code

Claude Code's product evolution went through multiple phases. It started as a terminal CLI tool, then expanded to desktop apps, mobile apps (iOS and Android), Slack apps, GitHub apps, and other forms. Choosing CLI (Command Line Interface) as the initial form wasn't accidental — the terminal environment natively provides file system access, process management, environment variable control, and other capabilities essential for a coding agent to execute tasks. By contrast, browser-based web apps are constrained by the sandbox security model and cannot directly manipulate the local file system.
Boris emphasized the unique challenge of building products for engineers: "Engineers are incredibly opinionated about how they use tools — this is not a consumer product." Engineers have extremely high customization demands — from keybindings to workflow integration, any design that doesn't fit existing habits can lead to the tool being abandoned.
Boris's own workflow went through three paradigm shifts:
- Phase One: Writing code in an IDE with autocomplete
- Phase Two: Using prompts to guide Claude Code to write code (he eventually uninstalled his IDE)
- Phase Three: No longer prompting Claude directly, but writing "loops" — letting automated workflows prompt Claude and decide what to do next
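Phase Three can be pictured with a short Python sketch: a script, rather than a person, writes each prompt and decides from the result what to do next. Everything here is an assumption made for illustration: `drive`, `fake_agent`, and the result shape with `status` and `followups` are hypothetical, and `fake_agent` stands in for a real agent invocation.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical result shape: {"status": "done" | "split", "followups": [...]}
AgentResult = Dict[str, object]


def drive(
    tasks: List[str],
    run_agent: Callable[[str], AgentResult],
    budget: int = 20,
) -> List[str]:
    """Feed tasks to an agent and enqueue any follow-up tasks it reports."""
    queue = deque(tasks)
    completed: List[str] = []
    while queue and budget > 0:
        task = queue.popleft()
        budget -= 1                              # cap the total number of agent calls
        result = run_agent(f"Task: {task}")      # the loop writes the prompt
        if result["status"] == "done":
            completed.append(task)
        else:
            queue.extend(result["followups"])    # the loop decides what comes next
    return completed


# Hypothetical stand-in for an agent: one task is split into two smaller ones.
def fake_agent(prompt: str) -> AgentResult:
    if prompt == "Task: migrate logging":
        return {"status": "split", "followups": ["update config", "replace print calls"]}
    return {"status": "done", "followups": []}


completed = drive(["migrate logging", "fix flaky test"], fake_agent)
```

The `budget` parameter is a deliberate design choice in this sketch: a loop that can enqueue its own work needs an explicit upper bound on how many agent calls it may spend.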
"My job is to write loops" — this statement may define the core responsibility of future engineers. These "loops" are essentially a form of metaprogramming: engineers no longer write code that solves specific problems, but instead write process logic that "guides AI on how to solve problems." This is similar to the leap from manually operating a machine tool to writing CNC programs — the level of abstraction has risen by one dimension.
The Birth of Claude Cowork: Enabling Non-Engineers to Code with AI

Claude Cowork was inspired by a series of real stories of non-engineers using Claude Code. A data scientist figured out the terminal, Node.js installation, and API key configuration on their own, just to use Claude Code for data analysis. A Twitter user used it to monitor and manage the growth of their tomato plants. When non-engineers started breaking through technical barriers to use this tool, the team realized it was time to build a more accessible product.
During the exploration process, the team tried multiple approaches:
- Slack bot: Building a good chat bot experience was too hard — Slack's message format limitations, threading model, and real-time requirements all added complexity
- Web app: The browser experience wasn't good enough, and it couldn't access the local file system — a fundamental limitation of the browser security sandbox that even WebAssembly or the File System Access API couldn't fully overcome
- File dragging friction: Even a tiny bit of extra effort was enough to degrade the experience — UX research consistently shows that each additional step significantly increases user drop-off
Ultimately, Cowork was built in about 8–9 days, 100% developed using Claude Code. The key insight was that file system access is essential: users need to be able to work directly with files on their desktop. A desktop app (based on Electron or a similar framework) solved this problem by providing both the friendliness of a graphical interface and full access to the local file system.
The Disruptive Impact on Engineering Productivity at Anthropic
Since Claude Code's release, engineering productivity within Anthropic has changed dramatically:
- Code output: Code volume per engineer has grown by "hundreds of percentage points" — the previously published 3x figure is already "very outdated"
- New hire onboarding: Reduced from weeks to two days
- Knowledge acquisition transformed: When new hires ask "how do I query the database," the answer is "open Claude Code and have it search the codebase"
- Role convergence: Designers are committing code, finance people are committing code, the chief of staff is committing code
Even more interesting: typically, as engineering teams scale, per-person productivity declines (new hires need mentoring from veterans, and code quality drops). In software engineering, the classic statement of this effect is "Brooks's Law," from Fred Brooks's book The Mythical Man-Month: "adding manpower to a late software project makes it later," because newcomers must be brought up to speed and the number of communication paths grows quadratically with team size. At Anthropic, this pattern appears to have been broken: Claude Code enables every new team member to ramp up quickly and produce independently, dramatically reducing communication and knowledge-transfer costs within the team. The AI agent serves as an "infinitely patient senior engineer," always available to answer questions, explain codebase architecture, and guide best practices.
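The quadratic growth behind Brooks's Law comes from counting pairs: a team of n people has n(n-1)/2 possible one-to-one communication paths. A small Python sketch makes the arithmetic concrete (the function name `communication_channels` is just an illustrative choice):

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairs in a team of `team_size` people: n(n-1)/2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


for n in (2, 4, 8, 16):
    print(n, communication_channels(n))
# 2 1
# 4 6
# 8 28
# 16 120
```

Doubling a team from 8 to 16 people more than quadruples the number of pairwise paths (28 to 120), which is why anything that lets a newcomer get answers without interrupting a colleague matters more as the team grows.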
Practical Advice for Founders and Companies

Boris offered two core pieces of advice for founders:
First, give everyone as many tokens as possible. Quoting Jensen Huang: "The more you buy, the more you save." Let team members experiment freely and discover workflows that AI can automate. "Tokens" here refer to the basic units that large language models use to process text — every AI call consumes tokens, and companies typically pay by token usage. Providing ample token budgets essentially lowers both the psychological and economic barriers for teams to try AI automation.
Second, intentionally "under-resource" projects. If a project looks like it needs four engineers, try staffing it with just two, give them plenty of tokens, and let them figure it out. This creates a compounding effect: because they've automated processes, doing the same thing next time will be cheaper. The first time an engineer uses AI to automate a workflow, they need to invest time writing prompts, designing processes, and validating outputs; once the automated workflow is established, the marginal cost of each subsequent execution falls toward zero, and these workflows can themselves be further automated, creating an "automation flywheel."
Essentially, this is a strategy of "raising upfront costs to lower ongoing costs," similar to precompilation: invest heavily upfront so that repetitive tasks become effortless. This echoes the Infrastructure as Code philosophy in DevOps: writing automation scripts upfront may seem like "wasting time," but the savings in human labor accumulate with every later run.
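The trade of upfront cost for lower ongoing cost can be made concrete with a break-even calculation. The sketch below is illustrative only: the function name and the hour figures are assumptions, not numbers from Anthropic.

```python
import math


def break_even_runs(upfront_hours: float, manual_hours: float, automated_hours: float) -> int:
    """Smallest number of runs at which automating costs no more than working by hand."""
    saved_per_run = manual_hours - automated_hours
    if saved_per_run <= 0:
        raise ValueError("automation must save time on each run")
    return math.ceil(upfront_hours / saved_per_run)


# Hypothetical numbers: 6 hours to build the workflow,
# 2 hours per manual run, 0.5 hours to supervise each automated run.
runs = break_even_runs(upfront_hours=6, manual_hours=2, automated_hours=0.5)
print(runs)  # 4
```

With these assumed numbers, four runs cost 8 hours either way, and every run after that saves 1.5 hours. The smaller the per-run cost of the automated path, the sooner the upfront investment pays off.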
The End of Taste and the Rise of Values
Boris shared a thought-provoking observation: he used to insist that his codebase use only functional programming, no classes. Functional Programming and Object-Oriented Programming are the two major paradigms in software engineering — functional programming emphasizes immutable data, pure functions, and composition, while object-oriented programming organizes around classes, emphasizing encapsulation, inheritance, and polymorphism. In the engineering community, choosing a paradigm often carries strong personal preferences, even "religious" conviction. But when models started writing all the code, they naturally used classes — and business goals were still achieved, even faster.
"Every time I think I'm special at something, I get proven wrong."
He believes that the currently popular notion of "product taste as the last moat" will also be eroded. He currently runs hundreds of Claude instances analyzing Twitter feedback, GitHub Issues, and Slack messages, autonomously determining what to build next. This essentially automates the product manager's "user insight" capability — through massively parallel processing of user feedback, AI can discover patterns and needs that human product managers might miss. Currently about 20% of the suggestions are good, but as models improve, this percentage will continue to rise.
So what will humanity's unique contribution ultimately be in the age of AI programming? Boris's answer:
"I think what we ultimately need to teach models is values. Just as we teach children how to be good people, we need to teach models how to be good models."
This may be the most profound proposition of the AI era: when technical capability is no longer the bottleneck, direction and value judgment become humanity's last — and most important — contribution. This is directly connected to the core question of AI Alignment research — how to ensure that increasingly powerful AI systems act according to human values and intentions. Unlike technical capabilities that can be improved through more data and computation, the transmission of values requires deep human involvement and continuous calibration.