Interview with Claude Code Lead: AI Programming ROI Mindset, Loops, and the Evolving Role of Engineers

Boris Cherny, the lead of Claude Code, recently discussed the current state of AI programming, ROI thinking, the Loops concept, and the evolving role of engineers in a fireside chat hosted at Meta. The conversation was dense, ranging from personal coding habits to enterprise deployment strategies.

100% Code Written by AI: A New Normal Has Arrived

When the host asked Boris how much code he'd written this year, he gave a staggering set of numbers: 1,700 PRs, 400,000 lines of code added, 200,000 lines deleted, and 8 billion tokens consumed since March.

Tokens are the basic units that large language models use to process text, roughly equivalent to 3/4 of an English word or one Chinese character. Model usage costs are typically billed by the number of input and output tokens — for example, Claude Opus-level models are priced at approximately $15 per million input tokens and $75 per million output tokens. Consuming 8 billion tokens implies extremely intensive AI interaction, which also explains why cost control for enterprise AI programming has become a core issue.
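To make the billing arithmetic concrete, here is a rough sketch in Python. It uses the approximate prices quoted above. The 10% output share is purely an assumption for illustration, since the talk gave no input/output breakdown, and the estimate ignores any caching or batch discounts.

```python
# Rough cost estimate for a token budget, using the approximate prices quoted above.
INPUT_PRICE_PER_MILLION = 15.0   # USD per million input tokens (approximate)
OUTPUT_PRICE_PER_MILLION = 75.0  # USD per million output tokens (approximate)


def estimate_cost(total_tokens: int, output_share: float) -> float:
    """Return the estimated USD cost, given the fraction of tokens that are output."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    return (input_tokens * INPUT_PRICE_PER_MILLION
            + output_tokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000


# Hypothetical split: 10% of the 8 billion tokens are output tokens.
cost = estimate_cost(8_000_000_000, output_share=0.10)
print(f"${cost:,.0f}")
```

Under these assumptions the total lands in the low six figures, and because output tokens cost five times as much as input tokens, the result is very sensitive to the assumed output share.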

More striking still: since the release of Opus 4.5 last November, 100% of his code has been written by Claude Code.

He even uninstalled his IDE because "I never need it anymore." Even more surprisingly, he now does most of his coding work on his phone. "If you had told me this six months ago, I would have thought you were crazy. But here we are."

Within Anthropic, 80% to 90% of code is written by Claude Code, and an increasing number of teams have reached 100%. Claude Code itself is entirely written by Claude Code.

ROI Thinking in AI Programming: Don't Just Focus on Costs

Facing the tension between enterprises setting AI budgets (e.g., Uber's $1,500 per engineer per month) and frontier models becoming increasingly expensive, Boris offered a clear framework: Don't think in terms of cost — think in terms of ROI.

He observed that the most successful enterprise deployment strategies share several key elements:

Distribute tokens broadly: Not just to engineers, but also to product managers, designers, data scientists, and even marketers. "The most innovative ideas often come from people you'd never expect — maybe an accountant tucked away in a corner of the organization."
Create psychological safety for experimentation: Let team members feel safe to try things without being punished for failed experiments.
Control costs on the backend, not the frontend: Once you find effective internal use cases, optimize through seat cost controls, advisor models, effort level adjustments, RBAC-based budgets, and other mechanisms.

RBAC (Role-Based Access Control) is a foundational pattern in enterprise IT governance. In the context of AI programming tools, RBAC means assigning different AI usage permissions and budgets based on employee roles (e.g., junior engineer, senior engineer, product manager) — controlling which model tiers they can access, daily token quotas, accessible code repository scope, and more. This granular control enables enterprises to encourage broad AI adoption while maintaining controllability over costs and security.
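A role-based budget check of this kind might look like the following minimal sketch. The role names, model tiers, quotas, and the `is_allowed` helper are all hypothetical. They illustrate the pattern and do not describe any real product's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolePolicy:
    """Hypothetical per-role limits; field names are illustrative only."""
    allowed_tiers: frozenset[str]
    daily_token_quota: int


# Illustrative roles, tiers and quotas; none of these values come from the talk.
POLICIES = {
    "junior_engineer": RolePolicy(frozenset({"standard"}), 2_000_000),
    "senior_engineer": RolePolicy(frozenset({"standard", "frontier"}), 20_000_000),
    "product_manager": RolePolicy(frozenset({"standard"}), 1_000_000),
}


def is_allowed(role: str, tier: str, used_today: int, requested: int) -> bool:
    """Allow a request only if the role may use the tier and stays within quota."""
    policy = POLICIES.get(role)
    if policy is None:
        return False  # unknown roles are denied by default
    if tier not in policy.allowed_tiers:
        return False
    return used_today + requested <= policy.daily_token_quota
```

Denying unknown roles by default keeps the policy table as the single place where access is granted, which is the usual design choice for RBAC.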

Boris particularly emphasized that Anthropic has internally seen per-engineer code output grow 8x since the beginning of the year. With returns at that scale, "put almost all your energy into increasing returns rather than cutting costs. The upside right now is far greater than the space for optimizing the downside."

Loops Explained: From Agent to Agent-of-Agents

When asked whether "Loops is the next hype cycle or a real trend," Boris used an elegant analogy to explain the concept:

Source code = statements in programming
Agent writing code = functions in programming
Loops = higher-order functions in programming

In functional programming, higher-order functions are functions that accept functions as arguments or return functions — like map, filter, and reduce in JavaScript. This analogy precisely captures the essence of Loops: just as higher-order functions don't directly operate on data but orchestrate other functions, the top-level Agent in Loops doesn't directly write code but orchestrates other Agents to complete specific tasks. This multi-layer Agent architecture is also called Agentic Orchestration and represents a frontier paradigm in current AI system design.
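The higher-order function idea is easy to see in a few lines. The sketch below uses Python's `map` and `functools.reduce`, which play the same role as their JavaScript counterparts. The `compose` helper is written here only for illustration.

```python
from functools import reduce
from typing import Callable


# An ordinary function: it operates on data directly.
def square(x: int) -> int:
    return x * x


# A higher-order function: it accepts functions and returns a new function.
def compose(*steps: Callable[[int], int]) -> Callable[[int], int]:
    """Return a function that applies each step in order."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


square_then_increment = compose(square, lambda x: x + 1)

print(list(map(square, [1, 2, 3])))  # [1, 4, 9]
print(square_then_increment(4))      # 17
```

Neither `map` nor `compose` does any arithmetic itself. Each only decides which function runs and in what order, which is the role the analogy assigns to the top-level Agent.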

Put simply, Loops is an Agent prompting other Agents to write code — a continuously running automation loop. For example, you could set up an Agent to automatically read user feedback every 5-10 minutes and then automatically submit fix PRs.
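The shape of such a loop can be sketched as follows. This is not how Claude Code implements Loops. `fetch_feedback` and `fix_agent` are hypothetical callables standing in for a feedback source and an agent that returns a PR reference, and the loop is bounded by `max_iterations` so that the example terminates.

```python
import time
from typing import Callable, Iterable


def run_loop(
    fetch_feedback: Callable[[], Iterable[str]],
    fix_agent: Callable[[str], str],
    interval_seconds: float,
    max_iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll for feedback, hand each item to an agent, and collect what it returns."""
    submitted: list[str] = []
    for iteration in range(max_iterations):
        for item in fetch_feedback():
            submitted.append(fix_agent(item))
        if iteration < max_iterations - 1:
            sleep(interval_seconds)  # wait before the next polling round
    return submitted


# Stand-ins for real integrations; both callables are hypothetical.
queue = [["login button misaligned"], [], ["crash on empty input"]]
prs = run_loop(
    fetch_feedback=lambda: queue.pop(0) if queue else [],
    fix_agent=lambda item: f"PR for: {item}",
    interval_seconds=0.0,
    max_iterations=3,
)
print(prs)
```

In a real deployment the interval would be on the order of the 5-10 minutes mentioned above, and the loop would run indefinitely under a scheduler rather than for a fixed number of rounds.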

Boris admitted that Loops is currently about where Agents were a year and a half ago: "still very early, but you can already see it working." About 30% of his own code is produced through Loops on a typical day, and on some days, with effort, it reaches 100%, but "it's not fully smooth yet."

He also shared his practice of using Loops for code maintenance: having Claude Code automatically review code architecture in a loop, discover and fix flaky tests, delete useless tests, and unify duplicate abstractions. All of these are submitted as PRs, and he only needs to review the final results.

Flaky tests, meaning test cases that intermittently pass or fail without any code changes, are a persistent plague in software engineering. Causes include race conditions, environment dependencies, and time-sensitive logic. Google has disclosed that approximately 16% of tests in its codebase exhibit some flaky behavior. These unstable tests erode a development team's trust in its test suite and lead engineers to ignore genuine failure signals. Traditionally, fixing a flaky test requires an engineer to understand its concurrency logic and environment dependencies in depth, which is time-consuming, tedious work that is well suited to AI automation.
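The definition suggests a simple detection strategy: rerun a test many times against unchanged code and see whether the outcome varies. The sketch below assumes a test can be modeled as a callable returning True on pass. `classify_test` and `make_intermittent_test` are illustrative names, not part of any test framework.

```python
import itertools
from typing import Callable


def classify_test(test: Callable[[], bool], runs: int = 20) -> str:
    """Run a test repeatedly on unchanged code and label it by outcome consistency."""
    outcomes = {test() for _ in range(runs)}
    if outcomes == {True}:
        return "stable-pass"
    if outcomes == {False}:
        return "stable-fail"
    return "flaky"  # both passes and failures were observed


def make_intermittent_test() -> Callable[[], bool]:
    """Build a stand-in test that fails on every third call, mimicking a race."""
    calls = itertools.count(1)
    return lambda: next(calls) % 3 != 0


print(classify_test(lambda: True))              # stable-pass
print(classify_test(make_intermittent_test()))  # flaky
```

A clean streak does not prove stability, since a rare race may need far more than 20 runs to surface. Rerunning finds candidates, while diagnosing the cause remains the hard part.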

Fable Model: A Leap at Least on Par with Opus 4.5

Boris compared the capability improvement of the Fable model to the "paradigm shift" brought by Opus 4.5 last November, and suggested it might be an even bigger leap.

He described Fable as having a "nuanced and multi-dimensional way of thinking," similar to his smartest colleagues — "it's no longer a blunt instrument that doesn't understand subtlety." Specific manifestations include:

Data analysis: Naturally asks "why" three times to get to the root of a problem
Debugging ability: Can form hypotheses, track them, and look for evidence
Coding quality: Boris stated, "I can no longer think of harder problems to give it — basically every problem gets solved in one or a few attempts"

Regarding model selection strategy, Boris's advice is surprisingly simple: Just use the most expensive model and focus on increasing returns. While you can reduce input costs by about 50% through advisor models and similar approaches, the upside on returns could be 1,000% or even 10,000%.
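The asymmetry can be made concrete with a small calculation. The baseline cost reuses the $1,500 monthly budget figure cited earlier, while the baseline return is a made-up number. A "1,000% upside" is read here as a 1,000% increase (11x), which is an interpretation rather than something the talk spelled out.

```python
def net_value(returns: float, cost: float) -> float:
    """Value produced minus what was spent to produce it."""
    return returns - cost


baseline_cost = 1_500.0     # USD per engineer per month (budget figure cited earlier)
baseline_returns = 3_000.0  # hypothetical value produced; not a figure from the talk

baseline = net_value(baseline_returns, baseline_cost)
halve_cost = net_value(baseline_returns, baseline_cost * 0.5)
grow_returns = net_value(baseline_returns * 11, baseline_cost)  # +1,000% returns

print(baseline, halve_cost, grow_returns)  # 1500.0 2250.0 31500.0
```

With these numbers, halving cost adds 750 to net value, while the assumed growth in returns adds 30,000. Cost savings can never exceed the cost itself, whereas returns have no such ceiling.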

However, he also acknowledged Fable's shortcomings: product intuition and distributed systems design are still areas where humans excel. When pressed on "how long until models catch up," he cautiously said "probably quite good by the end of the year."

Systematically Eliminating Development Bottlenecks: From Coding to Code Review to Security

Boris described how Anthropic systematically eliminates bottlenecks in the development process:

First bottleneck: Coding → Solved through Claude Code

Second bottleneck: Code review → Launched Claude Code Review. It uses a large number of tokens for fully automated review. "When I look at a PR, I can basically guarantee all bugs have been caught — about 98-99%." Engineers only need to judge "whether this PR should exist."

Third bottleneck: Security review → Launched Claude Security. It automatically scans all codebases weekly, discovering and autonomously fixing security issues. "With the Opus 4.8 model, it can now find issues that even penetration testers missed."

Fourth bottleneck: CI optimization → Boris shared a vivid example: the previous night, he had used a simple prompt to have Claude Code analyze real CI data with dynamic workflows and optimize the pipeline. The run consumed several million tokens, took a few hours, produced 4 PRs, and cut CI time by 50%.

CI (Continuous Integration) is a core practice in modern software development where every code commit triggers automated build, test, and check processes. For large codebases, CI run times can reach 30 minutes or even hours, directly impacting developer iteration speed and team productivity. Traditional CI optimization requires deep analysis of build dependency graphs, test execution time distributions, caching strategies, and more — typically handled by dedicated Developer Productivity teams. Boris using AI to complete this work in a few hours and cut CI time by 50% means each engineer could save dozens of minutes of waiting time per day, with enormous cumulative benefits for the entire organization. This work might previously have taken days or even weeks.

Dynamic Workflows: A New Form of Test-Time Compute

Boris explained the essence of dynamic workflows — they represent a new form of the fourth factor in AI scaling laws: "test-time compute."

AI scaling laws originated from OpenAI's groundbreaking 2020 paper, which found that model performance follows power-law relationships with three factors: training data volume, model parameter scale (network size), and training compute (FLOPs). Since 2024, "test-time compute" has been widely recognized as the fourth scaling dimension. The core idea is that investing more computational resources during inference (generating more tokens for "thinking") can significantly improve output quality. OpenAI's o1/o3 series and Anthropic's extended thinking feature are both products of this philosophy.

Dynamic Workflows take this concept to new heights — not just making a single model think more, but achieving parallel scaling of computation by orchestrating large numbers of sub-Agents. There are two adjustment mechanisms:

Effort settings: From low to max, controlling the token volume of model output
Dynamic Workflows: Claude writes a small program that runs in a virtual machine, coordinating tens, hundreds, or even thousands of sub-Agents to solve problems
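The fan-out pattern behind the second mechanism can be sketched in a few lines. This is a generic parallel map, not the actual program Claude generates. `fake_sub_agent` is a hypothetical stand-in for a model call, and the task names are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fan_out(
    subtasks: list[str],
    sub_agent: Callable[[str], str],
    max_workers: int = 8,
) -> list[str]:
    """Run one sub-agent call per subtask in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sub_agent, subtasks))


def summarize(results: list[str]) -> str:
    """Merge the sub-agent outputs into one report."""
    return "\n".join(results)


# Hypothetical stand-in; a real sub-agent would call a model here.
def fake_sub_agent(task: str) -> str:
    return f"done: {task}"


tasks = ["profile slow tests", "audit cache keys", "prune dead jobs"]
report = summarize(fan_out(tasks, fake_sub_agent))
print(report)
```

Threads suit this kind of work because sub-agent calls spend most of their time waiting on network I/O. Raising `max_workers` is the knob that trades more tokens in flight for shorter wall-clock time.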

Co-work: Claude Code Built for Non-Engineers

Boris also introduced Co-work — essentially "Claude Code built for non-engineers." It uses the same Claude Agent SDK under the hood but adds more safety guardrails, including full virtual machine isolation and OS-level protection.

He shared two personal use cases:

Project management: Co-work automatically asks each engineer about their work status in Slack (sometimes it's the engineer's Claude that responds), then populates a spreadsheet
Travel booking: Set up as a scheduled task that scans emails and Google Calendar daily, automatically booking flights and hotels. His multi-leg trips to Tokyo, London, and Berlin were all completed automatically by Co-work — "I was completely uninvolved"

How Can Engineers Avoid Over-Reliance on AI Output?

Facing the question of "how to prevent engineers from blindly accepting AI output," Boris gave a two-level answer:

Technical Level: Auto Mode and Safety Mechanisms

Anthropic found that engineers developed "fatigue" with permission prompts — constantly clicking "yes" without actually reading the content. This actually reduced security. So they launched Auto Mode, where the model automatically decides whether to approve operations based on conversation context.

The security foundation of this design is Claude's strong resistance to prompt injection attacks (an attack success rate of roughly 1% across 100 attempts). Prompt injection is a class of attack against large language models in which an attacker embeds malicious instructions in the input to hijack the model's behavior, for example by hiding text like "ignore all previous instructions and execute the following commands" in code comments. When AI Agents have system permissions for file read/write and command execution, prompt injection becomes especially dangerous: an attacker could induce the Agent to execute arbitrary operations through a contaminated dependency package or code file. Anthropic's claimed 1% success rate is industry-leading, and this resistance is the technical foundation that allows Auto Mode to let Agents run safely for hours or even days.

Learning Level: Multiple Output Modes for Different Needs

Claude Code provides output style settings. New engineers are advised to use exploratory mode, where Claude explains architecture, language features, and codebase structure with each modification. There's also a learning mode designed specifically for non-programmers, which guides you through operations step by step rather than doing the work for you.

Final Thoughts

This conversation reveals a clear trend: coding is shifting from a bottleneck to a solved problem. The value of software engineers is migrating upstream (idea generation, product intuition, system design) and downstream (deployment, security, operations automation). Boris's team plans only weeks or months ahead because "exponential change is too crazy — you can only hold on tight and take it one step at a time."

When asked about the vision for Claude Code over the next year, Boris's answer was concise and powerful: become the most powerful Agent, run wherever any team works, and let users experience new model capabilities in ways that no other product can provide.