Anthropic London Developer Conference: Claude Model Upgrades, an Enterprise Agent Platform, and Overhauled Developer Tools

Anthropic brought its "Code with Claude" developer conference outside San Francisco for the first time, landing in London to showcase a comprehensive upgrade of the Claude ecosystem to developers worldwide. From exponential leaps in model capabilities, to the maturation of an enterprise-grade Agent platform, to the deep evolution of Claude Code developer tools, this event painted a vivid picture of the future of AI-assisted development.

AI Is Reshaping the Distance from Idea to Product

Anthropic's head of product, Boris, opened with a personal story: at age 13, he wrote programs on a TI-83 calculator to pass math exams and used HTML to spruce up eBay pages for selling Pokémon cards. That magical feeling of "writing code and watching it run" is the most basic appeal of programming.

However, as the industry evolved, compilers, type checkers, build systems, and package managers piled up layer after layer, stretching the distance from "idea to running code" further and further. Now, AI is dramatically shrinking that distance: "You describe a problem, and the program appears. It's that calculator feeling, except this calculator can also write distributed systems."

This isn't just talk. Spotify used Claude Code to build backend Agents that merge over 1,000 PRs into production every month, reducing migration time by more than 90%. Social services software company Binti used the Claude API to shorten the foster family licensing process by 20 days — this isn't just an efficiency metric; it means children can connect with families faster.

Exponential Leaps in Claude Model Capabilities: From Minutes to Continuous Operation

Lisa from the research product management team reviewed the evolution of the Claude model family: Opus 3 was the first model truly adept at writing long code; Sonnet 3.5 achieved the ability to safely use computers; Sonnet 3.7 introduced the "think before answering" paradigm; and Opus 4 unexpectedly demonstrated the ability to generate complex Excel and PowerPoint documents.

The latest Opus 4.7 and Mythos Preview mark a qualitative shift — Claude can now take end-to-end responsibility for outcomes and exercise judgment to complete highly ambiguous tasks. Over the past 12 months, Anthropic released 8 frontier models, each building on the one before.

Task Span: The Core Metric for Measuring AI Model Evolution

Lisa introduced a useful yardstick, "Task Span": how long a model can sustain work without losing the thread. A year ago, models could reliably work for only a few minutes at a stretch; today, Agents can run continuously for hours; future versions of Claude will be able to run persistently, becoming proactive Agents that are "always on and know what to do without being told."

This implies a fundamental shift in usage paradigms: no longer "have Claude write a project update," but "have Claude keep the project on track this week"; no longer "have Claude generate a financial forecast," but "have Claude own and continuously update this forecast."

A Stunning Case: Mythos Discovered a 27-Year-Old Vulnerability in OpenBSD

One jaw-dropping case disclosed at the conference was that the Mythos model read through the entire OpenBSD source code and discovered a vulnerability that had existed for 27 years — one that had survived all human reviewers, fuzz testers, and static analyzers for nearly three decades. This not only demonstrates the model's depth of code comprehension but also foreshadows AI's enormous potential in security auditing.

Claude Platform: Helping Enterprises Close the Gap Left by Linear AI Adoption

Although model capabilities are growing exponentially, most enterprises are still adopting AI at a linear pace. Angela and Katelyn pointed to two core issues holding enterprises back: getting correct results is too hard (it takes extensive prompt optimization, tool building, and similar work), and speed has to be balanced against scalability.

Advisor Pattern: Frontier Intelligence at One-Fifth the Cost

An elegant solution is the "Advisor Pattern" — using a smaller Sonnet-level model as the executor and Opus as the advisor. When the smaller model encounters difficulty, it can seek guidance from the larger model. In practice, this combination not only makes Sonnet perform far better than when used alone, but is actually cheaper, because Opus's advice helps Sonnet complete work more efficiently. Customer IfLego reported that this strategy achieved frontier-model quality at just one-fifth the original cost.
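The escalation logic behind the Advisor Pattern can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: the `Executor` and `Advisor` callables, the self-reported confidence score, the `threshold`, and the `max_consults` cap are all assumptions made for the example, and the two toy functions stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interfaces, not a real SDK:
# an executor returns (answer, self-reported confidence in [0, 1]);
# an advisor returns free-text guidance.
Executor = Callable[[str], Tuple[str, float]]
Advisor = Callable[[str], str]


@dataclass
class AdvisedResult:
    answer: str
    advisor_calls: int


def run_with_advisor(task: str, executor: Executor, advisor: Advisor,
                     threshold: float = 0.7, max_consults: int = 2) -> AdvisedResult:
    """Let the small model work alone; consult the large model only when it struggles."""
    prompt = task
    consults = 0
    while True:
        answer, confidence = executor(prompt)
        # Stop when the executor is confident or the consultation budget is spent.
        if confidence >= threshold or consults >= max_consults:
            return AdvisedResult(answer, consults)
        guidance = advisor(
            f"Task: {task}\nDraft answer: {answer}\nHow should this be improved?"
        )
        consults += 1
        prompt = f"{task}\n\nAdvisor guidance: {guidance}"


# Stand-in models for demonstration only.
def toy_executor(prompt: str) -> Tuple[str, float]:
    if "Advisor guidance:" in prompt:
        return ("revised answer", 0.9)
    return ("first draft", 0.4)


def toy_advisor(prompt: str) -> str:
    return "Handle the empty-input case first."


result = run_with_advisor("Write a parser", toy_executor, toy_advisor)
print(result.answer, result.advisor_calls)
```

Capping the number of consultations keeps the expensive model's share of the work bounded, which is one plausible reason the combination could come out cheaper overall.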

Claude Managed Agents: Accelerating Agent Delivery from Months to Days

Claude Managed Agents is an orchestration framework for Agents, equipped with production-grade infrastructure, enabling teams to build production-ready Agents in days rather than months. Asana built AI Teammates on top of it, enabling direct collaboration between humans and Agents within projects.

Two key new features were announced at the conference: Self-hosted Sandboxes (allowing Claude to execute work on enterprise-owned servers) and MCP Tunnels (securely accessing internal MCP servers behind firewalls). In a live demo, a growth Agent for a fictional company called Counter securely accessed an internal data warehouse via MCP Tunnel, proactively analyzed A/B test results, automatically deployed the winning experiment variant, and wrote cleanup code in a self-hosted sandbox — all without human intervention.

Claude Code Evolution: From Synchronous Coding to Asynchronous Automation

Covering Terminal, IDE, Desktop, and Mobile

Claude Code has evolved from its original CLI tool into a comprehensive development platform. The new desktop version offers a full-screen graphical interface with built-in preview functionality and a sidebar control panel; the Agent View in the CLI lets terminal users manage multiple parallel tasks at a glance. Mobile support (iOS and Android) lets developers kick off tasks anytime, anywhere — "You're no longer chained to your desk. You can go to the park, touch some grass, and still get the job done."

Routines: Letting Claude Prompt Itself

This may be the most transformative feature announced at the conference. Routines are a "higher-order prompt" — developers configure them once, and Claude Code can then run on a schedule, respond to webhooks, or handle API requests. In his demo, Boris showcased a Routine that monitors GitHub Issues: after a teammate submits an issue, the Routine asynchronously detects it and spins up Claude to handle it, so the developer wakes up to a PR ready to merge.
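To make the "configure once, trigger many ways" idea concrete, here is a small sketch of how a routine might be matched against incoming events. It is purely illustrative: the `Routine` fields, the trigger names, and the event dictionary shape are hypothetical and do not reflect Claude Code's actual configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Routine:
    name: str
    prompt: str             # standing instruction handed to the agent on every run
    trigger: str            # assumed values: "schedule", "webhook", or "api"
    event_filter: str = ""  # optional event type to match, e.g. "issue_opened"


def matching_routines(routines: List[Routine], event: Dict[str, str]) -> List[Routine]:
    """Return the routines whose trigger (and optional filter) match the event."""
    fired = []
    for routine in routines:
        if routine.trigger != event.get("source"):
            continue
        if routine.event_filter and routine.event_filter != event.get("type"):
            continue
        fired.append(routine)
    return fired


def build_prompt(routine: Routine, event: Dict[str, str]) -> str:
    """Combine the standing instruction with the context of this particular run."""
    return f"{routine.prompt}\n\nContext: {event.get('payload', '')}"


routines = [
    Routine("issue-triage", "Investigate the new issue and open a PR.",
            "webhook", "issue_opened"),
    Routine("nightly-cleanup", "Remove stale feature flags.", "schedule"),
]

event = {"source": "webhook", "type": "issue_opened",
         "payload": "Login page crashes on empty password"}

for routine in matching_routines(routines, event):
    print(routine.name)
    print(build_prompt(routine, event))
```

The developer writes the prompt once; each later event only supplies context, which is what lets the agent start work without anyone typing a new request.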

Another powerful application is automated CI repair: a Routine continuously monitors PR status and automatically handles CI failures, code review comments, and merge conflicts. In the demo, CI hit a flaky failure caused by a network timeout; the Routine diagnosed it as an underlying infrastructure issue and retried, so the engineer responsible for the PR never even saw the red mark.
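The flaky-failure handling described in the demo can be approximated with a simple triage step. The sketch below is an assumption-laden simplification: the real Routine relies on the model's diagnosis, whereas this version uses a hypothetical list of log patterns, and `rerun` is a placeholder for whatever re-triggers the CI job.

```python
import re
from typing import Callable

# Assumed log patterns that hint at infrastructure flakiness rather than a code bug.
TRANSIENT_PATTERNS = [
    r"timed? ?out",
    r"connection reset",
    r"temporar(y|ily) unavailable",
]


def classify_failure(log: str) -> str:
    """Label a CI failure as 'transient' (worth retrying) or 'code' (needs a fix)."""
    lowered = log.lower()
    if any(re.search(pattern, lowered) for pattern in TRANSIENT_PATTERNS):
        return "transient"
    return "code"


def handle_ci_failure(log: str, rerun: Callable[[], bool], max_retries: int = 2) -> str:
    """Retry transient failures a bounded number of times; hand off everything else."""
    if classify_failure(log) != "transient":
        return "needs_fix"
    for _ in range(max_retries):
        if rerun():  # rerun() returns True when the job passes
            return "passed_on_retry"
    return "escalate"


# Demonstration with a fake CI job that passes on its second rerun.
attempts = iter([False, True])
outcome = handle_ci_failure("ERROR: network timeout while fetching dependencies",
                            lambda: next(attempts))
print(outcome)
```

Bounding the retries matters: a failure that keeps recurring is escalated instead of being hidden indefinitely.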

As Boris put it: "The default is no longer 'I'm going to go prompt Claude,' but 'I'll have Claude ask Claude.'"

Enterprise Validation: Large-Scale Adoption at Shopify and MercadoLibre

Shopify is using Claude Code company-wide, spanning engineers, product managers, designers, and data scientists. MercadoLibre's entire team of 23,000 engineers uses Claude Code; under human oversight, they have reviewed over 500,000 PRs, modernized more than 9,000 applications, and aim to achieve 90% autonomous coding by Q3.

A touching detail: managers and VPs who hadn't committed code in years are now shipping code again. Claude Code is putting coding back into the hands of people who spent the last decade in review meetings and roadmap sessions.

Actionable Advice for Developers

Lisa offered several key recommendations in her talk:

Design for the next version of Claude, not the current one — the developers who ultimately win are those whose architectures are ready to absorb the next massive leap
Reduce scaffolding — as models get smarter, scaffolding that was once helpful may now constrain Claude
Continuously create harder evaluations — when a task that used to always fail starts passing, that's your signal to ship a new feature
Treat model upgrades as business opportunities — make upgrades easier by automating evaluation and testing workflows
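The advice on evaluations lends itself to a small automated check. As a rough sketch, assuming each evaluation run is recorded as a mapping from task name to pass/fail (the task names and data below are made up for illustration), a team could compare the results of two model versions like this:

```python
from typing import Dict, List


def newly_passing(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Tasks that failed on the old model and pass on the new one."""
    return sorted(
        task for task, ok in current.items()
        if ok and task in previous and not previous[task]
    )


def regressions(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Tasks that passed on the old model and fail on the new one."""
    return sorted(
        task for task, ok in current.items()
        if not ok and previous.get(task, False)
    )


# Hypothetical evaluation results for two model versions.
baseline = {
    "refactor_module": True,
    "multi_repo_migration": False,
    "long_horizon_planning": False,
}
candidate = {
    "refactor_module": True,
    "multi_repo_migration": True,
    "long_horizon_planning": False,
}

print("Ship candidates:", newly_passing(baseline, candidate))
print("Regressions:", regressions(baseline, candidate))
```

Tasks that flip from failing to passing are the "signal to ship" Lisa describes, while tasks that still fail keep the evaluation suite hard enough to be useful for the next upgrade.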

The core message of this conference was clear: AI capability, now growing exponentially, is no longer the bottleneck; the real challenge is how quickly we put it to work. And developers are the key players in bridging that gap.