Sakana AI Releases Fugu Ultra: How Model Orchestration Achieves Frontier AI Performance

Sakana AI's New Breakthrough: The Fugu Ultra Model

Japanese AI company Sakana AI recently released its latest model, Fugu Ultra, and claims it matches top-tier models such as Anthropic's Fable 5 and Mythos Preview on the industry's most demanding benchmarks in engineering, science, and reasoning. The most notable part of the release is not the performance claim itself but the technical approach behind it: Autonomous Model Orchestration.

Sakana AI was co-founded in Tokyo in 2023 by David Ha, formerly a researcher at Google Brain, and Llion Jones, formerly at Google Research. Notably, Jones is one of the eight authors of the landmark paper Attention Is All You Need, which introduced the Transformer architecture that underlies nearly all modern large language models. The company name comes from the Japanese word "sakana" (魚), meaning "fish," and draws on the collective intelligence of fish schools: each fish follows simple rules, yet the school as a whole moves in complex, efficient coordination. The name reflects the company's core technical philosophy: building intelligence that exceeds any single complex system by having many relatively simple components work together.


What Is Autonomous Model Orchestration?

Rather than following the conventional path of training a single large model, Sakana AI has taken a different route. The core idea behind "autonomous model orchestration" is that, instead of training an all-purpose model from scratch, an intelligent system autonomously dispatches and coordinates multiple specialized models so that the system as a whole reaches frontier-level performance.

To see what is new about this approach, it helps to compare it with two related paradigms in today's AI landscape. Model Routing sends each request to the model best suited to it based on characteristics of the input, for example routing math problems to a math-specialized model and code problems to a programming-specialized model. Mixture of Experts (MoE) places multiple "expert" sub-networks inside a single model; a learned gating network activates only the few most relevant experts for each token, so different inputs are processed by different parts of the network. Google's Switch Transformer and Mistral AI's Mixtral both use this architecture.
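Request-level routing and token-level MoE gating differ mainly in granularity. The Python sketch below contrasts them: `route_request` picks one whole model per input using a toy keyword rule, while `top_k_gate` scores every expert for a single token, keeps the top k, and renormalizes their weights with a softmax. All names, keywords, and numbers are illustrative assumptions. Switch Transformer sends each token to one expert (k = 1) and Mixtral to two of its eight experts (k = 2), but neither is implemented by this code.

```python
import math
from typing import Callable, List, Sequence, Tuple


def route_request(prompt: str) -> str:
    """Request-level model routing: send the whole input to one model.

    A toy keyword rule; real routers usually use a trained classifier.
    """
    text = prompt.lower()
    if any(word in text for word in ("integral", "prove", "equation")):
        return "math-model"
    if any(word in text for word in ("python", "function", "bug")):
        return "code-model"
    return "general-model"


def top_k_gate(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """MoE gating for one token: keep the k highest-scoring experts.

    Returns (expert_index, weight) pairs; the weights are a softmax over the
    selected logits only, so they sum to 1.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, gate_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_gate(gate_logits, k))


# Toy experts acting on a scalar stand-in for a token representation.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # in a real layer these come from a learned gate

print(route_request("Fix this Python bug"))         # code-model
print(top_k_gate(gate_logits, k=2))                 # [(1, 0.5), (3, 0.5)]
print(moe_forward(3.0, gate_logits, experts, k=2))  # 7.5
```

The router chooses a model once per request, while the gate chooses experts inside one model for every token; orchestration, described next, operates above both.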

Sakana AI's "autonomous model orchestration" takes these ideas to a higher level of automation and abstraction. In MoE, the expert structure is fixed when the model is trained. An autonomous orchestration system, by contrast, can select, combine, and even chain calls to completely independent, separately trained external models at inference time. The orchestrator must therefore decide not only which model to use but also how to decompose the task, in what order to call models, and how to integrate their outputs, which requires meta-cognitive planning and decision-making. Fugu Ultra essentially acts as a commander, as its release title puts it: "One Model to Command Them All."
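The source does not describe how Fugu Ultra works internally, so the Python sketch below only illustrates the three decisions named above: decomposing a task, calling models in order, and integrating their outputs. The planner, the specialist names `math` and `writer`, and the integration rule are all hypothetical, and the stub lambdas stand in for independently trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Any function from a prompt string to an output string counts as a "model" here.
Model = Callable[[str], str]


@dataclass
class Step:
    model: str   # name of the specialist to call
    prompt: str  # prompt template; "{prev}" is replaced by the previous step's output


def plan(task: str) -> List[Step]:
    """Hypothetical planner: decompose a task into an ordered chain of calls.

    A real orchestrator would generate this plan with a model; a fixed
    two-step chain keeps the sketch short.
    """
    return [
        Step("math", "Solve: " + task),
        Step("writer", "Explain for a general reader: {prev}"),
    ]


def integrate(outputs: List[str]) -> str:
    """Hypothetical integration rule: return the final step's output.

    A real system might instead ask another model to reconcile or verify
    the intermediate outputs.
    """
    if not outputs:
        raise ValueError("no outputs to integrate")
    return outputs[-1]


def orchestrate(task: str, models: Dict[str, Model]) -> str:
    """Decompose, call specialists in order (chaining outputs), then integrate."""
    outputs: List[str] = []
    prev = ""
    for step in plan(task):
        prompt = step.prompt.replace("{prev}", prev)
        prev = models[step.model](prompt)
        outputs.append(prev)
    return integrate(outputs)


# Stub specialists standing in for independently trained models.
models: Dict[str, Model] = {
    "math": lambda p: "4" if "2 + 2" in p else "unknown",
    "writer": lambda p: "The answer is " + p.rsplit(": ", 1)[-1] + ".",
}

print(orchestrate("2 + 2", models))  # The answer is 4.
```

Keeping `plan` and `integrate` as separate functions mirrors the separation of decisions in the text: a more capable orchestrator would replace each with a learned component while the calling loop stays the same.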

Strategic Significance of This Technical Approach

This architectural design brings several notable advantages:

Modularity and Flexibility: Underlying specialized models can be updated or replaced independently without retraining the whole system. This mirrors the microservices philosophy in software engineering, where a monolithic application is split into independently deployable services, each free to evolve on the technology stack that suits it best.
Resource Efficiency: It avoids the enormous compute investment needed to train a single very large model. For reference, training a GPT-4-class model is estimated to have required tens of thousands of GPUs running for months, at a reported cost of over $100 million. An orchestration approach can use existing open-source or commercial models as components, sharply reducing upfront investment.
Rapid Iteration: New specialized capabilities can be added by integrating new models rather than waiting for the next large-scale training run. When a stronger specialized model appears in a given domain, the orchestration system can quickly add it to its portfolio.
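The modularity and rapid-iteration points can be sketched as a registry that maps capabilities to whichever model currently serves them. The `ModelRegistry` class below is a hypothetical Python illustration, not part of any Sakana AI interface: re-registering a capability swaps the model in place, and callers that go through the registry need no changes, much as a microservice can be redeployed behind a stable endpoint.

```python
from typing import Callable, Dict

Model = Callable[[str], str]


class ModelRegistry:
    """Maps capability names (e.g. "code") to the model currently serving them."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, capability: str, model: Model) -> None:
        # Registering under an existing name replaces the old model in place.
        self._models[capability] = model

    def call(self, capability: str, prompt: str) -> str:
        if capability not in self._models:
            raise KeyError(f"no model registered for {capability!r}")
        return self._models[capability](prompt)


registry = ModelRegistry()
registry.register("code", lambda p: "v1 answer")
print(registry.call("code", "write a sort"))  # v1 answer

# A stronger code model appears: swap it in; callers and other entries are untouched.
registry.register("code", lambda p: "v2 answer")
print(registry.call("code", "write a sort"))  # v2 answer
```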

Frontier Performance and Export Controls: A Dual Consideration

In its release statement, Sakana AI stressed one point in particular: Fugu Ultra can "deliver frontier capabilities without export control risk." That claim has a clear geopolitical context.

A Technical Path Around Chip Restrictions

US export controls on advanced AI chips have become a major variable in the global AI industry. The policy began with rules issued by the US Department of Commerce's Bureau of Industry and Security (BIS) in October 2022, which initially targeted China and restricted exports of high-end AI training chips such as NVIDIA's A100 and H100. The controls have expanded since then: updated rules in October 2023 further tightened restrictions on China and extended licensing requirements to more countries and regions, and the "AI Diffusion Rule" issued in January 2025 set up a global three-tier system that sorted countries into unrestricted, conditional-access, and prohibited tiers. Close allies such as Japan were placed in the first tier and largely exempted from its caps. The rule was rescinded in May 2025, before its compliance deadline, which shows how quickly this policy landscape can shift.

Although Japan is a US ally that was placed in the top tier, it still faces supply-chain uncertainty in accessing AI compute. Global GPU supply has been tight for a long time, and even without policy restrictions, Japanese companies are at a disadvantage when competing with US tech giants for limited chip production capacity. Moreover, because the geopolitical environment changes quickly, today's favorable policy cannot guarantee tomorrow's supply.

Sakana AI's orchestration approach reduces dependence on a single very large compute cluster at the architectural level, offering a viable alternative path for Japan and other regions affected by export controls. Conventional large-model training concentrates thousands or even tens of thousands of top-tier GPUs in one data center, linked by high-speed interconnects such as NVLink and InfiniBand. Orchestration has more distributed and flexible compute needs: the orchestration layer itself does not require massive training compute, and the underlying specialized models can be existing open-source models or commercial models accessed via API.

Put simply, even without access to the most advanced AI training chips, it may still be possible to reach performance comparable to frontier models by orchestrating and combining existing models well. This could reshape the global AI competitive landscape, because it suggests an asymmetric strategy: instead of competing head-on in the compute arms race, seek breakthroughs through architectural innovation.

Benchmark Performance

According to Sakana AI's official statement, Fugu Ultra demonstrates outstanding performance on benchmarks in the following areas:

Engineering Benchmarks: Cover practical engineering abilities such as code generation and system design. Common examples include HumanEval (correctness of generated functions against unit tests) and SWE-bench (resolving real GitHub issues from open-source repositories).
Science Benchmarks: Cover mathematical reasoning and scientific knowledge. Typical examples include MATH (competition-level math problems), GPQA (graduate-level science questions), and MMLU (Massive Multitask Language Understanding, a multiple-choice test spanning 57 subjects).
Reasoning Benchmarks: Test complex logical reasoning and multi-step problem solving. Representative examples include ARC-AGI (built on the Abstraction and Reasoning Corpus) and BBH (BIG-Bench Hard).
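Code benchmarks such as HumanEval are usually scored as pass@k. The paper that introduced HumanEval defines an unbiased estimator: generate n samples per problem, count the c samples that pass the unit tests, and compute 1 - C(n-c, k) / C(n, k), the probability that at least one of k samples drawn from the n is correct. The Python function below implements that formula directly with `math.comb`; the (n, c) counts in the usage lines are made-up illustrative values, not results for any model.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the unit tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Benchmark score = mean of the per-problem estimates (illustrative counts).
results = [(10, 3), (10, 0), (10, 10)]  # (n, c) for three hypothetical problems
print(sum(pass_at_k(n, c, k=1) for n, c in results) / len(results))
```

Because the reported number depends on n, k, and the sampling temperature used to draw the samples, two pass@k figures are only comparable when those settings match.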

It's worth noting that AI benchmarking is itself highly contested. The industry has long faced the problem of "benchmark overfitting": developers may, intentionally or not, optimize for specific benchmarks, inflating scores without matching gains in real capability. Large differences in evaluation settings (few-shot vs. zero-shot prompting, whether chain-of-thought reasoning is allowed, sampling temperature, and so on) also make fair cross-model comparison difficult. In recent years, the community has placed growing weight on "arena"-style blind evaluation such as Chatbot Arena, which ranks models by real users' preference votes between anonymized responses.
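Arena-style leaderboards turn pairwise blind votes into ratings. The Python sketch below uses a plain online Elo update as an illustration; the K-factor of 32, the starting rating of 1000, and the model names are arbitrary assumptions, and Chatbot Arena's own methodology differs in its details.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(votes: Iterable[Tuple[str, str, float]],
                k: float = 32.0, start: float = 1000.0) -> Dict[str, float]:
    """votes: (model_a, model_b, score_a), score_a = 1 (A wins), 0.5 (tie), 0 (B wins)."""
    ratings: Dict[str, float] = defaultdict(lambda: start)
    for a, b, score_a in votes:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (score_a - e_a)
        ratings[a] += delta
        ratings[b] -= delta  # zero-sum: B loses exactly what A gains
    return dict(ratings)


votes = [
    ("model-x", "model-y", 1.0),  # users preferred model-x's answer
    ("model-x", "model-z", 0.5),  # tie
    ("model-y", "model-z", 0.0),  # users preferred model-z's answer
]
print(elo_ratings(votes))
```

Online Elo depends on the order in which votes arrive; fitting a Bradley-Terry model over all votes at once, an approach Chatbot Arena has also used, removes that order dependence.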

Sakana AI positions Fugu Ultra against Anthropic's Fable 5 and Mythos Preview, both widely regarded as top-tier models. If Fugu Ultra's benchmark results hold up under third-party verification, they would be strong evidence in favor of the model orchestration approach.

Industry Impact and Outlook

Implications for AI Development Paradigms

If Sakana AI's results are verified, they could push more teams to question the habitual assumption that progress requires training ever-larger models. With compute costs rising, and training costs for next-generation frontier models estimated to possibly exceed $1 billion, model orchestration offers a more cost-effective route to frontier capabilities. It also fits several recent trends in AI: smaller models gaining large-model capabilities through distillation, the rise of test-time compute, and agent frameworks extending what models can do through tool use. Each of these challenges, at a different level, the traditional belief that "scaling is all you need."
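One common form of test-time compute is to sample several answers and keep the majority, often called self-consistency. The Python sketch below is a minimal version; `noisy_solver` is a hypothetical stand-in for a stochastic model call, and its 60% accuracy is an arbitrary assumption.

```python
import random
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[random.Random], str],
                     n_samples: int, seed: int = 0) -> str:
    """Spend more inference compute: draw n answers and return the most common one."""
    rng = random.Random(seed)
    answers = [sample_answer(rng) for _ in range(n_samples)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def noisy_solver(rng: random.Random) -> str:
    """Hypothetical solver: "42" 60% of the time, otherwise a random wrong answer."""
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 41))


print(self_consistency(noisy_solver, n_samples=51))
```

Raising `n_samples` trades more inference compute for a more reliable answer, which is the core bet behind test-time scaling; it only helps when the correct answer is more common than any single wrong one.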

Technical Lineage of Evolutionary Computation and Nature-Inspired AI

Sakana AI has consistently made "evolutionary computation" and "nature-inspired AI" its technical hallmark, and this is more than brand narrative: it has deep academic roots. Evolutionary Computation is a family of optimization algorithms inspired by Darwin's theory of natural selection, including Genetic Algorithms, Evolution Strategies, and Genetic Programming. In AI model development, evolutionary methods can drive Neural Architecture Search (NAS), hyperparameter optimization, and even the merging and evolution of model weights. In earlier published research, Sakana AI showed how "Model Merging" (combining the weights of several trained models in specific ways) can produce new models that outperform the originals, with evolutionary algorithms searching automatically for the best merging strategy. Fugu Ultra's orchestration capabilities may well build on this research, which would mark an evolution from static model merging to dynamic model orchestration.
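The Python toy below shows only the general idea of searching a merge recipe with an evolutionary algorithm and is not Sakana AI's published method, which is considerably more elaborate. Two short weight vectors stand in for trained models, each parameter is linearly interpolated with its own coefficient, and a simple (1+λ) evolution strategy searches those coefficients against a toy fitness function that stands in for accuracy on held-out tasks. All vectors and hyperparameters are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple


def merge(a: Sequence[float], b: Sequence[float], alphas: Sequence[float]) -> List[float]:
    """Per-parameter linear interpolation: alpha * a + (1 - alpha) * b."""
    return [al * x + (1.0 - al) * y for x, y, al in zip(a, b, alphas)]


def fitness(weights: Sequence[float], target: Sequence[float]) -> float:
    """Toy score (higher is better): negative squared distance to a target vector.

    Stands in for the merged model's accuracy on held-out tasks.
    """
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def clip01(x: float) -> float:
    return min(1.0, max(0.0, x))


def evolve_merge(a: Sequence[float], b: Sequence[float], target: Sequence[float],
                 generations: int = 200, offspring: int = 8, sigma: float = 0.1,
                 seed: int = 0) -> Tuple[List[float], float]:
    """(1+lambda) evolution strategy over the merge coefficients.

    Each generation mutates the current best coefficients `offspring` times
    and keeps the best child only if it beats its parent.
    """
    rng = random.Random(seed)
    best = [0.5] * len(a)  # start from a plain average of the two models
    best_fit = fitness(merge(a, b, best), target)
    for _ in range(generations):
        children = [[clip01(al + rng.gauss(0.0, sigma)) for al in best]
                    for _ in range(offspring)]
        scored = [(fitness(merge(a, b, c), target), c) for c in children]
        child_fit, child = max(scored, key=lambda s: s[0])
        if child_fit > best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


# Two toy "models" (weight vectors) and a target the merged model should match.
model_a = [1.0, 0.0, 2.0]
model_b = [0.0, 1.0, 0.0]
target = [0.8, 0.3, 1.0]

alphas, score = evolve_merge(model_a, model_b, target)
print([round(al, 2) for al in alphas], round(score, 4))
```

For these toy vectors the exact optimum is alphas = [0.8, 0.7, 0.5], which the search should approach. Only the fitness function needs to change to apply the same loop to real benchmark scores, which is what makes evolutionary search attractive when the objective cannot be differentiated.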

Questions Worth Watching

Of course, several key questions still require ongoing observation:

Latency and Cost: Does multi-model orchestration add significant inference latency and cost? In deployment, each additional model call brings extra network latency and compute overhead. If a request requires the orchestration layer to decompose the task, call several specialized models in series or in parallel, and then integrate the results, total latency could be far higher than a single model's direct inference. For applications that need real-time responses, such as chat and code completion, this could be a critical bottleneck.
Third-Party Verification: Can the official benchmark results be reproduced in independent evaluations? The AI field has seen several cases of "excellent benchmark scores but mediocre real-world performance," so rigorous evaluation by independent third parties will be key to establishing Fugu Ultra's true capabilities.
Real-World Application Performance: Benchmark scores and user experience in real scenarios often diverge. Benchmarks typically pose structured problems with clear answers, while real user needs are often ambiguous, span multiple turns, and require understanding context and intent.
Orchestration System Robustness: How does the orchestration system degrade gracefully when an underlying specialized model produces errors or becomes unavailable? Multi-model systems have more complex failure modes than single models, because an error in one model can propagate to, and be amplified by, the models that consume its output.
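The latency and robustness questions can be made concrete. In the Python sketch below, `pipeline_latency_ms` assumes that stages run one after another while calls inside a stage run in parallel, so each stage costs its slowest call plus a fixed per-stage overhead; `call_with_fallback` shows one simple form of graceful degradation, trying models in priority order. Both functions and all latency figures are illustrative assumptions, not measurements of Fugu Ultra.

```python
from typing import Callable, Sequence

Model = Callable[[str], str]


def pipeline_latency_ms(stages: Sequence[Sequence[float]], overhead_ms: float = 0.0) -> float:
    """End-to-end latency of an orchestrated request.

    Each stage is a non-empty list of call latencies that run in parallel;
    stages run in sequence. A stage costs its slowest call plus overhead_ms.
    """
    return sum(max(stage) + overhead_ms for stage in stages)


def call_with_fallback(prompt: str, models: Sequence[Model]) -> str:
    """Graceful degradation: try each model in priority order until one succeeds."""
    errors = []
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"all {len(models)} models failed: {errors!r}")


# Illustrative numbers only: plan (300 ms), two specialists in parallel
# (700 ms and 900 ms), integration (400 ms), 50 ms overhead per stage.
print(pipeline_latency_ms([[300], [700, 900], [400]], overhead_ms=50))  # 1750


def unavailable_specialist(prompt: str) -> str:
    raise TimeoutError("specialist did not respond")


print(call_with_fallback("hello", [unavailable_specialist, lambda p: "fallback: " + p]))  # fallback: hello
```

Under these assumed numbers the orchestrated request takes 1750 ms, so it would need a single-model baseline slower than that to break even on latency. Fallback only handles a model that fails loudly; it does nothing about a model that returns a plausible but wrong answer, which is the harder error-propagation problem raised above.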

For an AI company founded in Japan by former Google researchers, the release of Fugu Ultra marks an important step from research exploration to actual product delivery. Whatever the market's eventual verdict, its model orchestration approach deserves close attention from the whole industry, because it raises a fundamental question: on the road to artificial general intelligence, is training a bigger model the only answer?