GPT-5.6 Internal Testing Begins: A Complete Breakdown of the Week's Biggest AI Developments

OpenAI Accelerates Iteration: GPT-5.6 Internal Testing Underway

Just three weeks after the release of GPT-5.5, OpenAI has launched internal testing of GPT-5.6. The new version introduces an "UltraFast" mode that boosts inference speed by 2 to 3 times. Speedups of this size typically come from a combination of techniques: speculative decoding, in which a small draft model proposes several tokens ahead and the large model verifies them in a single pass, cutting the number of sequential large-model steps; KV cache optimization, which avoids recomputing attention keys and values for tokens already processed; model distillation, which compresses a large model's capabilities into a smaller architecture; and hardware-level operator fusion and memory optimization. OpenAI previously demonstrated a "trade thinking time for quality" strategy in its o-series models, and UltraFast mode may represent exploration in the opposite direction: maximizing speed while maintaining quality.
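To make the speculative decoding idea concrete, here is a minimal greedy sketch in Python. It is an illustration under stated assumptions, not a description of how UltraFast mode works: the "models" are toy next-token functions, and the names speculative_decode, toy_target, and toy_draft are hypothetical. In a real system the verification loop would be one batched forward pass of the large model.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # greedy next-token function (toy stand-in)


def greedy_decode(model: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens ahead.
        proposal: List[Token] = []
        ctx = list(seq)
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target model checks every proposed position.
        #    Counted as one pass, since real systems do this in parallel.
        passes += 1
        for tok in proposal:
            expected = target(seq)
            if expected != tok:
                seq.append(expected)  # first mismatch: keep the target's token
                break
            seq.append(tok)           # match: accept the drafted token
        else:
            # Every draft token was accepted; the same pass yields one bonus token.
            if len(seq) < goal:
                seq.append(target(seq))
    return seq, passes


def toy_target(seq: List[Token]) -> Token:
    """Toy 'large model': counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    """Toy 'small model': agrees with the target except right after a 7."""
    return 0 if seq[-1] == 7 else (seq[-1] + 1) % 10
```

The output always matches what the target model would have produced on its own; the saving comes from needing fewer sequential target passes whenever the draft model guesses correctly.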

More notably, the Codex model has for the first time participated in its own training process, a sign that a closed loop of AI self-improvement is taking shape. Traditional large-model training relies on human-annotated data and Reinforcement Learning from Human Feedback (RLHF). Having a model participate in its own training, so-called "Recursive Self-Improvement", means the model can generate training data, evaluate output quality, and even optimize its own training pipeline. Once this loop closes, it could in theory enable exponential capability gains, but it also raises deep concerns in the alignment field: if the model drifts from human intent during self-improvement, the window for course correction shrinks dramatically.

bilibili source: GPT-5.6 Revealed

OpenAI's iteration cadence has compressed from quarterly to weekly, and this "rapid release, rapid iteration" strategy is clearly aimed at maintaining its lead in an intensely competitive market.

Codex Goal-Driven Mode: AI Programming Shifts from Generation to Delivery

An Innovative Architecture with Small Models as Judges

Codex has introduced a goal-driven mode, with the core idea originating from three lines of Bash script—using a small model as a "judge" to determine whether a task is complete, refusing to stop until it is. This design philosophy marks AI programming's shift from "generating code" to "closed-loop delivery."

Using a small model as a judge is essentially a layered verification architecture. In software engineering, the closest analogy is automated testing in Continuous Integration/Continuous Deployment (CI/CD), except that the pass/fail judgment is AI-driven rather than rule-driven. Small models (lightweight models with parameters in the billions) are cheap to run and respond quickly, which suits them to frequent completion checks, while large models handle the actual code generation and problem-solving. This split avoids the self-evaluation bias that arises when the large model acts as both player and referee, and it keeps overall inference costs under control.
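The "keep going until the judge says done" loop can be sketched in a few lines of Python. This is a hedged illustration of the pattern only, not Codex's implementation: run_until_done, the worker and judge callables, and the stub functions are all hypothetical, and the max_rounds cap is an added safeguard so the loop cannot run forever.

```python
from typing import Callable, Tuple

Worker = Callable[[str, str], str]              # (goal, feedback) -> attempt
Judge = Callable[[str, str], Tuple[bool, str]]  # (goal, attempt) -> (done, feedback)


def run_until_done(goal: str, worker: Worker, judge: Judge,
                   max_rounds: int = 5) -> Tuple[str, int]:
    """Call the worker repeatedly until the judge accepts or the cap is hit."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        attempt = worker(goal, feedback)        # large model: do the work
        done, feedback = judge(goal, attempt)   # small model: is the goal met?
        if done:
            return attempt, round_no
    raise RuntimeError(f"goal not reached after {max_rounds} rounds")


def stub_worker(goal: str, feedback: str) -> str:
    """Stand-in for a large model: buggy first try, fixed once it gets feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def stub_judge(goal: str, attempt: str) -> Tuple[bool, str]:
    """Stand-in for a small judge model: runs the attempt against one check."""
    scope: dict = {}
    exec(attempt, scope)  # acceptable for a toy stub; never do this with untrusted code
    ok = scope["add"](2, 3) == 5
    return ok, "" if ok else "add(2, 3) should be 5"
```

Because the judge is a separate component, it can be swapped for a test suite, a smaller model, or both without touching the worker.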

Three companies followed suit with this approach within 11 days, indicating the industry has reached consensus: future AI programming tools won't just write code—they need to ensure the code actually solves the problem.

Stunning AI Programming Efficiency Data

A real-world test showed that work requiring 80 hours from a PhD student was completed by AI in just two hours, a 40x efficiency improvement. In another case, someone used AI to rewrite 960,000 lines of C code into Rust in 6 days. The result compiled, but it contained over 13,000 uses of unsafe, which raised safety concerns.

Rust's unsafe keyword does not switch off the borrow checker. It permits operations the compiler cannot verify, such as dereferencing raw pointers or calling unsafe functions, and leaves the programmer responsible for upholding memory safety in that code. The compile-time guarantees provided by the ownership system and borrow checker are the core safety advantage Rust has over C, so with over 13,000 uses of unsafe this rewrite converted the language at the syntactic level without truly achieving Rust's memory safety guarantees. This reflects a fundamental limitation of current AI code generation: AI excels at pattern matching and syntax conversion, but its understanding of program semantics, particularly around concurrency safety and lifetime management, remains insufficient. A genuine C-to-Rust migration requires redesigning the data ownership model, not line-by-line translation. The lesson is that AI's speed advantage is real, but quality control still requires human involvement.

MiniMax Dual Breakthrough: Model Compression and Multi-Agent Systems

Style Elastic Technology: Deployment Costs Slashed 360x

MiniMax has introduced Style Elastic technology, achieving a 360x reduction in model compression costs with virtually no loss in accuracy. Mainstream model compression techniques include quantization (reducing parameter precision), pruning (removing redundant parameters), knowledge distillation (using large models to teach small models), and low-rank factorization. How Style Elastic reaches a 360x cost reduction with near-zero accuracy loss has not been detailed here; one plausible ingredient is dynamic inference path selection, which adjusts computation to input complexity by routing simple tasks through lightweight paths and complex tasks through the full model. This kind of elastic computation has precedent in Mixture of Experts (MoE) architectures, but a 360x reduction suggests more aggressive architectural innovations may be involved.
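As a rough illustration of dynamic path selection in general, the Python sketch below routes each request to a cheap or an expensive path based on a complexity score. Everything in it is an assumption made for the example, including the Path type, the word-count heuristic, and the cost figures; it says nothing about how Style Elastic actually works.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Path:
    name: str
    cost: float               # relative compute cost per request (made-up units)
    run: Callable[[str], str]


def toy_complexity(prompt: str) -> float:
    """Crude proxy: longer prompts count as harder, saturating at 1.0."""
    return min(len(prompt.split()) / 20, 1.0)


def route(prompt: str, light: Path, full: Path,
          complexity: Callable[[str], float] = toy_complexity,
          threshold: float = 0.5) -> Path:
    """Send easy inputs down the light path and hard ones down the full path."""
    return light if complexity(prompt) < threshold else full


def average_cost(prompts: List[str], light: Path, full: Path,
                 threshold: float = 0.5) -> float:
    """Mean cost per request under the routing policy."""
    chosen = [route(p, light, full, threshold=threshold) for p in prompts]
    return sum(path.cost for path in chosen) / len(chosen)


# Hypothetical paths: the full model is assumed to cost 100x the light one.
LIGHT = Path("light", 1.0, lambda p: f"[light] {p}")
FULL = Path("full", 100.0, lambda p: f"[full] {p}")
```

The overall saving depends on the traffic mix: the more requests the router can safely keep on the light path, the lower the average cost, which is why the quality of the complexity estimate matters as much as the cheap path itself.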

This technology makes large model deployment significantly more economically viable, with particular significance for small and medium-sized enterprises.

Mavis Adversarial Multi-Agent System

MiniMax's Mavis system employs an adversarial architecture in which three roles check and balance each other, supporting parallel task execution and automatic error correction. The design borrows the idea of checks and balances. A typical three-role setup might include an Executor (responsible for completing tasks), a Reviewer (responsible for finding errors and vulnerabilities), and an Arbiter (responsible for final decisions). This addresses the "self-confirmation bias" of a single agent: one agent struggles to find its own errors, but a second agent built specifically to find errors can catch them. Similar thinking appears in AlphaGo's self-play and in the generator-discriminator architecture of GANs (Generative Adversarial Networks). The key challenges in multi-agent collaboration are communication efficiency and consistency, along with preventing agents from arguing in endless loops.
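A toy version of such a three-role loop is sketched below in Python. It assumes the Executor/Reviewer/Arbiter split described above, which is itself only a typical design rather than a confirmed description of Mavis; the resolve function and the toy roles (sorting a list) are hypothetical. The max_rounds cap is one simple way to stop agents from arguing indefinitely.

```python
from typing import Callable, List, Tuple

Executor = Callable[[List[int], List[str]], List[int]]  # (task, objections) -> draft
Reviewer = Callable[[List[int], List[int]], List[str]]  # (task, draft) -> objections
Arbiter = Callable[[List[int], List[str]], bool]        # (draft, objections) -> accept?


def resolve(task: List[int], executor: Executor, reviewer: Reviewer,
            arbiter: Arbiter, max_rounds: int = 3) -> Tuple[List[int], bool, int]:
    """Run execute -> review -> arbitrate until accepted or the round cap is hit.

    Returns (last draft, accepted?, rounds used).
    """
    objections: List[str] = []
    draft: List[int] = []
    for round_no in range(1, max_rounds + 1):
        draft = executor(task, objections)   # produce or revise the work
        objections = reviewer(task, draft)   # look for problems
        if arbiter(draft, objections):       # final decision for this round
            return draft, True, round_no
    return draft, False, max_rounds          # cap reached: escalate or give up


def toy_executor(task: List[int], objections: List[str]) -> List[int]:
    """Toy executor: lazy first attempt, sorts properly once challenged."""
    return sorted(task) if objections else list(task)


def toy_reviewer(task: List[int], draft: List[int]) -> List[str]:
    """Toy reviewer: objects whenever the draft is not the sorted task."""
    return [] if draft == sorted(task) else ["output is not sorted"]


def toy_arbiter(draft: List[int], objections: List[str]) -> bool:
    """Toy arbiter: accepts only when no objections remain."""
    return not objections
```

Returning an explicit accepted flag, rather than raising, lets the caller decide what to do with an unresolved dispute, for example handing it to a human.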

The system supports integration with WeChat, Feishu, and other platforms. The client SDK is open source, and the product has grown from a command-line tool into a complete agent platform that supports multi-agent team collaboration and scheduled tasks. It became the hottest open-source code tool on GitHub on its launch day.

The Valuation Battle: Anthropic vs OpenAI

The Business Game Behind the Numbers

Anthropic's valuation target has reached $950 billion, with annualized revenue of $44 billion. The company claims its market share has surpassed OpenAI and is conducting the largest fundraising round in history. However, OpenAI promptly accused Anthropic of using the "gross method" to inflate revenue, claiming its $8 billion in actual annualized revenue is only $22 billion, below OpenAI's $25 billion.

The gross method and net method are two approaches to revenue recognition under accounting standards. Under the gross method, a company records the full amount customers pay as revenue and books any partner's share as a cost. Under the net method, it records only the amount it keeps after the partner's share is deducted. For example, if an AI company sells API services through a cloud platform, the gross method counts the entire fee paid by users as revenue, while the net method counts only what remains after the platform's cut. In SaaS and platform businesses, the choice of method can change reported revenue several times over. OpenAI's accusation that Anthropic uses the gross method is, in effect, a challenge to the substance behind its revenue numbers, a common tactic in pre-IPO valuation battles among tech companies.
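The difference between the two methods is simple arithmetic, sketched below in Python. The function name and the numbers are illustrative assumptions only (a reseller that keeps 30% of what customers pay); they are not figures from either company, and the sketch treats the deduction as a reseller's revenue share.

```python
def reported_revenue(customer_payments: float, partner_share: float,
                     method: str) -> float:
    """Revenue a vendor would report for sales made through a reseller.

    customer_payments: total amount end customers paid (any currency unit)
    partner_share: fraction of that amount kept by the reseller, in [0, 1)
    method: "gross" or "net"
    """
    if not 0 <= partner_share < 1:
        raise ValueError("partner_share must be in [0, 1)")
    if method == "gross":
        # Full customer payment is revenue; the reseller's cut is booked as a cost.
        return customer_payments
    if method == "net":
        # Only the amount the vendor actually keeps is revenue.
        return customer_payments * (1 - partner_share)
    raise ValueError("method must be 'gross' or 'net'")
```

With 100 units of customer payments and a 30% partner share, the same sales appear as 100 under the gross method and about 70 under the net method. Profit is identical either way; only the headline revenue figure changes.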

This numbers war is essentially a pre-IPO valuation battle. Both companies are preparing for public listings, and whoever can tell a better story to capital markets will secure higher valuation multiples.

Cerebras IPO and Shifts in AI Hardware Landscape

Cerebras went public at $185 per share, above market expectations, raising $5.55 billion at a valuation of $56.4 billion. It is listed on NASDAQ under the ticker CBRS. Cerebras's core technology is the Wafer-Scale Engine (WSE), which turns an entire silicon wafer into a single chip instead of dicing it into hundreds of smaller chips. Its latest WSE-3 has 4 trillion transistors and 900,000 AI cores, with a die area roughly 57 times that of NVIDIA's H100 (about 46,225 mm² versus 814 mm²). Keeping compute and memory on one piece of silicon removes much of the chip-to-chip communication that bottlenecks multi-GPU systems, which suits the large matrix operations in large-model inference. The $56.4 billion valuation reflects market expectations that NVIDIA's GPU monopoly could be disrupted, although Cerebras's software ecosystem remains far less mature than CUDA's.

Interestingly, OpenAI had previously invested at an extremely low price and has now reaped substantial returns.

On a related note, Cisco's Q3 revenue hit a record $15.84 billion, AI orders were raised to $9 billion, and shares surged 19% after hours—but the company simultaneously laid off 4,000 employees. The AI displacement effect is manifesting within hardware giants themselves.

Figure Humanoid Robot: 8-Hour Autonomous Operation Validated

Figure's humanoid robot completed 8 hours of continuous autonomous sorting using the Helix 02 model, processing approximately one package every 3 seconds (on the order of 9,600 packages over the run), with support for collaborative battery swapping and self-diagnostics. Helix is Figure's proprietary end-to-end neural network model that maps visual input directly to robot actions, rather than using the traditional robotics pipeline of separate perception, planning, and control stages. The significance of 8 hours of continuous autonomous operation is that it validates system robustness: over an extended run, the robot must handle many edge cases, including variations in item shape, fluctuations in conveyor belt speed, and management of its own battery. At roughly 3 seconds per package it is slower than dedicated sorting arms (typically under 1 second), but the advantage of a humanoid robot is versatility: the same hardware platform can adapt to different scenarios without custom equipment for each task.

This represents one of the longest autonomous operation validations for humanoid robots in logistics scenarios.

The 2nd International Humanoid Robot Exhibition will be held at the Hangzhou Convention and Exhibition Center in May 2026, with a 60,000 square meter exhibition area expected to attract 100,000 professional visitors, covering the entire industry chain. Tencent's CorePro-powered home robots have seen daily interaction time surge from 30 minutes to 2 hours. AI Agents give robots "family-like" attributes, with enormous commercial potential.

AI Video Generation: Google Veo 3.1 Takes a Strong Lead

Google's Veo 3.1 supports 4K resolution and native audio generation, leading China's Kling 3.0 in image quality and features while also competing aggressively on price. Meanwhile, OpenAI's Sora has ceased service due to excessively high generation costs.

Additionally, Google's new Gemini Omni model has leaked, capable of simultaneously generating images, video, and audio, and is expected to be officially announced at next week's I/O conference. Traditional multimodal AI systems typically train independent models for different modalities (text, image, video, audio) and then chain them together through pipelines. Unified generation models use a single architecture to handle all modalities simultaneously, with the core idea of encoding data from different modalities into a shared Latent Space. The advantage of this approach is cross-modal consistency—generated video and audio are naturally synchronized, and image style naturally matches text descriptions. The technical challenge lies in the vast differences in information density across modalities: one second of video contains far more information than one sentence of text. How to balance computational resource allocation across modalities within a unified architecture is a key challenge. Multimodal unified generation is becoming the next competitive focal point.

Industry Personnel Moves and Funding News

The company founded by former Meta scientist Yuandong Tian has completed a $650 million funding round at a $4.65 billion valuation, led by GV and Greycroft with AMD and NVIDIA participating, focusing on large-scale self-improvement. Eight AI leaders have joined forces, explicitly opposing the approach of blindly scaling compute.

During his court testimony, Sam Altman revealed that Elon Musk once wanted his children to inherit OpenAI, and characterized Musk as "not understanding how to run a lab," saying his "chainsaw management" style damaged employee morale. The details of this legal battle continue to provide the industry with entertaining conversation fodder.

Key Takeaways

GPT-5.6 entered internal testing just three weeks after GPT-5.5's release, introducing UltraFast mode with 2-3x speed improvement; Codex participates in its own training for the first time
MiniMax launched Style Elastic technology achieving 360x model compression cost reduction, while releasing the Mavis adversarial multi-Agent system with open-source SDK
Anthropic and OpenAI engage in an offensive-defensive battle over revenue data and valuation, with Anthropic targeting a $950 billion valuation
Cerebras IPO priced at $185/share with a $56.4 billion valuation; Figure humanoid robot completes 8 hours of continuous autonomous operation
AI programming shifts from code generation to closed-loop delivery, with Codex's goal-driven mode sparking industry-wide adoption