AI Weekly: Claude Code Review, Gemma 4 Leak & DeepSeek V4 Delayed

Overview

It was a packed week in AI: Anthropic launched Claude Code's code review feature, Google's Gemma 4 model was accidentally leaked, DeepSeek V4's release was delayed again, and Microsoft's Copilot Cowork is reshaping collaboration. This article provides an in-depth analysis of each major update and its real-world impact on developers and the industry.

Anthropic Claude Code: AI Code Review Has Arrived

Anthropic has maintained a rapid pace of product iteration. Following Claude Code and the desktop app, the company has now rolled out its biggest update yet — code review functionality.

The core mechanism works like this: when a Pull Request is opened, multiple AI agents analyze the code in parallel, verify potential issues to filter out false positives, rank defects by severity, and deliver high-value feedback in the form of summaries and inline comments. Review depth automatically adjusts based on the complexity of the changes — larger PRs receive deeper analysis, while smaller ones get a lightweight review.

To understand the value of this feature, it helps to appreciate the central role Pull Requests play in modern software development. PRs are a Git-based collaboration mechanism: when a developer finishes code changes, they submit a PR requesting the team to review and merge the changes. Traditional code review relies on manual line-by-line inspection, where reviewers need to understand business logic, check for security vulnerabilities, assess code style consistency, and evaluate performance impact. This process typically takes hours or even days, and review quality is highly dependent on the reviewer's experience and energy level. In large teams, PR backlogs and review fatigue are common bottlenecks — AI code review tools are designed precisely to address this pain point.

Issues that even experienced engineers might miss

Cost and Controversy

The feature currently takes about 20 minutes on average, with a single review costing roughly $15 to $25. This pricing has sparked considerable debate, with many developers considering it too expensive. The market already offers lower-cost code review alternatives such as CodeGrab, GrapTile, and even Cognition's Devin.

That said, Anthropic's internal data is quite compelling: after deploying the system, the review feedback adoption rate jumped from 16% to 54%, helping developers catch issues that even experienced engineers might overlook. The feature is currently in research preview, available only to team users, with plans to gradually roll it out to more Claude Code users.

Google Gemma 4: A New Breakthrough in Open-Source LLMs

A rather dramatic leak revealed that Google is about to release the Gemma 4 model. Developers discovered a direct reference from a Google bot account in a Pull Request on a GitHub repository. Although the PR was quickly closed and renamed to cover the tracks, the news had already spread.

Model Specs and Architecture Highlights

According to the leaked information, Gemma 4 will use a Mixture of Experts (MoE) architecture:

Total parameters: ~120 billion
Active parameters: ~15 billion

Mixture of Experts (MoE) is a cutting-edge approach to scaling models through conditional computation. Unlike traditional dense models (such as early GPT series) that activate all parameters during every inference pass, MoE models distribute parameters across multiple "expert" sub-networks and use a Gating Network to dynamically select only a small subset of experts for each inference. This means a model can have a massive total parameter count representing its knowledge capacity, while the actual computational cost is proportional only to the active parameters. Google's Switch Transformer and the open-source community's Mixtral are successful precedents of the MoE architecture.

For Gemma 4, the design of 120 billion total parameters but only 15 billion active parameters means developers could potentially run a system with knowledge capacity comparable to a hundred-billion-parameter model on just one or two consumer-grade GPUs. As an open-source model, this would be a game-changing breakthrough — enabling more developers and enterprises to deploy high-performance AI models locally without relying on expensive cloud computing. Multiple Google team members have hinted that the model could launch soon.

DeepSeek V4: What's Behind the Delayed Release

DeepSeek's fourth-generation model was originally expected to launch in March, but it now appears to be pushed back further. Judging from the frequent integration updates and PR merges across multiple GitHub repositories, the underlying system is largely ready.

It features a one-million-token context window

Known Technical Features

Pre-release leaks reveal several key features of DeepSeek V4:

1 million token context window
Dynamic Absorption Attention architecture (related implementations are already available on GitHub)
Significant improvements in frontend code handling and user-generated content capabilities
Performance exceeding multiple existing proprietary models

A 1-million-token context window is a revolutionary technical breakthrough. The context window refers to the maximum text length a large language model can process in a single inference pass. Early GPT-3.5 supported only 4,096 tokens (roughly 3,000 English words), while 1 million tokens means the model can process approximately 750,000 English words at once — equivalent to a dozen complete books or the entire source code of a large codebase. This has transformative implications for long-document analysis, cross-file code comprehension, and complex conversational memory. The technical challenge of achieving ultra-long context lies in the fact that the computational complexity of standard attention mechanisms grows quadratically with sequence length, requiring innovative architectures such as sparse attention, linear attention, or hierarchical compression to reduce computational overhead.

Dynamic Absorption Attention is a novel attention mechanism variant designed to address this very challenge. It adaptively "absorbs" or compresses less important contextual information, allowing the model to drastically reduce computation while retaining critical information. This architecture can dynamically adjust the granularity of attention allocation based on the semantic importance of content, making it particularly well-suited for handling million-token-scale ultra-long context scenarios.

Analysis of the Delay

Some analysts believe that OpenAI released an advanced model during DeepSeek's planned launch window, which may have forced DeepSeek to readjust its release strategy. Industry insider Chris suggests that DeepSeek wants this release to not just meet the baseline but exceed expectations, hence the decision to delay until later this month or even next month.

Microsoft Copilot Cowork: Redefining AI Collaboration

Microsoft has significantly elevated Copilot's capabilities with the launch of Copilot Cowork. This system closely mirrors Anthropic's multi-agent philosophy and runs on top of the Microsoft 365 ecosystem.

Multi-agent systems represent a major trend in current AI application architecture. The core idea is to decompose complex tasks across multiple specialized AI agents working in concert, rather than relying on a single model to handle everything. Each agent can have different tool access permissions, domain expertise, and action capabilities. By embedding this concept into an enterprise-grade productivity suite, Microsoft is signaling that AI is no longer just a question-answering assistant — it's a "digital colleague" capable of autonomously executing workflows across multiple applications. The key challenges of this architecture lie in permission management, operation auditability, and error recovery mechanism design.

The core value of Copilot Cowork is this: you no longer need to manually switch between apps. Instead, you can hand tasks directly to Cowork. It formulates an execution plan based on your needs and automatically carries out operations across your apps and files — organizing schedules, adjusting meetings, managing documents, coordinating workflows, and even generating presentations and follow-up notes.

All operations are grounded in your organization's business data and run within Microsoft 365's existing security and governance framework. Cowork is currently being tested with a small number of customers, with a broader preview expected to roll out in the coming weeks.

Other Notable Developments

OpenAI Acquires PromptFool to Strengthen AI Safety

According to OpenAI

OpenAI announced the acquisition of PromptFool — a widely popular open-source red teaming tool. According to OpenAI, this technology will bolster its capabilities in safety testing and evaluation, particularly for increasingly powerful agent-based systems. The good news is that PromptFool will remain open-source under its existing license.

Red teaming originates from military terminology, referring to a dedicated team simulating adversary attacks to test weaknesses in defense systems. In AI safety, red teaming uses carefully crafted adversarial prompts to test whether AI models produce harmful outputs, leak training data, or bypass safety guardrails. PromptFool allows researchers to systematically probe model vulnerabilities, including prompt injection attacks, jailbreak techniques, and indirect prompt attacks. As AI agent systems gain increasing autonomous capabilities — such as web access, code execution, and file manipulation — the importance of safety evaluation grows exponentially. A compromised AI agent can cause far more damage than a simple chatbot.

Grok Imagine 1.5 Image Generation Upgrade

This could launch very soon

Elon Musk hinted on X that Grok Imagine 1.5 is in development. Some users have pointed out that Grok's image model is one of the few that maintains style consistency across various sizes and resolutions, and the new version could bring even more significant improvements.

OpenClaw Continues Iterating

This open-source local AI agent shipped two consecutive version updates, adding CP provenance tracking, a backup system, over a dozen security fixes, and support for GBC 5.4 and Gemini 3.1. It also optimized Docker multi-stage builds and the pluggable context engine.

Gemini Simple Mode Lowers the Barrier to Entry

Google introduced a simple mode for Gemini, activated by pressing Tab twice, which strips the interface down to just a single input box. This change targets a broader non-technical user base and effectively lowers the barrier to using AI tools.

Conclusion

This week's AI developments reveal several clear trends: fierce competition in AI-powered development tools (Claude Code vs. Copilot vs. Devin), continued breakthroughs in open-source LLMs (Gemma 4, DeepSeek V4), and unprecedented attention to AI safety evaluation. Major players are shifting from pure model capability competition toward comprehensive ecosystem and workflow integration.

Key Takeaways

Anthropic launched Claude Code's code review feature with multiple AI agents analyzing PRs in parallel; internal adoption rate jumped from 16% to 54%, though the $15–25 per-review cost has sparked debate
Google's Gemma 4 model was accidentally leaked, featuring a MoE architecture with 120B total / 15B active parameters that could potentially run on low-cost hardware
DeepSeek V4's release has been delayed; known features include a 1-million-token context window and Dynamic Absorption Attention architecture, with the delay likely driven by competitive pressure
Microsoft's Copilot Cowork creates an autonomous work layer running on the Microsoft 365 ecosystem for cross-application automated collaboration
OpenAI acquired red teaming tool PromptFool to strengthen AI safety evaluation; the tool will remain open-source