XAI Releases Grok Build Programming Model, Gemini Spark Opens to Ultra Users

Overview

Several major AI updates landed on the same day. XAI officially launched the public beta of its Grok Build 0.1 programming model, Google's Gemini Spark agent opened to all Ultra subscribers, OpenAI's Codex Computer Use feature arrived on Windows, Tongying Lab released a Vision-Language-Action model, and DeepSeek continued scaling back service features due to resource constraints.

XAI Releases Grok Build 0.1 Programming Model

XAI announced that Grok Build 0.1 is now available as a public beta through the XAI API. This is the same model that powers the Grok Build CLI (command-line interface, a common way for developers to work with coding tools). It focuses on coding and is positioned as a cost-effective code generation tool. XAI first let developers validate the model in real coding scenarios through the CLI, then opened the same model as an API; this "validate first, open later" approach helps build developer trust.

Grok Build 0.1 costs $1 per million input tokens and $2 per million output tokens, which is highly competitive in the current programming model market. For comparison, OpenAI's GPT-4o charges $2.50 per million input tokens and Anthropic's Claude 3.5 Sonnet charges $3 per million input tokens, and coding-focused models such as Codestral fall in a similar price range. Grok Build's input price is therefore well under half that of these mainstream competitors, reflecting XAI's strategy of using price to capture the developer market quickly. For developers, this means they can integrate professional programming capabilities into their applications and workflows at relatively low cost.
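To make the per-token pricing concrete, here is a minimal cost estimate using the prices quoted above. The workload (2,000 requests with 3,000 input and 800 output tokens each) is a hypothetical assumption chosen only for illustration, and the `Pricing` class is not part of any XAI SDK.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in US dollars (illustrative helper)."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the total cost in dollars for the given token counts."""
        return (
            input_tokens * self.input_per_million
            + output_tokens * self.output_per_million
        ) / TOKENS_PER_MILLION


# Prices quoted in the article for Grok Build 0.1.
GROK_BUILD = Pricing(input_per_million=1.0, output_per_million=2.0)

# Hypothetical workload: 2,000 requests, 3,000 input and 800 output tokens each.
requests = 2_000
total_cost = GROK_BUILD.cost(requests * 3_000, requests * 800)
print(f"Estimated cost: ${total_cost:.2f}")  # Estimated cost: $9.20
```

Because output tokens cost twice as much as input tokens, workloads that generate long completions will shift the total more than this input-heavy example suggests.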

Opening the CLI-validated model as an API service also suggests that XAI considers it mature enough for real programming work. Developers can access the model directly through the API today.

Google Gemini Spark Opens to Ultra Users

Google's previously introduced agent assistant, Gemini Spark, is now available to all Google AI Ultra subscribers in the United States. Google AI Ultra is Google's premium subscription plan priced at approximately $249.99/month, positioned as the highest-tier paid level offering the most powerful AI capabilities. The agent is positioned similarly to Anthropic's Claude, but its core differentiator lies in deep integration with the Google ecosystem.

AI agents differ fundamentally from traditional conversational AI: conversational AI passively responds to user questions, while agents can autonomously plan tasks, invoke tools, and continuously execute complex workflows. The release of Gemini Spark marks Google's official entry into the "always-on AI agent" competitive arena.

Core Features

Always-on operation: As an always-on AI assistant, it can continuously handle user tasks without requiring users to repeatedly initiate conversations
Ecosystem integration: Native-level support for Gmail, Google Calendar, Google Docs, Google Drive, and other mainstream Google tools, with the ability to directly read, edit, and manage user data across these platforms
Agent positioning: Not just a conversational AI, but an intelligent agent capable of proactively executing tasks and autonomously completing multi-step complex workflows

This strategy reflects Google's unique advantage in AI competition — leveraging its massive product ecosystem to form a moat. Google's product matrix covers virtually every aspect of users' digital lives. When an AI assistant can directly operate a user's email, calendar, documents, and other tools, it's no longer an isolated chat window but a digital assistant truly embedded in the user's workflow, with practical value far exceeding pure conversational capabilities.

OpenAI Codex Computer Use Arrives on Windows

OpenAI announced that the Computer Use feature in Codex is now available on Windows. The feature had previously been tested on macOS for some time with positive user feedback.

Computer Use is one of the most cutting-edge capabilities in the current AI Agent field. Its core concept is enabling AI to operate computers like humans — by observing screen content, moving the mouse, clicking buttons, and typing text to complete tasks. This concept first gained widespread attention when Anthropic introduced it alongside Claude 3.5 Sonnet in October 2024, with OpenAI, Google, and other companies following suit. Technically, Computer Use relies on multimodal models' visual understanding capabilities — the model needs to "read" UI elements in screenshots, understand the current application state, and then generate precise operation instructions (such as click coordinates, keyboard inputs, etc.).
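The observe-then-act cycle described above can be sketched as a simple loop. This is only an illustration of the general idea, not OpenAI's or Anthropic's implementation: the action types, the `run_loop` function, and the scripted stand-in for the model are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class Done:
    pass


Action = Union[Click, TypeText, Done]


def run_loop(
    capture: Callable[[], bytes],
    decide: Callable[[bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: take a screenshot, ask the model for one action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                # observe the screen
        action = decide(screenshot, history)  # model picks the next step
        history.append(action)
        if isinstance(action, Done):          # model reports the task is finished
            break
        execute(action)                       # move the mouse, click, or type
    return history


# Stand-ins so the sketch runs without a real screen or model.
def fake_capture() -> bytes:
    return b""  # placeholder for screenshot bytes


def scripted_decide(screenshot: bytes, history: List[Action]) -> Action:
    script: List[Action] = [Click(120, 340), TypeText("hello"), Done()]
    return script[len(history)]


executed: List[Action] = []
trace = run_loop(fake_capture, scripted_decide, executed.append)
print(trace)
```

The `max_steps` cap is a design choice worth keeping in any real loop of this kind, since a model that never emits a terminating action would otherwise run indefinitely.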

Additionally, users can now access the Windows version of Codex through the ChatGPT mobile app, further expanding the feature's use cases. The expansion from macOS to Windows carries significant commercial implications: Windows holds approximately 72% of the global desktop OS market, while macOS accounts for about 15%. Adding Windows therefore raises the share of desktops that Codex Computer Use can reach from roughly 15% to roughly 87%, close to a sixfold increase that covers the vast majority of desktop users.

Tongying Lab Releases Coin VLA Vision-Language-Action Model

Tongying Lab has released Coin VLA (Vision-Language-Action), a model focused on general embodied intelligence. Embodied AI refers to AI that exists not only in the digital world but can also interact with the real world through physical carriers (such as robots and robotic arms) — a key technological direction connecting AI with the physical world.

Technical Architecture

The model unifies different robotic tasks into three core steps:

Observation: Perceiving environmental information through cameras and other sensors to obtain visual input
Understanding: Performing reasoning based on language models, combining visual information with natural language instructions to form task comprehension
Action: Converting the language model's abstract representations into specific physical action parameters (such as joint angles, end-effector displacements) through a dedicated Action Decoder, outputting executable control instructions for robots
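The three steps above can be sketched as a toy pipeline. This is a deliberately simplified illustration under stated assumptions, not the released model's architecture: the "vision encoder" is a row average, the "language model" just appends one text feature, and the action decoder is a plain linear projection with made-up weights. All function names are hypothetical.

```python
from typing import List, Sequence


def observe(image: Sequence[Sequence[float]]) -> List[float]:
    """Toy vision encoder: summarize each image row by its mean intensity."""
    return [sum(row) / len(row) for row in image]


def understand(visual: List[float], instruction: str) -> List[float]:
    """Toy language model: fuse visual features with one instruction feature."""
    text_feature = len(instruction.split()) / 10.0
    return visual + [text_feature]


def decode_action(
    hidden: List[float], weights: Sequence[Sequence[float]]
) -> List[float]:
    """Toy action decoder: linear projection from hidden state to joint deltas."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


# Hypothetical 2x2 "camera frame" and instruction.
image = [[0.0, 1.0], [1.0, 1.0]]
instruction = "pick up the red block"

# Made-up decoder weights: two output joints, three hidden features.
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, -1.0],
]

hidden = understand(observe(image), instruction)
action = decode_action(hidden, weights)
print(action)  # [0.5, 0.5]
```

In a real VLA system each stage is a learned neural component, but the data flow is the same: sensor input becomes features, features plus the instruction become a hidden state, and the decoder maps that state to numeric control parameters.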

Traditional robot control typically trains specialized models for specific tasks, while the breakthrough of VLA models lies in unifying visual perception, language understanding, and action generation into a single end-to-end framework, achieving a closed loop of "see-think-do." The model uses Coin 3.54B (3.54 billion parameters) as its base, a lightweight architecture particularly important for scenarios requiring on-device deployment on robots, since robots' computational resources are typically far less than cloud servers. Paired with a dedicated action decoder, the model performs excellently across multiple benchmarks, surpassing previous best specialized models. Detailed technical reports, code, and papers have all been publicly released.

DeepSeek Continues Scaling Back Service Features

Constrained by resource limitations, DeepSeek's server-side Expert Mode has removed its smart search functionality, following the earlier removal of file upload capabilities. The company has not specified a timeline for restoration.

This situation reflects the compute supply-demand gap faced across the AI industry. After DeepSeek released its R1 reasoning model in January 2025, its user base surged dramatically, but as a relatively young AI company, its GPU cluster is far smaller than those of giants like OpenAI and Google. Expert Mode typically invokes models with larger parameter counts or longer reasoning chains, so each request costs significantly more compute than in Quick Mode. Smart search requires the model to perform web retrieval and information synthesis before generating answers, further increasing compute consumption per request. Against the backdrop of U.S. chip export controls on China, Chinese AI companies face restricted access to high-end GPUs (such as NVIDIA H100/H200), making compute constraints even more pronounced.

Quick Mode, however, still supports all of these features. Keeping Quick Mode fully functional is essentially a prioritization trade-off under limited resources, one that keeps the basic experience intact for most users. Balancing service quality against user scale remains a core challenge for the company.

Summary

Looking at today's developments, competition in AI programming tools is intensifying (XAI and OpenAI both making moves on the same day), AI agents are evolving from conversation to actual operation (Gemini Spark, Codex Computer Use), and the embodied intelligence field continues to advance. Each company is seeking differentiated competitive paths: XAI enters the developer market with price advantages, Google builds a moat through ecosystem integration, OpenAI expands its user base through cross-platform coverage, and Tongying Lab seeks breakthroughs in cutting-edge embodied intelligence technology. Developers and users will continue to benefit from this competition.
