XAI Releases Grok Build Programming Model, Gemini Spark Opens to Ultra Users

XAI launches Grok Build coding model, Gemini Spark opens to Ultra users, Codex Computer Use hits Windows.
XAI released Grok Build 0.1, a cost-effective coding model at $1/M input tokens. Google opened its Gemini Spark agent to AI Ultra subscribers with deep ecosystem integration. OpenAI brought Codex Computer Use to Windows, which holds roughly 72% of the desktop OS market. Tongying Lab released Coin VLA for embodied intelligence, while DeepSeek continued cutting Expert Mode features due to compute constraints.
Overview
Several major AI updates were released simultaneously. XAI officially launched the public beta of its Grok Build 0.1 programming model, Google's Gemini Spark agent opened to all Ultra subscribers, OpenAI's Codex Computer Use feature arrived on Windows, Tongying Lab released a Vision-Language-Action model, and DeepSeek continued scaling back service features due to resource constraints.



XAI Releases Grok Build 0.1 Programming Model
XAI announced that Grok Build 0.1 is now available as a public beta through the XAI API. This is the same model that powers the Grok Build CLI (command-line interface, a common way for developers to interact with programming tools). It focuses on coding and is positioned as a cost-effective code generation tool. XAI first let developers validate the model in real coding scenarios through the CLI, then opened the battle-tested model as an API. This "validate first, open later" approach helps build developer trust.
Grok Build 0.1 costs $1 per million input tokens and $2 per million output tokens, which is highly competitive in the current coding model market. For comparison, OpenAI's GPT-4o charges $2.50 per million input tokens and Anthropic's Claude 3.5 Sonnet charges $3 per million input tokens, while coding-focused models like Codestral fall in a similar price range. Grok Build's pricing is notably lower than these mainstream competitors, reflecting XAI's strategy of using price to capture the developer market quickly. For developers, this means they can integrate professional coding capabilities into their applications and workflows at relatively low cost.
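To make the pricing concrete, here is a minimal cost-estimation sketch. It uses only the two per-million-token prices quoted above; the `Pricing` class, the `estimate_cost` helper, and the monthly workload figures are illustrative assumptions, not part of any XAI tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in US dollars."""
    input_per_million: float
    output_per_million: float


# Prices quoted in the article for Grok Build 0.1.
GROK_BUILD = Pricing(input_per_million=1.0, output_per_million=2.0)


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one workload."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


# Hypothetical workload: 50M input tokens and 10M output tokens per month.
monthly = estimate_cost(GROK_BUILD, 50_000_000, 10_000_000)
print(f"${monthly:.2f}")  # 50 * $1 + 10 * $2 = $70.00
```

Output tokens cost twice as much as input tokens here, so workloads that generate a lot of code relative to the context they read will skew toward the higher rate.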
Opening the CLI-validated model as an API service suggests that XAI considers it sufficiently tested in real programming scenarios and reasonably mature. Developers can now access the model directly through the API.
Google Gemini Spark Opens to Ultra Users
Google's previously introduced agent assistant, Gemini Spark, is now available to all Google AI Ultra subscribers in the United States. Google AI Ultra is Google's highest-tier subscription plan, priced at $249.99/month and offering its most powerful AI capabilities. The agent is positioned similarly to Anthropic's Claude, but its core differentiator is deep integration with the Google ecosystem.
AI agents differ fundamentally from traditional conversational AI: conversational AI passively responds to user questions, while agents can autonomously plan tasks, invoke tools, and continuously execute complex workflows. The release of Gemini Spark marks Google's official entry into the "always-on AI agent" competitive arena.
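The difference between answering a question and executing a workflow can be sketched as a small plan-then-act loop. This is a generic illustration, not Gemini Spark's implementation: the tools `find_free_slot` and `draft_invite`, the hard-coded `plan` function, and the scheduling scenario are all hypothetical stand-ins for what a real agent would obtain from a model and external services.

```python
from typing import Callable


# Hypothetical tools; a real agent would call external services here.
def find_free_slot(day: str) -> str:
    return f"{day} 15:00"


def draft_invite(slot: str) -> str:
    return f"Invite drafted for {slot}"


TOOLS: dict[str, Callable[[str], str]] = {
    "find_free_slot": find_free_slot,
    "draft_invite": draft_invite,
}


def plan(goal: str) -> list[str]:
    """Stand-in planner: a real agent would ask a model to produce this list."""
    return ["find_free_slot", "draft_invite"]


def run_agent(goal: str, initial_input: str) -> list[str]:
    """Execute each planned tool in order, passing each result to the next step."""
    log = []
    current = initial_input
    for tool_name in plan(goal):
        current = TOOLS[tool_name](current)
        log.append(f"{tool_name} -> {current}")
    return log


for line in run_agent("schedule a meeting", "Friday"):
    print(line)
```

A conversational system would stop after producing text; the loop above keeps going until every planned step has run, which is the behaviour the paragraph describes.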
Core Features
- Always-on operation: As an always-on AI assistant, it can continuously handle user tasks without requiring users to repeatedly initiate conversations
- Ecosystem integration: Native-level support for Gmail, Google Calendar, Google Docs, Google Drive, and other mainstream Google tools, with the ability to directly read, edit, and manage user data across these platforms
- Agent positioning: Not just a conversational AI, but an intelligent agent capable of proactively executing tasks and autonomously completing multi-step complex workflows
This strategy reflects Google's unique advantage in AI competition — leveraging its massive product ecosystem to form a moat. Google's product matrix covers virtually every aspect of users' digital lives. When an AI assistant can directly operate a user's email, calendar, documents, and other tools, it's no longer an isolated chat window but a digital assistant truly embedded in the user's workflow, with practical value far exceeding pure conversational capabilities.
OpenAI Codex Computer Use Arrives on Windows
OpenAI announced that the Computer Use feature in Codex is now available on Windows. The feature had previously been tested on macOS for some time with positive user feedback.
Computer Use is one of the most cutting-edge capabilities in the current AI Agent field. Its core concept is enabling AI to operate computers like humans — by observing screen content, moving the mouse, clicking buttons, and typing text to complete tasks. This concept first gained widespread attention when Anthropic introduced it alongside Claude 3.5 Sonnet in October 2024, with OpenAI, Google, and other companies following suit. Technically, Computer Use relies on multimodal models' visual understanding capabilities — the model needs to "read" UI elements in screenshots, understand the current application state, and then generate precise operation instructions (such as click coordinates, keyboard inputs, etc.).
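The observe-then-act cycle described above can be illustrated with a toy example. Everything here is an assumption made for illustration: the `Click` and `TypeText` action types, the `FakeScreen` class, and the `propose_actions` function (which stands in for the multimodal model) are hypothetical and do not reflect the Codex or Anthropic APIs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


Action = Union[Click, TypeText]


@dataclass
class FakeScreen:
    """Toy stand-in for a desktop: one text field occupying a rectangle."""
    field_box: tuple = (100, 100, 300, 130)  # left, top, right, bottom
    focused: bool = False
    contents: str = ""

    def apply(self, action: Action) -> None:
        if isinstance(action, Click):
            # A click focuses the field only if it lands inside the rectangle.
            left, top, right, bottom = self.field_box
            self.focused = left <= action.x <= right and top <= action.y <= bottom
        elif isinstance(action, TypeText) and self.focused:
            self.contents += action.text


def propose_actions(screen: FakeScreen, text: str) -> list:
    """Stand-in for the model: click the centre of the field, then type."""
    left, top, right, bottom = screen.field_box
    return [Click((left + right) // 2, (top + bottom) // 2), TypeText(text)]


screen = FakeScreen()
for action in propose_actions(screen, "hello"):
    screen.apply(action)
print(screen.contents)
```

The toy screen ignores typed text unless a preceding click landed inside the field, which mirrors why precise coordinates matter: a misread screenshot produces a click that misses, and every later action fails.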
Users can also access the Windows version of Codex through the ChatGPT mobile app, further expanding the feature's use cases. The expansion from macOS to Windows carries significant commercial implications: Windows holds approximately 72% of the global desktop OS market, while macOS accounts for about 15%. The Windows launch therefore raises Codex Computer Use's potential reach from about 15% to about 87% of desktop users, nearly a sixfold increase that covers the vast majority of desktops.
Tongying Lab Releases Coin VLA Vision-Language-Action Model
Tongying Lab has released Coin VLA (Vision-Language-Action), a model focused on general embodied intelligence. Embodied AI refers to AI that exists not only in the digital world but can also interact with the real world through physical carriers (such as robots and robotic arms) — a key technological direction connecting AI with the physical world.
Technical Architecture
The model unifies different robotic tasks into three core steps:
- Observation: Perceiving environmental information through cameras and other sensors to obtain visual input
- Understanding: Performing reasoning based on language models, combining visual information with natural language instructions to form task comprehension
- Action: Converting the language model's abstract representations into specific physical action parameters (such as joint angles, end-effector displacements) through a dedicated Action Decoder, outputting executable control instructions for robots
Traditional robot control typically trains specialized models for specific tasks, while the breakthrough of VLA models lies in unifying visual perception, language understanding, and action generation into a single end-to-end framework, achieving a closed loop of "see-think-do." The model uses Coin 3.54B (3.54 billion parameters) as its base, a lightweight architecture particularly important for scenarios requiring on-device deployment on robots, since robots' computational resources are typically far less than cloud servers. Paired with a dedicated action decoder, the model performs excellently across multiple benchmarks, surpassing previous best specialized models. Detailed technical reports, code, and papers have all been publicly released.
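The observe, understand, and act steps can be sketched as three functions chained together. This is a toy illustration under stated assumptions, not Coin VLA's architecture or code: the `Observation` and `Intent` types, the keyword-based `understand` function, and the displacement-based `decode_action` are hypothetical stand-ins for the camera pipeline, the language model, and the action decoder.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    object_position: tuple  # (x, y) of the target seen by a camera, in metres


@dataclass
class Intent:
    verb: str
    target: tuple


def observe() -> Observation:
    """Stand-in for camera perception."""
    return Observation(object_position=(0.4, 0.1))


def understand(obs: Observation, instruction: str) -> Intent:
    """Stand-in for the language model: combine the instruction with what was seen."""
    verb = "pick" if "pick" in instruction.lower() else "move"
    return Intent(verb=verb, target=obs.object_position)


def decode_action(intent: Intent, gripper_position: tuple) -> dict:
    """Stand-in action decoder: turn the intent into an end-effector displacement."""
    dx = intent.target[0] - gripper_position[0]
    dy = intent.target[1] - gripper_position[1]
    return {
        "dx": round(dx, 3),
        "dy": round(dy, 3),
        "close_gripper": intent.verb == "pick",
    }


command = decode_action(understand(observe(), "Pick up the cup"), (0.0, 0.0))
print(command)
```

In a real VLA model these stages are learned jointly rather than hand-written, but the data flow is the same: sensor input and an instruction go in, and numeric control parameters come out.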
DeepSeek Continues Scaling Back Service Features
Constrained by resource limitations, DeepSeek's server-side Expert Mode has removed its smart search functionality, following the earlier removal of file upload capabilities. The company has not specified a timeline for restoration.
This situation reflects the mismatch between compute supply and demand that is common across the AI industry. After DeepSeek released its R1 reasoning model in January 2025, its user base surged dramatically, but as a relatively young AI company, its GPU cluster is far smaller than those of giants like OpenAI and Google. Expert Mode typically invokes models with larger parameter counts or longer reasoning chains, with significantly higher computational costs than Quick Mode. Smart search requires the model to perform web retrieval and information synthesis before generating answers, further increasing compute consumption per request. Against the backdrop of U.S. chip export controls on China, Chinese AI companies face restricted access to high-end GPUs (such as NVIDIA H100/H200), making compute constraints even more pronounced.
Quick Mode, however, still supports all of these features. By keeping Quick Mode fully functional, DeepSeek is making a prioritization trade-off under limited resources, ensuring that most users' basic experience remains unaffected. Balancing service quality against user scale remains a core challenge for the company.
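The current feature split between the two modes can be written down as a small availability table. This is only a sketch of the situation the article describes: the mode keys, feature names, and the `pick_mode` fallback rule are hypothetical and do not correspond to any DeepSeek API or configuration.

```python
# Feature availability per mode, as described in the article.
FEATURES = {
    "quick": {"smart_search": True, "file_upload": True},
    "expert": {"smart_search": False, "file_upload": False},
}


def is_available(mode: str, feature: str) -> bool:
    """Return whether a feature is currently enabled for the given mode."""
    return FEATURES.get(mode, {}).get(feature, False)


def pick_mode(needs: set) -> str:
    """Prefer Expert Mode, falling back to Quick Mode when a needed feature is disabled."""
    for mode in ("expert", "quick"):
        if all(is_available(mode, feature) for feature in needs):
            return mode
    raise ValueError(f"no mode supports {sorted(needs)}")


print(pick_mode(set()))
print(pick_mode({"smart_search"}))
```

A user who only needs deeper reasoning can stay in Expert Mode, while anyone who needs search or file upload has to fall back to Quick Mode until the features are restored.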
Summary
Looking at today's developments, competition in AI programming tools is intensifying (XAI and OpenAI both making moves on the same day), AI agents are evolving from conversation to actual operation (Gemini Spark, Codex Computer Use), and the embodied intelligence field continues to advance. Each company is seeking differentiated competitive paths: XAI enters the developer market with price advantages, Google builds a moat through ecosystem integration, OpenAI expands its user base through cross-platform coverage, and Tongying Lab seeks breakthroughs in cutting-edge embodied intelligence technology. Developers and users will continue to benefit from this competition.