xAI Hiring Chinese-Speaking Trainers at Up to $45/Hour, OpenAI Rebuilds Robotics Team

xAI is recruiting native Chinese-speaking AI Tutors at $35–45/hour to train Grok's multilingual voice capabilities. OpenAI officially re-established its robotics team to pursue embodied AI. Microsoft plans to unveil a proprietary coding model at Build to reduce reliance on OpenAI. Meanwhile, a U.S. company's $500M accidental AI bill highlights the urgent need for enterprise cost governance.
Overview
On June 1, 2025, several major AI stories broke at once: OpenAI officially formed a robotics team, xAI opened remote training positions for native Chinese speakers, Microsoft prepared to release a proprietary coding model, and a U.S. company accidentally ran up $500 million in cloud AI costs in a single month because it had no usage controls. Here is a breakdown of each development.



OpenAI Establishes Robotics Team, Re-enters Embodied AI
OpenAI announced the official launch of OpenAI Robotics and began recruiting engineers for it. The team will focus on hardware and machine learning co-design to advance robotics research, marking OpenAI's renewed push into embodied AI.
Embodied AI refers to the integration of AI systems into physical entities (such as robots), enabling them to perceive, understand, and interact with the physical world. Unlike purely digital AI, embodied AI must solve the closed-loop problem of perception-decision-execution, tackling multidimensional challenges including visual understanding, force feedback, and motion planning. The core bottleneck in this field has long been that traditional robot control relies on hand-coded rules and limited perception capabilities, making it difficult to handle complex tasks in open environments. Since 2023, breakthroughs in multimodal large models' visual understanding and reasoning capabilities have led the industry to broadly agree that using large models as a robot's "brain" is now feasible.
OpenAI had a robotics research division early on, which it disbanded in 2021. The dissolution was primarily due to the high cost of hardware iteration and to model capabilities that could not yet support complex physical interaction tasks. The decision to rebuild the team now reflects how much improved large-model capabilities could contribute to robot control. Amid the recent industry buzz around humanoid robots, including rapid progress from Figure, 1X Technologies, and Tesla (with Optimus), OpenAI's move likely aims to bring its multimodal model capabilities into the physical world and close the loop from perception to decision-making to execution.
xAI Hiring Chinese AI Tutors: Grok Multilingual Training Role Explained
xAI (Elon Musk's AI company) posted a remote position recruiting native Chinese-speaking AI Tutors to train Grok's multilingual audio capabilities. Key details:
- Work format: Remote, supporting full-time, part-time, or contract arrangements
- Hourly rate: $35–$45 (approximately ¥250–325 per hour)
- Core responsibility: Training Grok's Chinese voice capabilities
The AI Tutor role is essentially participation in the model's Reinforcement Learning from Human Feedback (RLHF) pipeline. In this process, human annotators evaluate and rank model outputs, or write high-quality demonstration data directly, so that the model learns human preferences. For multilingual audio training, AI Tutors need to assess the model's Chinese speech recognition accuracy, tonal naturalness, and dialect adaptability, among other qualities, and provide corrective guidance. The role requires both native-level proficiency in the target language and a clear understanding of what AI systems can and cannot do, which is why the hourly rate is significantly higher than for typical data labeling work.
This hiring move signals that xAI is actively expanding Grok's multilingual ecosystem, particularly for the Chinese market. Chinese is one of the world's most widely spoken languages, and processing Chinese speech poses distinct challenges, including tone recognition, homophone disambiguation, and dialect diversity, all of which require extensive native-speaker involvement in training-data quality control. For native Chinese speakers with AI domain knowledge, this is a noteworthy remote work opportunity.
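To make the ranking step concrete, here is a minimal sketch of how a single annotator ranking is commonly expanded into pairwise preferences for reward-model training. It is not xAI's actual pipeline; the function name `ranking_to_pairs` and the response labels are hypothetical.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (chosen, rejected) pairs.

    combinations() keeps the input order, so the first element of each
    pair is always the response the annotator ranked higher.
    """
    return list(combinations(ranked_responses, 2))


# Hypothetical ranking from one annotator, best response first.
ranking = ["response_b", "response_a", "response_c"]
pairs = ranking_to_pairs(ranking)

for chosen, rejected in pairs:
    print(f"prefer {chosen} over {rejected}")
```

A ranking of n responses yields n(n-1)/2 comparisons, which is one reason ranking is an efficient way to collect preference data.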
Microsoft to Unveil Proprietary Coding Model at Build Conference
Microsoft plans to release a proprietary coding model at its upcoming Build developer conference. The new model will be used directly to enhance GitHub Copilot's competitiveness.
GitHub Copilot is currently the market-leading AI coding assistant, with over 15 million monthly active users. It covers scenarios including code completion, chat-based programming assistance, and code review, and its core capabilities depend on the underlying language model's ability to understand and generate code. GitHub Copilot currently relies primarily on OpenAI's models (including GPT-4 and the specially fine-tuned Codex series), so Microsoft developing its own coding model signals a push for greater autonomy in the AI programming tools space.
Although Microsoft is OpenAI's largest investor (with cumulative investment exceeding $13 billion), over-reliance on a single model provider poses strategic risks in commercial competition: less control over costs, slower response to customization needs, and a less secure supply chain. A proprietary coding model enables Microsoft to optimize deeply for specific scenarios like code completion, debugging, and refactoring while reducing OpenAI API call costs. This also reflects a broader industry trend: major tech companies increasingly prefer to control core model capabilities rather than fully depend on third-party providers. Google, Meta, Amazon, and others are all following similar paths.
xAI Video Generation Model Tops Rankings; Apple On-Device AI Upgrade Imminent
Grok Imagine Video 1.5 Preview Surpasses Sora
xAI launched Grok Imagine Video 1.5 Preview, which supports 720p video generation and has surpassed Sora 2.0 to claim the top spot on relevant leaderboards.
Video generation is one of the most challenging frontiers in generative AI: a model must simultaneously maintain temporal consistency (coherence between frames), plausible physics (gravity, collisions, fluid dynamics), and motion coherence (natural, fluid character movement). When Sora was first shown in early 2024, it stunned the industry with its ability to simulate the physical world, but competitors such as Runway Gen-3, Kling, and Pika quickly followed. Evaluation of video generation models typically covers image quality, motion fluidity, text alignment, and temporal consistency, among other dimensions. Support for 720p output suggests the model is approaching commercially usable quality, and competition in the space is intensifying rapidly.
Apple to Showcase Local AI Models at WWDC
Apple is expected to showcase on-device AI upgrades at next month's WWDC. Reports indicate its local model technology stems from a collaboration with Google (Gemini technology transfer), with complex queries routed to Google's cloud for processing.
This "edge-cloud collaborative" architecture balances privacy protection with computational power. Specifically, lightweight models run locally on the device to handle routine tasks (such as text summarization, simple Q&A, image understanding, etc.), while only queries exceeding local computing capacity are uploaded to the cloud. The core advantages of this design are: local processing ensures sensitive user data never leaves the device, fulfilling Apple's longstanding privacy commitments; cloud processing provides complex reasoning capabilities that on-device chips (A-series/M-series) cannot support alone. Apple's Private Cloud Compute framework, introduced at WWDC 2024, already laid the security foundation for this architecture — ensuring that even when data reaches the cloud, it is processed in an encrypted environment without persistent storage, striking a balance between functionality and privacy.
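The routing idea can be illustrated with a small sketch. This is not Apple's implementation: the task names, the `MAX_LOCAL_TOKENS` limit, and the `route` function are all assumptions chosen only to show the decision in code.

```python
from dataclasses import dataclass

# Hypothetical set of routine tasks a lightweight on-device model handles.
LOCAL_TASKS = {"summarize", "simple_qa", "image_understanding"}

# Hypothetical prompt-size limit for the on-device model.
MAX_LOCAL_TOKENS = 2000


@dataclass
class Query:
    task: str
    prompt_tokens: int


def route(query: Query) -> str:
    """Keep routine, small queries on the device; send the rest to the cloud."""
    if query.task in LOCAL_TASKS and query.prompt_tokens <= MAX_LOCAL_TOKENS:
        return "on_device"
    return "cloud"


print(route(Query("summarize", 300)))
print(route(Query("multi_step_reasoning", 300)))
```

In a real system the routing signal would more likely come from the model or a classifier than from a fixed task list; the fixed list just keeps the sketch readable.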
Enterprise AI Costs Spiral Out of Control: The $500 Million Warning
According to Axios, an unnamed U.S. company accidentally spent $500 million on cloud AI services in a single month, simply because no usage limits were set for employees. This case has sparked widespread discussion about enterprise AI cost management.
The root causes of enterprise AI cost overruns typically span multiple layers: no caps on API call volume, no department-level budget allocation, employees indiscriminately using high-cost models (such as GPT-4-tier models, where a single call can cost dozens of times more than on a lower-tier model) for low-value tasks, and infinite-loop calls in automated workflows. Take GPT-4 as an example: its API pricing is approximately $30 per million input tokens and $60 per million output tokens. Cost scales with the number of calls and the tokens per call, so if thousands of employees make unrestricted, frequent calls, or an automated workflow loops unchecked, the bill can balloon within days.
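A back-of-the-envelope calculation shows how quickly these prices add up. The per-token prices are the ones quoted above; the headcount, call volume, and token counts are illustrative assumptions, not figures from the Axios report.

```python
INPUT_PRICE_PER_M = 30.0   # USD per million input tokens (quoted above)
OUTPUT_PRICE_PER_M = 60.0  # USD per million output tokens (quoted above)


def monthly_cost(employees, calls_per_day, input_tokens, output_tokens, workdays=22):
    """Estimate monthly API spend in USD for uniform usage across employees."""
    calls = employees * calls_per_day * workdays
    tokens_cost = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return calls * tokens_cost / 1_000_000


# Hypothetical workload: 5,000 employees, 200 calls each per workday,
# 2,000 input tokens and 500 output tokens per call.
cost = monthly_cost(employees=5000, calls_per_day=200, input_tokens=2000, output_tokens=500)
print(f"${cost:,.0f} per month")
```

Under these assumptions each call costs $0.09 and the monthly total is about $1.98 million. Automated agents that make thousands of calls per hour multiply the figure accordingly.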
As AI tools proliferate across enterprises, the lack of usage policies and budget controls can lead to astronomical bills. A mature enterprise AI governance framework should encompass four layers:
- Usage policies: clearly defining which business scenarios are appropriate for AI and which model tier to use
- Permission tiers: different roles mapped to different models and call quotas
- Real-time monitoring: cost threshold alerts, with automatic circuit breakers when budgets are exceeded
- Regular audits: analyzing usage efficiency and ROI
While embracing AI, enterprises must make AI cost management a core agenda item in IT governance and financial controls.
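The real-time monitoring layer, with its threshold alert and circuit breaker, can be sketched in a few lines. The class name `BudgetGuard`, the 80% alert ratio, and the dollar figures are hypothetical; a production system would persist spend and integrate with the provider's billing data rather than track it in memory.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push spend past the monthly budget."""


class BudgetGuard:
    def __init__(self, monthly_budget, alert_ratio=0.8):
        self.monthly_budget = monthly_budget
        self.alert_ratio = alert_ratio  # hypothetical alert threshold
        self.spent = 0.0
        self.alerted = False

    def charge(self, cost):
        """Record a call's cost, alerting near the limit and blocking past it."""
        if self.spent + cost > self.monthly_budget:
            # Circuit breaker: refuse the call instead of overspending.
            raise BudgetExceeded(f"budget of ${self.monthly_budget:,.2f} would be exceeded")
        self.spent += cost
        if not self.alerted and self.spent >= self.alert_ratio * self.monthly_budget:
            self.alerted = True
            print(f"ALERT: ${self.spent:,.2f} of ${self.monthly_budget:,.2f} used")


guard = BudgetGuard(monthly_budget=100.0)
guard.charge(50.0)   # below the alert threshold
guard.charge(35.0)   # crosses 80% and triggers the alert
try:
    guard.charge(20.0)  # would exceed the budget, so it is blocked
except BudgetExceeded as err:
    print(f"blocked: {err}")
```

One guard per department or role would also cover the permission-tier layer, since each tier can be given its own budget.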
Other Updates
- SoftBank Group plans to invest up to €75 billion (approximately $87 billion) in France to build AI data center infrastructure. This investment scale reflects the explosive growth in global AI computing demand — training and running large models requires massive GPU clusters, and data center construction typically takes 2–3 years. Countries are engaged in fierce competition over computing infrastructure.
- Open Cloud released version 2026.5.31 Beta 1, which strengthens recovery after abnormal interruptions and improves tool-call interruption handling, canvas binding, and multi-channel delivery.
Takeaways
Today's news reveals several clear trends: First, AI giants are extending into robotics and embodied intelligence, seeking to expand digital intelligence capabilities into the physical world. Second, multilingual capability building has become a new battleground in model competition, with the Chinese market — given its scale and complexity — becoming a must-win territory for all players. Third, enterprise AI governance and cost management have become unavoidable realities, with the pace of technology deployment far outstripping the development of management frameworks. Technological progress and governance standards must evolve in tandem — otherwise, the value AI delivers may be consumed by runaway costs and risks.