Qwen Launches 400+ New Features as Wenxin 5.0 and Multiple LLMs Drop Simultaneously

Jan 16, 2025 sees major AI releases from Qwen, ERNIE, StepFun, and shifting market dynamics.
On January 16, 2025, the AI industry saw a wave of major updates. Alibaba's Qwen APP launched 400+ new features with full integration into Alipay, Taobao, and other ecosystem services, positioning itself as an AI super entry point. A new version of Baidu's ERNIE 5.0 ranked 8th globally on LMARENA, with stronger math and coding abilities. StepFun's speech model topped global rankings with 96.4% accuracy. Anthropic's market share broke 20%, closing in on Google, as the LLM competitive landscape accelerates toward multipolarity.
Meituan and OpenAI also delivered significant releases on the same day. This article covers the day's key developments and analyzes the latest progress from each company.
Qwen APP Comprehensive Upgrade: 400+ New Features Connecting the Alibaba Ecosystem
The Qwen APP update is arguably the largest in its history, launching over 400 new features at once. The most notable addition is the Task Assistant feature, covering most daily use scenarios including office work, consulting, research, application development, and lifestyle entertainment.
Qwen's "Task Assistant" is essentially a real-world implementation of AI Agent technology. An AI Agent refers to an AI system capable of perceiving its environment, autonomously planning, and executing multi-step tasks — fundamentally different from traditional single-turn Q&A models. Agent systems typically possess Tool Use capabilities, memory management, and task decomposition abilities, breaking complex goals into a series of subtasks and executing them sequentially. After integrating with Alipay, Taobao, and other ecosystem services, Qwen effectively gains the ability to "operate in the real world." Technically, this relies on the Function Calling mechanism — the model can identify user intent and automatically invoke corresponding APIs to complete actions like placing orders and making payments, without requiring users to manually switch between apps.
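To make the Function Calling idea concrete, here is a minimal sketch of the dispatch step, assuming the model emits a JSON object naming a tool and its arguments. The tool names (`order_food`, `book_flight`), the JSON shape, and the `dispatch` helper are hypothetical illustrations, not Qwen's actual interface.

```python
import json

# Hypothetical tools; a real deployment would call remote service APIs here.
def order_food(restaurant: str, item: str) -> str:
    return f"Ordered {item} from {restaurant}"

def book_flight(origin: str, destination: str) -> str:
    return f"Booked flight {origin} -> {destination}"

# Registry mapping the tool names a model may emit to Python callables.
TOOLS = {"order_food": order_food, "book_flight": book_flight}

def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call and invoke the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Stand-in for what a model might emit after a request such as
# "get me noodles from Lao Wang" (assumed output format).
model_output = (
    '{"name": "order_food", '
    '"arguments": {"restaurant": "Lao Wang", "item": "noodles"}}'
)
result = dispatch(model_output)
print(result)
```

In this sketch the model only chooses the tool and fills in arguments; the application code stays in control of what actually executes, which is why unknown tool names are rejected.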
More critically, Qwen announced full integration with Alipay, Taobao, Amap (Gaode Maps), and other Alibaba ecosystem services. After integration, users can order food delivery, shop, book flights, and more directly through Qwen, making it a true AI shopping assistant. This "super entry point" approach closely resembles Apple's strategy of deeply integrating Siri with the iOS ecosystem, but because it builds on Alibaba's massive e-commerce and payment infrastructure, its path to commercialization is far more direct. Qwen is thus evolving from a conversational AI tool into an intelligent entry point capable of orchestrating Alibaba's service ecosystem.
The feature is currently open for testing to all users, with a detailed feature list available on Qwen APP's WeChat official account. This move reflects Alibaba's strategic intent to deeply embed large models into its commercial ecosystem — when AI can directly help users place orders and make purchases, its commercial value far exceeds that of simple information Q&A.
Wenxin ERNIE 5.0 Version 0110 Released
Baidu released version 0110 of its Wenxin ERNIE 5.0 model. On the LMARENA text leaderboard, this version ranks 8th globally, demonstrating strong overall capabilities.
LMARENA (formerly Chatbot Arena) is a large model evaluation platform created by a UC Berkeley team that ranks models by human preference voting rather than by fixed standardized test sets. Real users blindly compare responses from two anonymous models, and the platform aggregates the votes into rankings through an Elo-style rating system (derived from chess ranking algorithms). This method is widely considered more reflective of real-world usage than traditional benchmarks because it is much harder to game: there is no fixed test set for a model to overfit to. ERNIE 5.0's 8th-place global ranking on this leaderboard means its overall conversational quality has been validated by a large number of real users, rather than excelling only on specific academic datasets.
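As a rough illustration of how pairwise votes turn into ratings, the sketch below applies the classic Elo update after a single head-to-head comparison. The K-factor of 32 and the starting ratings are assumptions chosen for illustration; the leaderboard's actual statistical procedure may differ.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return new ratings after one comparison.

    score_a is 1.0 if A wins the vote, 0.0 if B wins, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start level; a user prefers model A's response.
new_a, new_b = update_elo(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)  # 1016.0 984.0
```

Because the winner gains exactly what the loser gives up, an upset against a much higher-rated model moves both ratings more than an expected win does.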

This upgrade focuses on strengthening two core capabilities: mathematical reasoning and coding ability. Additionally, professional capabilities across multiple vertical domains have been significantly enhanced, including business, finance, and medicine. This indicates Baidu is pushing Wenxin from general capabilities toward deep penetration into specialized fields. The model is currently available for direct experience on the Wenxin official website.
New Moves from Meituan and OpenAI
Meituan Releases LongChat Flash Thinking 260B
Meituan launched the LongChat Flash Thinking 260B model, featuring deep reasoning capabilities and ranking highly across multiple benchmarks.

The "260B" in the model name refers to 260 billion parameters, placing it in the ultra-large language model category. Notably, modern ultra-large models typically do not activate all parameters at once. Instead they employ an MoE (Mixture of Experts) architecture, which activates only a small subset of "expert" sub-networks for each token during inference, dramatically reducing per-inference compute while keeping the total parameter count high. Mixtral uses this design, and GPT-4 is widely reported to as well. The "deep thinking" capability typically corresponds to Chain-of-Thought or more advanced "slow thinking" reasoning mechanisms, in which the model generates extensive intermediate reasoning steps before giving a final answer, much like a human's scratch work when solving a problem. This significantly improves accuracy on complex questions.
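The routing idea behind MoE can be sketched in a few lines. The example below is a toy top-k gate in plain Python, assuming four stand-in "experts" that simply scale their input; the gate weights, expert functions, and `moe_forward` helper are hypothetical and say nothing about Meituan's actual architecture.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, k=2):
    """Run input vector x through only the top-k experts chosen by the gate."""
    # One gate logit per expert: dot product of x with that expert's gate vector.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + w * y_j for o, y_j in zip(out, y)]
    return out, top

# Four toy experts that scale the input by 1, 2, 3, and 4 respectively.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

out, top = moe_forward([2.0, 1.0], gate_weights, experts, k=2)
print(top, out)
```

Here only two of the four experts run for this input, which is the source of the compute savings: total parameters grow with the number of experts, while per-token cost grows only with k.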
The model is currently available for online experience through the LongChat official website and also offers API access for developer integration. As a local life services giant, Meituan's continued investment in large model R&D represents technical capabilities that should not be underestimated.
GPT-5.2 Codex Launches on Responses API
OpenAI announced that GPT-5.2 Codex is now available through the Responses API, priced the same as GPT-5.2. Developers can call it directly using any IDE or agentic coding framework, further lowering the barrier to AI-powered programming.
StepFun's Speech Model Tops Global Rankings
StepFun's open-source speech model Step-2-DOR 1.1 ranked first globally on the Artificial Analysis Speech Reasoning leaderboard, one of the industry's most widely recognized third-party evaluations of native speech models.

The "Speech Reasoning" track that Step-2-DOR 1.1 competes in refers to a model's ability to directly process speech input and perform logical reasoning, as opposed to the traditional two-stage approach of "speech-to-text then reasoning." Native speech reasoning models can capture intonation, pauses, emotions, and other information lost in text transcription, offering significant advantages in real-time conversation and voice assistant scenarios. Artificial Analysis is a third-party organization focused on AI model performance benchmarking, with evaluation dimensions covering inference speed, output quality, price-performance ratio, and more. It has gained industry recognition for its transparent methodology and continuously updated data.
Step-2-DOR 1.1 surpassed mainstream frontier models including Grok, Gemini, and GPT Real-Time with a 96.4% accuracy rate, a new record on this leaderboard. The result puts StepFun at the top of the benchmark for end-to-end speech understanding and reasoning, and marks a notable breakthrough by a Chinese AI team in a specialized technical track. The model's full API will officially launch in February this year.
AI LLM Market Landscape: Anthropic Closing in on Google
Looking at AI model market share data, the landscape has shifted noticeably in recent times:
- OPER 4.5 usage surged dramatically, with a single-day increase of 59%, growing by over 70 billion tokens
- Hardcore 4.5 and GPT OSS 120B also showed impressive growth rates
- Grok 4.1 Fast and Gemini 2.0 Flash experienced notable declines
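The first bullet's two figures imply a rough usage baseline. The short sketch below works it out, assuming the 59% increase and the 70-billion-token growth describe the same one-day change; since the source says "over 70 billion", the results are lower bounds, and the function name is introduced here only for illustration.

```python
def implied_baseline(increase: float, growth_rate: float) -> float:
    """Volume before the jump, given the absolute increase and the growth rate."""
    return increase / growth_rate

increase_b = 70.0  # billions of tokens; reported as "over 70 billion"
growth = 0.59      # 59% single-day increase

before = implied_baseline(increase_b, growth)
after = before + increase_b
print(f"{before:.1f}B -> {after:.1f}B")  # 118.6B -> 188.6B
```

In other words, the reported numbers are consistent with daily usage rising from roughly 119 billion to roughly 189 billion tokens.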

At the aggregate level, Anthropic's market share has exceeded 20% and is closing in on Google. Founded in 2021 by former OpenAI Research VP Dario Amodei and others, Anthropic focuses on AI safety research and large model development. Its flagship Claude series is known for its "Constitutional AI" training method, in which the model critiques and revises its own outputs against a set of predefined principles, and for strong performance in safety and instruction following. Several factors drive Anthropic's rapid market share growth: Claude's strong reputation in code generation and long-context processing, Amazon's strategic investment and deep integration (Claude is one of the core models on the Amazon Bedrock platform), and enterprise customers' demand for supplier diversification beyond OpenAI.
This trend is worth watching — powered by the Claude series' excellent performance, Anthropic is capturing increasing market share from both OpenAI and Google. The competitive landscape for AI large models is accelerating from a duopoly toward a multipolar structure, with profound implications for the entire industry's pricing strategies and technical roadmap competition.
Summary
The AI industry developments on January 16 reveal several clear trends: First, large models are moving from "capability demonstration" to "ecosystem integration" — Qwen's integration with Alibaba services is a prime example. Second, competition in vertical capabilities is intensifying — whether it's Wenxin's improvements in specialized domains or StepFun's breakthrough in speech reasoning, differentiated competition beyond general capabilities has begun. Third, the market landscape continues to shift rapidly, with Anthropic's rise disrupting the previous duopoly structure.
Key Takeaways
- Qwen APP launched 400+ new features with full integration into Alipay, Taobao, Amap, and other Alibaba ecosystem services, supporting real operations like AI shopping
- Baidu's Wenxin ERNIE 5.0 version 0110 released, ranking 8th globally on LMARENA with significantly enhanced math and coding capabilities
- StepFun's Step-2-DOR 1.1 topped global speech reasoning rankings with 96.4% accuracy on the authoritative leaderboard
- Anthropic's market share broke 20% and is closing in on Google, accelerating the reshaping of the AI LLM competitive landscape
- GPT-5.2 Codex launched on Responses API; Meituan released LongChat Flash Thinking 260B deep reasoning model