Qwen Launches 400+ New Features as Wenxin 5.0 and Multiple LLMs Drop Simultaneously

On January 16, 2025, the AI industry saw a wave of major updates: Alibaba's Qwen APP underwent a massive upgrade, Baidu released a new version of its Wenxin (ERNIE) model, and Meituan, OpenAI, and StepFun also shipped significant releases. This article covers the day's key developments and analyzes the latest progress from each company.

Qwen APP Comprehensive Upgrade: 400+ New Features Connecting the Alibaba Ecosystem

The Qwen APP update is arguably the largest in its history, launching over 400 new features at once. The most notable addition is the Task Assistant feature, covering most daily use scenarios including office work, consulting, research, application development, and lifestyle entertainment.

Qwen's "Task Assistant" is essentially a real-world implementation of AI Agent technology. An AI Agent refers to an AI system capable of perceiving its environment, autonomously planning, and executing multi-step tasks — fundamentally different from traditional single-turn Q&A models. Agent systems typically possess Tool Use capabilities, memory management, and task decomposition abilities, breaking complex goals into a series of subtasks and executing them sequentially. After integrating with Alipay, Taobao, and other ecosystem services, Qwen effectively gains the ability to "operate in the real world." Technically, this relies on the Function Calling mechanism — the model can identify user intent and automatically invoke corresponding APIs to complete actions like placing orders and making payments, without requiring users to manually switch between apps.
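The Function Calling flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Qwen's actual implementation: the tool names (`order_food`, `book_flight`), their parameters, and the JSON shape of the model's output are all hypothetical, and a real integration would call service APIs instead of returning canned dictionaries.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tools. A real integration would call the service's API here.
def order_food(restaurant: str, dish: str) -> Dict[str, Any]:
    return {"status": "ordered", "restaurant": restaurant, "dish": dish}

def book_flight(origin: str, destination: str, date: str) -> Dict[str, Any]:
    return {"status": "booked", "route": f"{origin}->{destination}", "date": date}

# Registry mapping tool names the model may emit to Python callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "order_food": order_food,
    "book_flight": book_flight,
}

def dispatch(tool_call_json: str) -> Dict[str, Any]:
    """Execute a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        # Unknown tools are rejected rather than guessed at.
        return {"status": "error", "reason": f"unknown tool: {name}"}
    return TOOLS[name](**call["arguments"])

# Stand-in for what a model might emit after reading "book me a flight to Beijing".
model_output = (
    '{"name": "book_flight", '
    '"arguments": {"origin": "HGH", "destination": "PEK", "date": "2025-02-01"}}'
)
result = dispatch(model_output)
print(result)
```

The key design point is that the model only produces a structured request. The application decides which functions exist and executes them, which is also where permission checks and payment confirmation would sit.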

More critically, Qwen announced full integration with Alipay, Taobao, Amap (Gaode Maps), and other Alibaba ecosystem services. After integration, users can order food delivery, shop, and book flights directly through Qwen, which in effect becomes an AI shopping assistant. This "super entry point" approach resembles Apple's strategy of deeply integrating Siri with the iOS ecosystem, but because it builds on Alibaba's massive e-commerce and payment infrastructure, its path to commercialization is far more direct. Qwen is thus evolving from a conversational AI tool into an intelligent entry point that orchestrates Alibaba's service ecosystem.

The feature is currently open for testing to all users, with a detailed feature list available on Qwen APP's WeChat official account. This move reflects Alibaba's strategic intent to deeply embed large models into its commercial ecosystem — when AI can directly help users place orders and make purchases, its commercial value far exceeds that of simple information Q&A.

Wenxin ERNIE 5.0 Version 0110 Released

Baidu released version 0110 of its Wenxin ERNIE 5.0 model. On the LMARENA text leaderboard, this version ranks 8th globally, demonstrating strong overall capabilities.

LMARENA (formerly Chatbot Arena) is a large model evaluation platform created by a UC Berkeley team that ranks models by "human preference voting" rather than fixed standardized test sets. Real users blindly compare responses from two models without knowing their identities, and the votes are aggregated into rankings through an Elo-style rating system (the approach originated in chess rankings). This method is considered more reflective of real-world performance than traditional benchmarks because it is harder to game: there is no fixed test set to overfit, although models can still be tuned toward styles that voters tend to prefer. ERNIE 5.0's 8th-place global ranking suggests that its overall conversational quality holds up across a large number of real user comparisons, rather than only on specific academic datasets.
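To make the rating mechanism concrete, here is a minimal sketch of the classic Elo update applied to one blind comparison. The K-factor of 32 and the 400-point scale are the conventional chess values, used here as assumptions; the leaderboard's actual statistical procedure may differ.

```python
from typing import Tuple

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0) -> Tuple[float, float]:
    """Return new ratings after one vote.

    score_a is 1.0 if users preferred A, 0.0 if they preferred B, 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the total rating is conserved.
    new_b = rating_b - k * (score_a - exp_a)
    return new_a, new_b

# Two models start level at 1000; a user prefers model A's answer.
new_a, new_b = update(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)
```

A win against an equally rated opponent moves each rating by half of K, while a win against a much weaker opponent moves it very little, which is why rankings stabilize as votes accumulate.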

This upgrade focuses on strengthening two core capabilities: mathematical reasoning and coding ability. Additionally, professional capabilities across multiple vertical domains have been significantly enhanced, including business, finance, and medicine. This indicates Baidu is pushing Wenxin from general capabilities toward deep penetration into specialized fields. The model is currently available for direct experience on the Wenxin official website.

New Moves from Meituan and OpenAI

Meituan Releases LongChat Flash Thinking 260B

Meituan launched the LongChat Flash Thinking 260B model, featuring deep reasoning capabilities and ranking highly across multiple benchmarks.

The "260B" in the model name refers to 260 billion parameters, placing it in the ultra-large language model category. Modern models at this scale typically do not activate all parameters at once. Instead they use an MoE (Mixture of Experts) architecture, activating only a small subset of "expert" sub-networks for each token during inference, which sharply reduces per-inference compute while keeping the total parameter count high. Mixtral is a publicly documented example, and GPT-4 is widely reported, though not officially confirmed, to use a similar design. The "deep reasoning" capability typically corresponds to Chain-of-Thought or more advanced "slow thinking" mechanisms, in which the model generates extensive intermediate reasoning steps before giving a final answer, much like a human's scratch work, and this significantly improves accuracy on complex questions.
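The sparse activation idea behind MoE can be illustrated with a toy top-k router. This is a sketch under simplifying assumptions, not Meituan's architecture: the experts are plain scalar functions, and the router logits are passed in directly, whereas in a real model a learned gating layer computes them from each token's hidden state.

```python
import math
from typing import Callable, List, Sequence, Tuple

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x: float,
                router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]],
                top_k: int = 2) -> Tuple[float, List[int]]:
    """Run only the top_k experts and mix their outputs by renormalized gate weights."""
    ranked = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen

# Four toy experts; only two of them are evaluated for this input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
output, used = moe_forward(3.0, [0.1, 2.0, 2.0, -1.0], experts, top_k=2)
print(output, used)
```

With four experts and `top_k=2`, half of the expert parameters sit idle for this input. Production models push that ratio much further, which is how total parameter counts can grow without a matching rise in per-token compute.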

The model is currently available for online experience through the LongChat official website and also offers API access for developer integration. Meituan is best known as a local life services giant, and its continued investment in large model R&D shows technical capabilities that should not be underestimated.

GPT-5.2 Codex Launches on Responses API

OpenAI announced that GPT-5.2 Codex is now available through the Responses API, priced the same as GPT-5.2. Developers can call it directly using any IDE or agentic coding framework, further lowering the barrier to AI-powered programming.
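As a rough illustration of what calling the model through the Responses API might look like, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The model identifier `gpt-5.2-codex` is an assumption based on the product name in this article, and `YOUR_API_KEY` is a placeholder; check the provider's documentation for the exact identifier and request options.

```python
import json
import urllib.request

RESPONSES_URL = "https://api.openai.com/v1/responses"
MODEL_ID = "gpt-5.2-codex"  # assumed identifier, not confirmed by the article

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying a model name and an input prompt."""
    body = json.dumps({"model": MODEL_ID, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        RESPONSES_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Write a function that reverses a string.", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
# Sending is left out on purpose; with a real key it would be:
#   with urllib.request.urlopen(req) as resp:
#       data = json.load(resp)
```

In practice most developers would use an official SDK or an IDE integration instead of raw HTTP, but the request shape shows how little is needed to switch an existing integration to a new model.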

StepFun's Speech Model Tops Global Rankings

StepFun's open-source speech model Step-2-DOR 1.1 ranked first globally on the Artificial Analysis Speech Reasoning leaderboard, one of the industry's most authoritative third-party evaluations of native speech models.

The "Speech Reasoning" track that Step-2-DOR 1.1 competes in refers to a model's ability to directly process speech input and perform logical reasoning, as opposed to the traditional two-stage approach of "speech-to-text then reasoning." Native speech reasoning models can capture intonation, pauses, emotions, and other information lost in text transcription, offering significant advantages in real-time conversation and voice assistant scenarios. Artificial Analysis is a third-party organization focused on AI model performance benchmarking, with evaluation dimensions covering inference speed, output quality, price-performance ratio, and more. It has gained industry recognition for its transparent methodology and continuously updated data.

Step-2-DOR 1.1 surpassed mainstream frontier models including Grok, Gemini, and GPT Real-Time with a 96.4% accuracy rate, a new record for the leaderboard. The result puts StepFun at the top of this benchmark for end-to-end speech understanding and reasoning, and it marks an important milestone for Chinese AI teams achieving breakthroughs in specialized technical tracks. The model's full API is scheduled to launch officially in February of this year.

AI LLM Market Landscape: Anthropic Closing in on Google

Market share data for AI models shows that the landscape has shifted noticeably of late:

Claude Opus 4.5 usage surged, with a single-day increase of 59%, or more than 70 billion tokens
Claude Haiku 4.5 and GPT OSS 120B also showed impressive growth rates
Grok 4.1 Fast and Gemini 2.0 Flash experienced notable declines
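The two figures in the first item, a 59% single-day increase and growth of more than 70 billion tokens, together imply an approximate scale for that model's daily usage. The back-of-envelope calculation below assumes both numbers describe the same model over the same day; because the source says "over 70 billion", the results are lower bounds.

```python
def implied_baseline(increase_tokens: float, growth_rate: float) -> float:
    """Previous-day volume implied by an absolute increase and a relative growth rate."""
    return increase_tokens / growth_rate

increase = 70e9  # tokens added in one day (lower bound from the article)
rate = 0.59      # 59% single-day growth

baseline = implied_baseline(increase, rate)
new_total = baseline + increase

print(f"previous day: about {baseline / 1e9:.1f}B tokens")
print(f"after surge:  about {new_total / 1e9:.1f}B tokens")
```

In other words, the model would have gone from roughly 119 billion to roughly 189 billion tokens per day on the platform being measured.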

At the aggregate level, Anthropic's market share has exceeded 20% and is closing in on Google. Founded in 2021 by former OpenAI Research VP Dario Amodei and others, Anthropic focuses on AI safety research and large model development. Its flagship Claude series is known for the "Constitutional AI" training method, in which the model critiques and corrects its own outputs against a set of predefined principles, and for strong performance in safety and instruction following. Several factors drive Anthropic's rapid market share growth: Claude's strong reputation in code generation and long-context processing, Amazon's strategic investment and deep AWS integration (Claude has become one of the core models on the Amazon Bedrock platform), and enterprise customers' demand for supplier diversification beyond OpenAI.

This trend is worth watching: on the strength of the Claude series, Anthropic is capturing market share from both OpenAI and Google. Competition among large AI models is moving quickly from a duopoly toward a multipolar structure, with profound implications for pricing strategies and technical roadmaps across the industry.

Summary

The AI industry developments on January 16 reveal several clear trends: First, large models are moving from "capability demonstration" to "ecosystem integration" — Qwen's integration with Alibaba services is a prime example. Second, competition in vertical capabilities is intensifying — whether it's Wenxin's improvements in specialized domains or StepFun's breakthrough in speech reasoning, differentiated competition beyond general capabilities has begun. Third, the market landscape continues to shift rapidly, with Anthropic's rise disrupting the previous duopoly structure.

Key Takeaways

Qwen APP launched 400+ new features with full integration into Alipay, Taobao, Amap, and other Alibaba ecosystem services, supporting real operations like AI shopping
Baidu's Wenxin ERNIE 5.0 version 0110 released, ranking 8th globally on LMARENA with significantly enhanced math and coding capabilities
StepFun's Step-2-DOR 1.1 topped global speech reasoning rankings with 96.4% accuracy on the authoritative leaderboard
Anthropic's market share broke 20% and is closing in on Google, accelerating the reshaping of the AI LLM competitive landscape
GPT-5.2 Codex launched on Responses API; Meituan released LongChat Flash Thinking 260B deep reasoning model
