#Token

741 related articles

May 30, 2026 · 2 min

SGLang v0.5.12.post1 Released: DeepSeek V4 Stability Fixes and Blackwell Adaptation

SGLang v0.5.12.post1 stability patch details: 12 critical fixes covering DeepSeek V4 garbled text and crashes, NIXL PD disaggregated inference logic, Blackwell B300 adaptation, and cold start optimization.

Tech Frontiers

May 30, 2026 · 2 min

Step 3.7 Flash: Deep Dive into the 198B Sparse MoE Multimodal Model

Deep dive into StepFun AI's Step 3.7 Flash, a 198B sparse MoE vision-language model with 256K context and 3-level reasoning, excelling in multimodal understanding, AI coding, and Agent tool orchestration.

Tech Frontiers

May 30, 2026 · 2 min

LFM2.5-8B-A1B: A MoE Model with 1.5B Active Parameters Delivering 4x Its Weight Class Performance

Liquid AI releases LFM2.5-8B-A1B, a MoE model with 8B total params but only 1.5B active, matching 6B-class models in tool calling. Supports 128K context, local deployment, multilingual, with SGLang Day-0 support.

Industry Insights

May 30, 2026 · 2 min

AMD MI355X Beats B200: Full-Stack Optimization Breakdown for 5% Lower TCO on DeepSeek-R1 Inference

AMD Instinct MI355X achieves 5% lower TCO than NVIDIA B200 on DeepSeek-R1 disaggregated inference via SGLang+MoRI full-stack optimization with 1.25x per-GPU throughput.

Tech Frontiers

May 30, 2026 · 1 min

Cloudflare Contributes Critical KV Cache and Mooncake Fixes to SGLang

Cloudflare contributes decode KV cache offload and Mooncake recovery fixes to SGLang, resolving garbled output under high concurrency for Kimi K2.6 and enabling automatic fault recovery in distributed inference.

Tech Frontiers

May 30, 2026 · 1 min

SGLang Hosts Agent Loops Office Hour, Focusing on Agentic Loop Architecture Optimization

SGLang team hosts an Agent Loops Office Hour exploring inference optimization for agentic loops, covering KV Cache reuse, low-latency multi-turn dialogue, and tool calling techniques.

Product Reviews

May 30, 2026 · 3 min

O3 vs Gemini 2.5 Pro vs Claude 3.7: Real-World AI Coding Ability Comparison

Real-world comparison of O3, Gemini 2.5 Pro, and Claude 3.7 coding abilities through snake battles, RL training, solar system simulation, and soccer game tasks.

Product Reviews

May 30, 2026 · 3 min

Llama 3.3 70B In-Depth Review: Testing the Strongest Open-Source LLM with 13 Questions

Meta releases Llama 3.3 70B open-source model with just 70B parameters rivaling 405B performance. Tested on 13 logic, math, and coding questions, it passed 12 — reshaping the open-source model landscape.

Product Reviews

May 30, 2026 · 3 min

Real-World Coding Test of 13 Top AI Models: Who Is the Best Programming Assistant?

Benchmark of 13 top AI models including GPT-4.1, Claude 3.7 Sonnet, and Gemini 2.5 Pro on coding ability, scored across 8 dimensions using the same high-difficulty algorithm problem.

Product Reviews

May 30, 2026 · 2 min

API Aggregation Proxy Platforms Tested: One Interface to Call 100+ AI Models

Hands-on testing of an API aggregation proxy platform's model calling capabilities, including GPT-Image2 image generation, cost analysis, and coverage of 100+ models like Claude and Gemini.

Industry Insights

May 30, 2026 · 3 min

Six Foundational Upgrades to Claude Code: AI Programming Moves from Lab to Industrial Scale

Anthropic's largest-ever foundational upgrade to Claude Code fixes six critical issues at once—terminal flickering, thinking freezes, cryptic errors, context deadlocks, unstable connections, and session crashes—shifting AI coding competition to the infrastructure layer.

Tutorials

May 30, 2026 · 3 min

BMad-Method: Building an AI Agile Development Team with a Multi-Agent Framework

Deep dive into BMad-Method, an open-source multi-agent framework simulating a full agile team—from business analysis to QA—supporting Claude Code, Cursor, and more.

Tutorials

May 30, 2026 · 3 min

Claude Code Source Code Study Guide: Efficiently Mastering Core AI Agent Development Architecture

Learn AI Agent development from Claude Code's 510K lines of source code, covering Agent Loop, context compression, multi-Agent orchestration, and two efficient study methods.

Tutorials

May 30, 2026 · 2 min

Claude Code Monitor Tool Explained: Event-Driven Replaces Polling, Saving Tokens More Efficiently

Deep dive into Claude Code's new built-in Monitor tool. Learn how event-driven monitoring replaces polling via Stream Filter and Poll and Diff modes, dramatically reducing token consumption.

Tutorials

May 30, 2026 · 3 min

Low-Cost Solution for Using GPT Models with Claude Code: Build an AI Programming Workflow for ~$1.50/Month

How to use the ClipRoxyAPI local proxy to combine Claude Code's programming UX with GPT Codex Team models for under $1.50/month, with ample quota and full privacy.

Product Reviews

May 30, 2026 · 2 min

Major Claude Code Update: A Complete Guide to Agent View and the Goal System

Deep dive into Claude Code's new Agent View and Goal system, covering multi-agent parallel management, background sessions, and result-oriented autonomous execution.

Product Reviews

May 30, 2026 · 2 min

Unified Management Tool for Claude Code and Codex: One-Click Multi-AI Programming Environment Setup

A deep dive into the unified management client for Claude Code and Codex, solving pain points like tedious configuration, high switching costs, and fragmented management with one-click setup and usage monitoring.

Product Reviews

May 29, 2026 · 3 min

Claude Code with MiniMax M2: Testing a Low-Cost AI Coding Solution Across Three Real Projects

Real-world testing of MiniMax M2 as Claude Code's backend model across three projects: framework migration, iOS development, and full-stack MVP — at just 8% of Claude's price.

Product Reviews

May 29, 2026 · 3 min

Deep Dive into Cursor's Pay-Per-Use Refill Plan: Is Using Official Pro Accounts at 65% Off Reliable?

Deep analysis of Cursor's pay-per-use refill plugin: account rotation mechanism, tiered discounts, full model support, and objective assessment of compliance risks and data security concerns.

Industry Insights

May 29, 2026 · 3 min

AI Fully Automated Orchestration in Practice: How Software Production Costs Are Being Completely Disrupted

Deep analysis of AI fully automated software orchestration: from Claude Code workflows to parallel orchestration strategies, exploring how models like MiniMax M1 drive software production costs toward zero.