#MoE architecture

35 related articles

May 30, 2026 · 2 min

LFM2.5-8B-A1B: A MoE Model with 1.5B Active Parameters Delivering 4x Its Weight Class Performance

Liquid AI releases LFM2.5-8B-A1B, a MoE model with 8B total params but only 1.5B active, matching 6B-class models in tool calling. Supports 128K context, local deployment, multilingual, with SGLang Day-0 support.

Industry Insights

May 30, 2026 · 2 min

AMD MI355X Beats B200: Full-Stack Optimization Breakdown for 5% Lower TCO on DeepSeek-R1 Inference

AMD Instinct MI355X achieves 5% lower TCO than NVIDIA B200 on DeepSeek-R1 disaggregated inference via SGLang+MoRI full-stack optimization with 1.25x per-GPU throughput.

Tech Frontiers

May 30, 2026 · 1 min

Cloudflare Contributes Critical KV Cache and Mooncake Fixes to SGLang

Cloudflare contributes decode KV cache offload and Mooncake recovery fixes to SGLang, resolving garbled output under high concurrency for Kimi K2.6 and enabling automatic fault recovery in distributed inference.

Industry Insights

May 29, 2026 · 3 min

Deep Dive into Three Major LLM Career Paths: Requirements, Tech Stacks, and Career Prospects

Deep analysis of three core LLM roles—Application Engineer, Development Engineer, and Algorithm Engineer—covering technical requirements, salary thresholds, and career prospects including RAG, fine-tuning, and inference deployment.

Tutorials

May 29, 2026 · 2 min

DeepSeek V3 + bolt.html: A Practical Guide to Generating Beautiful Web Pages with Zero Code

Learn how DeepSeek V3-0324 and open-source tool bolt.html combine to generate beautiful HTML pages with zero code using prompt engineering techniques.

Tutorials

May 28, 2026 · 2 min

Why Qwen3 Is the Best Open-Source Model for MCP Agent Development

Analysis of Qwen3's advantages for MCP agent development, contrasting it with DeepSeek R1's lack of Function Calling and covering its MoE architecture and thinking mode switching.

Tech Frontiers

May 28, 2026 · 3 min

June AI Showdown: Mythos, Sonnet 4.8, and GPT-5.6 All Revealed

June 2025 becomes AI's densest release month: Anthropic Mythos nears launch, Claude Sonnet/Opus 4.8 skip-level upgrades, GPT-5.6 rapid iteration, DeepSeek V4 Pro permanent 75% price cut.

Tech Frontiers

May 28, 2026 · 1 min

DeepSeek V4-Pro Permanent Price Cut: Lower Developer Costs as LLM Price War Heats Up

DeepSeek announces permanent discount pricing for its V4-Pro model. Learn how this impacts developers, V4-Pro's competitive edge, and the latest LLM price war trends.

Deep Dives

May 28, 2026 · 2 min

AI Agent Development Methodology: A Complete Guide from ReAct to Enterprise-Grade Tech Stack

A deep dive into AI Agent development methodology, from the ReAct theoretical framework to a four-layer enterprise tech stack covering model services, Agent types, LangChain, and production deployment.

Tech Frontiers

May 27, 2026 · 2 min

GLM5 Architecture Leaked: 745B Parameters, DeepSeek V4 May Launch Quantized Smaller Model First

GLM5 code leak reveals 745B-parameter MoE architecture replicating DeepSeek V3. DeepSeek V4 may launch a 200B quantized model first, with flagship exceeding 1T parameters.

Product Reviews

May 27, 2026 · 2 min

Kimi K2.6 Open-Source Hands-On: How Strong Is Its Orchestration of 300 Concurrent Agents?

Deep analysis of Moonshot AI's open-source Kimi K2.6 Agent orchestration: 300 sub-Agents executing 4000-step tasks, outperforming GPT-5.4 in coding benchmarks, LoRA fine-tuning on 2x RTX 4090s.

Product Reviews

May 27, 2026 · 3 min

Kimi K2.6 In-Depth Review: A Complete Breakdown of Its Coding and Agent Capabilities

In-depth review of Kimi K2.6's coding, Agent collaboration, and visual development capabilities. #1 open-source on SWE-Bench Pro, 300 parallel sub-agents, API priced at 1/3 of competitors.

Product Reviews

May 27, 2026 · 3 min

Running Qwen3.6-27B Locally on Mac: 4 Solutions Benchmarked

Benchmarking 4 solutions for running Qwen3.6-27B locally on Mac: GGUF, MLX Diflash, and MTP-LX. MTP-LX 4bit leads at 43.6 tok/s with solid coding, writing, and reasoning quality.

Tech Frontiers

May 27, 2026 · 2 min

Kimi K2.5 Fully Open-Sourced: Deep Dive into 1T Parameter MoE Architecture + Agent Cluster Capabilities

Deep dive into Moonshot AI's fully open-sourced Kimi K2.5: 1T parameter MoE architecture, Vision-to-Code capabilities, and 100-Agent parallel cluster system topping open-source benchmarks.

Tutorials

May 27, 2026 · 3 min

Universal AI Prompts for Mathematical Modeling: A Zero-to-Hero Four-Stage Practical Guide

A detailed guide to the four-stage universal AI prompt system for mathematical modeling, covering problem analysis, innovative model construction, data processing, and model solving for competitions.