6 related articles

Deep dive into vLLM's core technologies for high-throughput LLM inference, including PagedAttention memory management, continuous batching, distributed deployment, and comparisons with TensorRT-LLM.
Alibaba's $52B AI Investment: A Full-S…
Alibaba invests $52B in AI cloud infrastructure over 3 years. Bailian Platform hits 8B yuan ARR with 11 quarters of triple-digit AI revenue growth. Deep analysis of Alibaba Cloud's full-stack Agent upgrade strategy.
Tech FrontiersSGLang v0.5.12.post1 stability patch details: 12 critical fixes covering DeepSeek V4 garbled text and crashes, NIXL PD disaggregated inference logic, Blackwell B300 adaptation, and cold start optimization.
Industry InsightsAMD Instinct MI355X achieves 5% lower TCO than NVIDIA B200 on DeepSeek-R1 disaggregated inference via SGLang+MoRI full-stack optimization with 1.25x per-GPU throughput.
Tech FrontiersCloudflare contributes decode KV cache offload and Mooncake recovery fixes to SGLang, resolving garbled output under high concurrency for Kimi K2.6 and enabling automatic fault recovery in distributed inference.
Tech FrontiersSGLang team hosts an Agent Loops Office Hour exploring inference optimization for agentic loops, covering KV Cache reuse, low-latency multi-turn dialogue, and tool calling techniques.