2 related articles
Industry Insights: AMD Instinct MI355X achieves 5% lower TCO than NVIDIA B200 on DeepSeek-R1 disaggregated inference via SGLang+MoRI full-stack optimization, with 1.25x per-GPU throughput.
Tech Frontiers: Cloudflare contributes decode KV cache offload and Mooncake recovery fixes to SGLang, resolving garbled output under high concurrency for Kimi K2.6 and enabling automatic fault recovery in distributed inference.