2 related articles
Industry InsightsAMD Instinct MI355X achieves 5% lower TCO than NVIDIA B200 on DeepSeek-R1 disaggregated inference via SGLang+MoRI full-stack optimization with 1.25x per-GPU throughput.
DeepSeek V4 Flash MTP Speculative Deco…
Real-world testing of DeepSeek V4 Flash with MTP speculative decoding: ~20% speedup for code generation, minimal gains for text. Covers memory overhead, accuracy differences, Q4 vs Q3 quantization, and full deployment tutorial.