#continuous batching

7 related articles

June 6, 2026 · 3 min

vLLM Deep Dive: How PagedAttention Enables High-Throughput LLM Inference

Deep dive into vLLM's core technologies for high-throughput LLM inference, including PagedAttention memory management, continuous batching, distributed deployment, and comparisons with TensorRT-LLM.

June 4, 2026 · 4 min

Legitimate Slacking for Programmers: The Art of Waiting from Code Compilation to AI Generation

From the classic XKCD compilation meme to AI coding era reinterpretations — exploring how waiting for compilation and AI generation is reshaping developer productivity.

June 4, 2026 · 4 min

The Art of Legitimate Slacking: From Code Compilation to AI Generation Waits for Programmers

From the classic XKCD compilation meme to AI coding era reinterpretations — exploring how waiting for compilation and AI code generation is reshaping developer productivity.

Tutorials

June 3, 2026 · 3 min

Agent Tuning: A Complete Guide to Training LLMs with Agent Capabilities

A deep dive into Agent Tuning principles and practices, covering why Agent training is needed, the evolution from Prompt to RAG to Agent, development workflows, and cost assessment for private deployment.

Industry Insights

May 30, 2026 · 2 min

SGLang Enters Finance: How AI Inference Infrastructure Is Reshaping Wall Street

SGLang co-hosts a finance AI inference event with Crusoe AI and Cloudflare, exploring LLM inference deployment in trading, risk management, and compliance — signaling Wall Street's shift to production-grade AI infrastructure.

Industry Insights

May 30, 2026 · 2 min

AMD MI355X Beats B200: Full-Stack Optimization Breakdown for 5% Lower TCO on DeepSeek-R1 Inference

AMD Instinct MI355X achieves 5% lower TCO than NVIDIA B200 on DeepSeek-R1 disaggregated inference via SGLang+MoRI full-stack optimization with 1.25x per-GPU throughput.

Industry Insights

May 27, 2026 · 2 min

NVIDIA Blackwell Sets New STAC-AI Records for Financial LLM Inference

NVIDIA Blackwell GPUs set new LLM inference records in the STAC-AI financial benchmark. Explore Blackwell architecture advantages, TensorRT-LLM co-optimization, and LLM applications in trading and risk management.