107 related articles
Tutorials: Learn how to deploy a prefill-decode (PD) disaggregated SGLang inference cluster on AMD GPUs from a single config file to improve LLM throughput and latency.
Tech Frontiers: Details of the SGLang v0.5.12.post1 stability patch, with 12 critical fixes covering DeepSeek V4 garbled output and crashes, NIXL PD-disaggregated inference logic, Blackwell B300 support, and cold-start optimization.
Industry Insights: With full-stack SGLang + MoRI optimization, AMD Instinct MI355X delivers 1.25x per-GPU throughput and 5% lower TCO than NVIDIA B200 on DeepSeek-R1 disaggregated inference.
Tech Frontiers: The SGLang team hosts an Agent Loops Office Hour on inference optimization for agentic loops, covering KV cache reuse, low-latency multi-turn dialogue, and tool-calling techniques.
AI Agent Practical Development: A Comp…
A deep dive into AI Agent core principles and practical development paths, covering perception-decision-execution capabilities, MCP protocol tool integration, and analysis of Manus and AutoGLM.
Codex + Claude Code + Cursor: A Practi…
A deep breakdown of Codex, Claude Code, and Cursor: their positioning, collaboration methods, and a complete practical workflow, with pricing and role-based pairing recommendations.
Tutorials: A beginner's guide to learning large language models, covering learning paths, hardware requirements, Python essentials, and cloud services for learners at every level.
Tech Frontiers: Anthropic closes a $65B Series H round at a $965B valuation, co-led by Sequoia and others. The funds target frontier AI research and Claude compute scaling, setting a new record for private tech funding.
Tech Frontiers: Explore the features of NVIDIA Muse Spark as an AI creative tool, see how community users apply it in work and entertainment, and analyze trends in the AI creative tool ecosystem.
Industry Insights: The EU AI Fund aims to provide GPU compute to startups, but entrepreneurs question how the resources are allocated, citing cronyism. An analysis of the challenges of EU AI subsidies versus US market-driven models.
Industry Insights: Meta partners with AWS to add tens of millions of Graviton cores for AI inference, diversifying its infrastructure to support Meta AI and agentic experiences for billions of users.
Context Mode: How One MCP Plugin Cured…
Context Mode solves AI coding assistants' context amnesia via sandbox isolation, session continuity tracking, and a code-thinking philosophy, compressing context consumption by 99% and earning 9,700 stars in two months.
Tech Frontiers: Google introduces the Gemini AI assistant in hiring to assess AI proficiency, OpenAI launches GPT-5.5 Cyber for critical infrastructure defense, Anthropic nears a trillion-dollar valuation, and Mozilla fixes 271 Firefox bugs with AI in two months.
Product Reviews: A deep dive into GPT Image 1.5's core upgrades: multi-turn editing stability, a 4x speed boost, creative editing capabilities, and API access for commercial applications.
Industry Insights: A deep dive into how NVIDIA Dynamo Snapshot cuts LLM inference cold-start time from minutes to seconds via GPU state snapshot and restore, covering Kubernetes integration and elastic inference.
Tech Frontiers: In this weekly AI roundup, Kimi K2.6 tops open-source rankings, Anthropic launches Opus 4.7 and Claude Design, Alibaba rolls out the Qwen 3.6 series, and Google releases an emotion-controllable TTS model.
Tech Frontiers: DeepSeek releases OCR2, replacing CLIP with an LLM as the visual encoder; Moonshot AI launches Kimi K2.5 with a 100+ sub-agent cluster mode; Microsoft deploys the 3nm Maia 200 chip; Alibaba releases Qwen3 Max Thinking.
Silicon Valley Engineer Quits Big Tech…
Ex-NVIDIA GTC award winner Sparky: an AI researcher quit big tech and used 10+ years of theater experience to design an AI personality system with dynamic interests, long-term memory, and proactive social skills.
Decoding LLM Naming Conventions: Param…
Decode LLM naming conventions: what parameter counts such as 32B mean, AWQ/GGUF quantization formats, 4-bit VRAM estimation formulas, MoE model pitfalls, and model selection by GPU tier.
AI Coding Appliance vs Cloud LLMs: Can…
A deep cost comparison between AI coding appliances and cloud LLM APIs. A 20-person team spending ¥480K/year on tokens can deploy 4 local OnePanel units at ¥99K each, breaking even in 2.5 months.