# LLM

CLAUDE.md Configuration Guide: Write Your Project Spec Using a Six-Section Structure

A detailed guide to configuring CLAUDE.md with a six-section structure covering project overview, features, tech stack, directory structure, code conventions, and constraints to boost AI coding efficiency.

Claude Code Hidden Configurations Fully Explained: From Chat Assistant to Semi-Automated Programming Workflow

Explore Claude Code's source code to unlock hidden configurations like Hooks, Agents, Permissions, and Memories. Transform your AI assistant into a customizable semi-automated development workflow.

Windsurf Integrates Claude Opus 4.7 Fast Mode with 2.5x Speed Boost

May 30, 2026 · 1 min

Windsurf integrates Claude Opus 4.7 fast mode with 2.5x speed boost while retaining full intelligence. Analysis of its impact on developer productivity and AI coding tool competition.

Research

Agent Loops in Practice: Transforming Token Output into Productivity from CUDA Kernels to Automated Research

Deep dive into how the Humanize framework transforms LLM tokens into engineering productivity via Agent Loops. Covers KDA winning CUDA kernel contests, virtual hardware optimization, and 50% research cost reduction.

Tutorial: Deploying a PD-Disaggregated SGLang Multi-Node Inference Cluster on AMD GPUs

Learn how to deploy a PD-disaggregated SGLang inference cluster on AMD GPUs using a single config file, improving LLM throughput and latency.

SGLang v0.5.12.post1 Released: DeepSeek V4 Stability Fixes and Blackwell Adaptation

SGLang v0.5.12.post1 stability patch details: 12 critical fixes covering DeepSeek V4 garbled text and crashes, NIXL PD disaggregated inference logic, Blackwell B300 adaptation, and cold start optimization.

Step 3.7 Flash: Deep Dive into the 198B Sparse MoE Multimodal Model

Deep dive into StepFun AI's Step 3.7 Flash, a 198B sparse MoE vision-language model with 256K context and 3-level reasoning, excelling in multimodal understanding, AI coding, and Agent tool orchestration.

LFM2.5-8B-A1B: A MoE Model with 1.5B Active Parameters Delivering 4x Its Weight Class Performance

Liquid AI releases LFM2.5-8B-A1B, a MoE model with 8B total parameters but only 1.5B active, matching 6B-class models in tool calling. It supports 128K context, local deployment, and multiple languages, with Day-0 support in SGLang.

SGLang Enters Finance: How AI Inference Infrastructure Is Reshaping Wall Street

Industry Insights

SGLang co-hosts a finance AI inference event with Crusoe AI and Cloudflare, exploring LLM inference deployment in trading, risk management, and compliance — signaling Wall Street's shift to production-grade AI infrastructure.

AMD MI355X Beats B200: Full-Stack Optimization Breakdown for 5% Lower TCO on DeepSeek-R1 Inference

Industry Insights

AMD Instinct MI355X achieves 5% lower TCO than NVIDIA B200 on DeepSeek-R1 disaggregated inference via SGLang+MoRI full-stack optimization with 1.25x per-GPU throughput.

Cloudflare Contributes Critical KV Cache and Mooncake Fixes to SGLang

May 30, 2026 · 1 min

Cloudflare contributes decode KV cache offload and Mooncake recovery fixes to SGLang, resolving garbled output under high concurrency for Kimi K2.6 and enabling automatic fault recovery in distributed inference.

SGLang Hosts Agent Loops Office Hour, Focusing on Agentic Loop Architecture Optimization

May 30, 2026 · 1 min

SGLang team hosts an Agent Loops Office Hour exploring inference optimization for agentic loops, covering KV Cache reuse, low-latency multi-turn dialogue, and tool calling techniques.

Product Reviews

Llama 3.3 70B In-Depth Review: Testing the Strongest Open-Source LLM with 13 Questions

Meta releases Llama 3.3 70B, an open-source model whose 70B parameters rival 405B-class performance. Tested on 13 logic, math, and coding questions, it passed 12, reshaping the open-source model landscape.

BMad-Method: Building an AI Agile Development Team with a Multi-Agent Framework

Deep dive into BMad-Method, an open-source multi-agent framework simulating a full agile team—from business analysis to QA—supporting Claude Code, Cursor, and more.

Claude Code Source Code Study Guide: Efficiently Mastering Core AI Agent Development Architecture

Learn AI Agent development from Claude Code's 510K lines of source code, covering Agent Loop, context compression, multi-Agent orchestration, and two efficient study methods.

Spring AI Agent Utils: A Java Agent Toolkit Reverse-Engineered from Claude Code's Core Features

Deep dive into Spring AI Agent Utils toolkit covering Skill modules, Ask a User Question, To Do Write, Auto Memory, and multi-Agent orchestration — empowering Java developers to build powerful AI Agents.

Product Reviews

ABCoder in Practice: A Demonstration of Solving AI Code Hallucination

A practical comparison built on SSE services in the Hertz framework shows how ABCoder uses MCP to let AI models consult real source code, reducing LLM code hallucination.

Coze Workflow for Auto-Generating Emotional Short Videos: A Zero-Code Tutorial

A detailed breakdown of automating emotional short video production with Coze workflows — from script generation and TTS to CapCut draft packaging, all zero-code.

OpenAI Launches Rosalind Biodefense Program: How AI Is Reshaping Public Health Security