55 related articles
Deep DivesA deep dive into the complete RAG pipeline — covering vector embeddings, document chunking, retrieval and reranking, plus three production optimization techniques for building accurate enterprise AI knowledge base applications.
Product ReviewsWhichLLM is an open-source tool that auto-detects your hardware and recommends the best local LLM using real benchmark data. Simulate GPUs, filter fake benchmarks, and start chatting in one command.
TutorialsStep-by-step guide to building an automated short video generation workflow on Coze, covering script writing, voiceover, AI images, video synthesis, and CapCut packaging.
TutorialsGuide to enabling MTP multi-Token prediction acceleration in llama.cpp, covering CUDA setup, desktop configuration, model selection, and benchmarks showing ~60 Token/s with Qwen3 27B.
TutorialsStep-by-step guide to building a local RAG knowledge base using RAGFlow, Ollama, and LM Studio with Docker, covering Embedding model deployment and network troubleshooting for private AI Q&A.
TutorialsStep-by-step tutorial: Build a low-cost AI programming assistant using DeepSeek-V3 API with VSCode's Continue plugin. Covers setup, API Key configuration, code completion demo, and Ollama local deployment.
Product ReviewsDeep dive into Cursor 2.0's five major updates: custom Composer model, Git Worktrees multi-agent parallel development, Agent View mode, built-in browser, and more—with hands-on evaluation.
Deep DivesDeep dive into Transformer architecture covering self-attention QKV mechanics, Encoder-Decoder structure, Flash Attention memory optimization, RoPE positional encoding, and GQA inference acceleration.
Product ReviewsDetailed review of Hertzman local inference engine covering one-click deployment, smart hardware recommendations, OpenAI-compatible API, and performance comparison with LM Studio.
TutorialsLearn how to configure a local DeepSeek model in PyCharm via Ollama for free, privacy-safe AI-assisted programming. Includes installation steps, plugin setup, usage tips, and hardware recommendations.
Product ReviewsFull breakdown of Claude Code 2.1: Opus 4.6 model upgrade, Hooks deterministic automation, Skills multi-agent collaboration, MCP tool chain integration, plus IDE shortcuts and practical commands.
TutorialsUsing oMLX with MTP and Qwen3.6 35B on Apple Silicon Mac to achieve 86.7 tokens/s local coding speed, building a full-stack app in under 5 minutes.
Qoder's Context Engineering in Practic…
Deep analysis of Qoder's (Tongyi Lingma international edition) context engineering architecture, including its four-layer retrieval engine, memory engine, context caching, and core product design.
Claude Code vs Codex Deep Dive: A Prac…
A comprehensive comparison of Claude Code and OpenAI Codex covering architecture, use cases, and benchmarks to help you choose the right AI coding tool.
Tech FrontiersWindsurf integrates Claude Opus 4.7 fast mode with 2.5x speed boost while retaining full intelligence. Analysis of its impact on developer productivity and AI coding tool competition.
Tech FrontiersDeep dive into StepFun AI's Step 3.7 Flash, a 198B sparse MoE vision-language model with 256K context and 3-level reasoning, excelling in multimodal understanding, AI coding, and Agent tool orchestration.
Tech FrontiersSGLang team hosts an Agent Loops Office Hour exploring inference optimization for agentic loops, covering KV Cache reuse, low-latency multi-turn dialogue, and tool calling techniques.
Claude Code with MiniMax M2: Testing a…
Real-world testing of MiniMax M2 as Claude Code's backend model across three projects: framework migration, iOS development, and full-stack MVP — at just 8% of Claude's price.
DeepSeek V4 Flash MTP Speculative Deco…
Real-world testing of DeepSeek V4 Flash with MTP speculative decoding: ~20% speedup for code generation, minimal gains for text. Covers memory overhead, accuracy differences, Q4 vs Q3 quantization, and full deployment tutorial.
Building a SaaS Website with AI and Ze…
Learn how to build a SaaS website with AI image generation, multimodal chat, and webpage replication using only Bolt and Cursor — no code required. Covers prompt design, architecture, and iteration techniques.