22 related articles
Tutorials: Complete tutorial on Alibaba Cloud Bailian platform covering API Key setup, Qwen model calls, streaming output, multi-turn conversation principles, and prompt engineering with four roles.
Deep Dives: Complete guide to the three core LLM training stages: pre-training, supervised fine-tuning (SFT), and preference alignment (DPO/PPO), covering LoRA, distillation, quantization, and pruning.
Tutorials: A hands-on guide to using Qwen3 for free via OpenRouter API and Ollama local deployment, paired with Cline coding agent for full-stack development tasks.
Product Reviews: WhichLLM is an open-source tool that auto-detects your hardware and recommends the best local LLM using real benchmark data. Simulate GPUs, filter fake benchmarks, and start chatting in one command.
Tutorials: Guide to enabling MTP (multi-token prediction) acceleration in llama.cpp, covering CUDA setup, desktop configuration, model selection, and benchmarks showing ~60 tokens/s with Qwen3 27B.
Tutorials: Hands-on SwiftUI tutorial using Qwen3-Max and ChatGPT to generate an iOS habit tracker app. Covers Xcode setup, AI code generation pitfalls (Combine import issue), and debugging tips for beginners.
Tutorials: A practical guide to frontend AI full-stack development covering PNPM MonoRepo architecture, TurboRepo build optimization, and LangChain multimodal applications with Ollama local model deployment.
Tutorials: Step-by-step guide to building a local RAG knowledge base using RAGFlow, Ollama, and LM Studio with Docker, covering Embedding model deployment and network troubleshooting for private AI Q&A.
Product Reviews: Deep analysis of Qwen Code 2.0 updates covering Plan Mode approval mechanism, Visual Intelligence auto-switching, Zed editor dual authentication, and Windows fixes for this CLI coding assistant.
Tutorials: Complete guide to building a local AI knowledge base with Qwen3.5, RAGFlow, and Ollama, covering Docker deployment, Embedding model configuration, knowledge base creation, and RAG system setup.
Tutorials: Using oMLX with MTP and Qwen3.6 35B on Apple Silicon Mac to achieve 86.7 tokens/s local coding speed, building a full-stack app in under 5 minutes.
Risks of AI Account Rotation Tools Exp…
Deep dive into how AI quota-cracking tools work, exposing the legal, compliance, and data security risks behind account rotation gray markets, with legitimate alternatives like API pay-per-use and subscription upgrades.
Deep Dive into Qwen3.7 Max: One-Tenth …
Alibaba's Qwen3.7 Max targets AI agents with coding tasks at just $1.30 (one-tenth of GPT-5), supporting 35 hours of continuous execution. Deep analysis of its cost advantages, front-end capabilities, and three key limitations.
LangGraph 0.5.3 + MCP Agent Developmen…
LangGraph 0.5.3 introduces MCP server security authentication and agent deployment solutions. Combined with Qwen3 models, it provides a complete production-grade AI agent development stack.
Gemini 2.5 Pro 0605 Hands-On Compariso…
Hands-on testing of Gemini 2.5 Pro 0605 across coding, reasoning, creative writing, and app development, compared head-to-head with OpenAI o3 and Claude Opus 4.
Why Qwen3 Is the Best Open-Source Mode…
Analysis of Qwen3's advantages for MCP agent development, contrasting it with DeepSeek R1's lack of Function Calling and covering MoE architecture and thinking mode switching.
Tech Frontiers: DeepSeek releases OCR2 replacing CLIP with an LLM as visual encoder; Moonshot AI launches Kimi K2.5 with 100+ sub-agent cluster mode; Microsoft deploys 3nm Maia 200 chip; Alibaba releases Qwen3 Max Thinking.
Tech Frontiers: Anthropic adds custom sub-agents to Claude Code, Cursor launches code review Agent BugBot, Qwen releases 92-language translation model, and Google unveils three experimental AI products.
Running Qwen3.6-27B Locally on Mac: 4 …
Benchmarking 4 solutions for running Qwen3.6-27B locally on Mac: GGUF, MLX Diflash, and MTP-LX. MTP-LX 4bit leads at 43.6 tok/s with solid coding, writing, and reasoning quality.
Decoding LLM Naming Conventions: Param…
Decode LLM naming conventions and understand what 32B parameters and AWQ/GGUF quantization formats mean, with 4-bit VRAM estimation formulas, MoE model pitfalls, and model selection by GPU tier.