# GGUF

Complete Guide to LLM Training: Pre-training, SFT Fine-tuning, and Preference Alignment Explained

Complete guide to the three core LLM training stages: pre-training, supervised fine-tuning (SFT), and preference alignment (DPO/PPO), covering LoRA, distillation, quantization, and pruning.

OpenClaw Local Deployment Tutorial: Connect to WeChat, Feishu & DingTalk in 10 Minutes

Complete guide to deploying OpenClaw locally, covering Windows setup, cloud deployment, WeChat/Feishu/DingTalk integration, and custom Skills—beginners can deploy in 10 minutes.

DeepSeek-V3.2 Released: Coding and Math Capabilities Join the Global Top Tier

Tech Frontiers

June 3, 2026 · 2 min

DeepSeek-V3.2 released with coding, math, and Agent capabilities matching Gemini 3.0 Pro, setting new open-source SOTA. Detailed analysis of performance gains, use cases, and deployment tips.

Google Gemma 4 Hands-On Review: Offline on Smartphones + Ollama Deployment Tutorial

Hands-on testing of Google Gemma 4 open-source models running offline on three phones, with Dense vs MoE architecture explained and a complete Ollama + Claude Code deployment tutorial.

WhichLLM: One Command to Find the Best Local LLM for Your Hardware

WhichLLM is an open-source tool that auto-detects your hardware and recommends the best local LLM using real benchmark data. Simulate GPUs, filter fake benchmarks, and start chatting in one command.

One Person, Three Machines: Local Agent Deployment and Multi-Machine Collaborative Operations in Practice

Deploy Cloud Code and Hermes AI Agents to efficiently manage three physical hosts solo. Covers Ventoy single-file deployment, BTRFS+RAW Image setup, Agent task division, and risk control strategies.

llama.cpp MTP Acceleration Deployment Guide: Configuration Steps & Real-World Benchmarks

Guide to enabling MTP (multi-token prediction) acceleration in llama.cpp, covering CUDA setup, desktop configuration, model selection, and benchmarks showing ~60 tokens/s with Qwen3 27B.

Frontend AI Full-Stack Development in Practice: Building Multimodal Applications with PNPM MonoRepo Architecture

A practical guide to frontend AI full-stack development covering PNPM MonoRepo architecture, TurboRepo build optimization, and LangChain multimodal applications with Ollama local model deployment.

AnythingLLM Installation & Configuration Guide: Building a Local Knowledge Base with API Integration

Complete guide to AnythingLLM local knowledge base setup: installation tips, Ollama model configuration, document vectorization, recall optimization, and API integration.

Hertzman: A Free, No-Install Local LLM Deployment Tool Review

Detailed review of Hertzman local inference engine covering one-click deployment, smart hardware recommendations, OpenAI-compatible API, and performance comparison with LM Studio.

Complete Guide to Configuring Local DeepSeek Model in PyCharm for AI-Assisted Programming

June 2, 2026 · 2 min

Learn how to configure a local DeepSeek model in PyCharm via Ollama for free, privacy-safe AI-assisted programming. Includes installation steps, plugin setup, usage tips, and hardware recommendations.

OpenHuman Deep Dive: A Context-First Open-Source Personal AI Agent

June 2, 2026 · 4 min

Deep dive into OpenHuman open-source AI Agent: context-first architecture, Rust+React hybrid, Memory Tree system, Token Juice compression, and multi-model routing.

June 1, 2026 · 2 min

pnpm Monorepo Full-Stack AI Engineering in Practice: Building a Multimodal Conversation System

Learn how to build a full-stack multimodal AI conversation system using pnpm Monorepo architecture, covering local model integration, image understanding, and streaming chat.

May 29, 2026 · 3 min

Practical Guide to Building Multi-Agent Collaborative Applications with CrewAI + FastAPI

Learn how to build a multi-agent collaborative system with CrewAI and FastAPI. Covers the Agent, Task, and Crew concepts and GPT/Tongyi Qianwen/Ollama integration, with complete code examples and model comparisons.

May 28, 2026 · 2 min

PyCharm AI Assistant Deep Dive: Local Completion, Edit Mode & Practical Tips

Explore PyCharm AI Assistant's new features: free local AI completion, cloud-powered generation, Chat & Edit modes, and context management tips for Python developers.

Claude Agent SDK + LiteLLM + Local LLMs: Building a Zero-Cost AI Agent Platform

May 28, 2026 · 3 min

Learn how to redirect Claude Agent SDK API requests to local LLMs via LiteLLM Proxy, achieving zero-cost inference while retaining full agent framework capabilities.

May 27, 2026 · 3 min

Running Qwen3.6-27B Locally on Mac: 4 Solutions Benchmarked

Benchmarking 4 solutions for running Qwen3.6-27B locally on Mac: GGUF, MLX Diflash, and MTP-LX. MTP-LX 4bit leads at 43.6 tok/s with solid coding, writing, and reasoning quality.

May 27, 2026 · 3 min

Decoding LLM Naming Conventions: Parameter Counts, Quantization Formats & VRAM Requirements Quick Reference

Decode LLM naming conventions and understand parameter counts such as 32B and the AWQ/GGUF quantization formats, with 4-bit VRAM estimation formulas, MoE model pitfalls, and model selection by GPU tier.