#supervised fine-tuning

23 related articles

June 15, 2026 · 3 min

Why SFT Can't Fix the Root Cause of JSON Errors: How GRPO Correctness Training Breaks Through Coding Agent Bottlenecks

Analysis of why SFT can't fix coding agent JSON errors and how GRPO's binary reward signals and synchronized weight updates train directly for correctness.

June 15, 2026 · 3 min

Fireworks Platform Adds Nemotron 3 Ultra Post-Training Support: End-to-End Fine-Tuning and Deployment

Fireworks AI adds NVIDIA Nemotron 3 Ultra post-training support with SFT, DPO, LoRA, and full fine-tuning, enabling seamless train-to-deploy workflows for open-weight LLM customization.

June 15, 2026 · 3 min

Java Developer's Guide to AI Application Development: From Spring AI to Intelligent Customer Service

A comprehensive guide for Java developers transitioning to AI application development, covering Spring AI, RAG, Function Calling, and a hands-on airline intelligent customer service project.

June 15, 2026 · 2 min

The 4-Stage Roadmap for AI Application Development: A Career Transition Guide from Beginner to Senior Engineer

A 4-stage roadmap for AI application development: from Python and RAG basics to Agent cluster architecture, covering the core skills needed for career growth.

June 14, 2026 · 2 min

How Low-Quality RL Environments Sabotage Model Training: A Diagnosis and Repair Guide

Diagnose and fix common RL training environment issues including reward hacking, flawed state spaces, and broken verifiers that silently degrade model performance.

June 14, 2026 · 3 min

Andrew Ng's New Course Explained: A Practical Guide to Using OpenAI's O1 Reasoning Model

Deep dive into Andrew Ng and OpenAI's Reasoning with O1 course covering test-time scaling, new prompting paradigms, multi-model orchestration, and practical applications for developers.

June 13, 2026 · 2 min

A Complete Guide to LLM Infrastructure: Core Challenges from GPU Clusters to Inference Optimization

A deep dive into core challenges and key technologies for LLM infrastructure, covering GPU cluster management, inference optimization, distributed training, cost control, and observability.

June 6, 2026 · 3 min

Claude Opus 4.8 Identifies Itself as DeepSeek: Data Contamination or Distillation? A Technical Analysis

Anthropic's Claude Opus 4.8 failed within 2 hours of launch, identifying itself as DeepSeek and Tongyi Qianwen in Chinese. Deep analysis of data contamination vs distillation hypotheses and multilingual alignment gaps.

June 6, 2026 · 2 min

LlamaFactory: A Comprehensive Guide to the Open-Source Framework for Unified Fine-Tuning of 100+ LLMs

Deep dive into LlamaFactory, an open-source unified fine-tuning framework supporting 100+ LLMs and VLMs with LoRA, QLoRA, RLHF methods, Web UI, 71K+ GitHub Stars, accepted at ACL 2024.

June 4, 2026 · 4 min

OpenAI Swarm Framework Explained: The Core Mechanisms of Function Call and Handoff

Deep dive into the OpenAI Swarm multi-agent orchestration framework, explaining its Function Call tool invocation and Handoff task transfer mechanisms, with a local deployment guide.

Deep Dives

June 3, 2026 · 3 min

Complete Guide to LLM Training: Pre-training, SFT Fine-tuning, and Preference Alignment Explained

Complete guide to the three core LLM training stages: pre-training, supervised fine-tuning (SFT), and preference alignment (DPO/PPO), covering LoRA, distillation, quantization, and pruning.

Industry Insights

June 3, 2026 · 3 min

GPT 5.5 Dubbed 'Autistic Genius': Codex Downloads Surge 1397%, The Truth Behind the Developer Exodus

OpenAI CEO Altman calls GPT 5.5 an 'Autistic Genius.' Codex downloads surge 1397% to 90M while Claude Code drops 38%. Deep analysis of the developer migration driven by cost, performance, and UX.

Product Reviews

June 3, 2026 · 3 min

Manus Hands-On Review: How Does This AI Agent Perform on the DeepSeek Tech Stack?

Hands-on review of Manus AI Agent on the DeepSeek tech stack, analyzing task execution, Chinese reasoning capabilities, strengths, limitations, and the potential of domestic LLMs in Agent applications.

Tutorials

June 2, 2026 · 1 min

Essential Skills for LLM Engineers: A Complete Guide to Application Development and Fine-Tuning

A systematic guide to core LLM engineer skills, covering RAG and Agent app development as well as SFT and RLHF fine-tuning, with clear learning paths for different backgrounds.

Product Reviews

June 2, 2026 · 2 min

Free Unlimited DeepSeek Full Version? Deep Dive into AI Aggregation Platforms & Risk Analysis

In-depth analysis of AI aggregation platforms claiming free unlimited DeepSeek R1 full version access, revealing data security risks and sustainability concerns, with reliable alternatives.

Research

June 2, 2026 · 3 min

MementoGUI: A Multimodal Memory Management Framework for Solving Long-Horizon GUI Agent Amnesia

MementoGUI is a plugin-style multimodal memory management framework that solves GUI agent forgetting in long-horizon tasks through dual time-scale memory and four memory control operators, boosting long-task completion without fine-tuning.

Tutorials

June 2, 2026 · 1 min

AI Agent Learning Roadmap: A Complete Guide from LLM Fundamentals to Enterprise-Level Project Implementation

A systematic AI Agent learning roadmap covering Python setup, Prompt Engineering, RAG, LangChain, and multi-Agent collaboration, with an enterprise medical consultation system case study and a phased learning plan.

Expert Opinions

June 2, 2026 · 3 min

The Salary Ceiling for Agent Engineers: Two Critical Dividing Lines

Agent engineer salary gaps hinge on two dividing lines: real production deployment experience and depth of foundational theory including deep learning, fine-tuning, and reinforcement learning.

Tech Frontiers

May 31, 2026 · 2 min

Claude Opus 4.8 Thinking Effort Calibration Explained: A Critical Optimization Direction for AI Reasoning Models

Anthropic releases Claude Opus 4.8 with optimized thinking effort calibration. This article explains what it is, why it matters for AI reasoning models, and its impact on industry competition.

Product Reviews

May 30, 2026 · 3 min

Deep Comparison of o1, o1 pro, and o3-mini-high Coding Capabilities: A Deep Research Analysis

Deep Research comparison of OpenAI o1, o1 pro, and o3-mini-high coding capabilities, covering code quality, optimization, error rates, and debugging with benchmarks and real-world cases.