2 related articles

Analysis of why SFT can't fix coding agent JSON errors and how GRPO's binary reward signals and synchronized weight updates train directly for correctness.
Deep DivesComplete guide to the three core LLM training stages: pre-training, supervised fine-tuning (SFT), and preference alignment (DPO/PPO), covering LoRA, distillation, quantization, and pruning.