23 related articles

Analysis of why SFT can't fix coding agent JSON errors and how GRPO's binary reward signals and synchronized weight updates train directly for correctness.

Fireworks AI adds NVIDIA Nemotron 3 Ultra post-training support with SFT, DPO, LoRA, and full fine-tuning, enabling seamless train-to-deploy workflows for open-weight LLM customization.

A comprehensive guide for Java developers transitioning to AI application development, covering Spring AI, RAG, Function Calling, and a hands-on intelligent airline customer service project.

A 4-stage roadmap for AI application development: from Python and RAG basics to Agent cluster architecture, covering the core skills needed for career growth.

Diagnose and fix common RL training environment issues including reward hacking, flawed state spaces, and broken verifiers that silently degrade model performance.

Deep dive into Andrew Ng and OpenAI's Reasoning with O1 course covering test-time scaling, new prompting paradigms, multi-model orchestration, and practical applications for developers.

A deep dive into core challenges and key technologies for LLM infrastructure, covering GPU cluster management, inference optimization, distributed training, cost control, and observability.

Anthropic's Claude Opus 4.8 failed within 2 hours of launch, identifying itself as DeepSeek and Tongyi Qianwen in Chinese. Deep analysis of data contamination vs distillation hypotheses and multilingual alignment gaps.

Deep dive into LlamaFactory, an open-source unified fine-tuning framework supporting 100+ LLMs and VLMs with LoRA, QLoRA, RLHF methods, Web UI, 71K+ GitHub Stars, accepted at ACL 2024.

Deep dive into the OpenAI Swarm multi-agent orchestration framework, explaining Function Call tool invocation and Handoff task transfer mechanisms, with a local deployment guide.
Deep Dives: Complete guide to the three core LLM training stages: pre-training, supervised fine-tuning (SFT), and preference alignment (DPO/PPO), covering LoRA, distillation, quantization, and pruning.
Industry Insights: OpenAI CEO Altman calls GPT 5.5 an 'Autistic Genius.' Codex downloads surge 1397% to 90M while Claude Code drops 38%. Deep analysis of the developer migration driven by cost, performance, and UX.
Product Reviews: Hands-on review of Manus AI Agent on the DeepSeek tech stack, analyzing task execution, Chinese reasoning capabilities, strengths, limitations, and the potential of domestic LLMs in Agent applications.
Tutorials: A systematic guide to core LLM engineer skills, covering RAG, Agent app development, and SFT and RLHF fine-tuning, with clear learning paths for different backgrounds.
Product Reviews: In-depth analysis of AI aggregation platforms claiming free unlimited DeepSeek R1 full version access, revealing data security risks and sustainability concerns, with reliable alternatives.
Research: MementoGUI is a plugin-style multimodal memory management framework that solves GUI agent forgetting in long-horizon tasks through dual time-scale memory and four memory control operators, boosting long-task completion without fine-tuning.
Tutorials: A systematic AI Agent learning roadmap covering Python setup, Prompt Engineering, RAG, LangChain, and multi-Agent collaboration, with an enterprise medical consultation system case study and a phased learning plan.
Expert Opinions: Agent engineer salary gaps hinge on two dividing lines: real production deployment experience and depth of foundational theory including deep learning, fine-tuning, and reinforcement learning.
Tech Frontiers: Anthropic releases Claude Opus 4.8 with optimized thinking effort calibration. This article explains what it is, why it matters for AI reasoning models, and its impact on industry competition.
Deep Comparison of o1, o1 pro, and o3-…
Deep Research comparison of OpenAI o1, o1 pro, and o3-mini-high coding capabilities, covering code quality, optimization, error rates, and debugging with benchmarks and real-world cases.