2 related articles
Deep DivesDeep analysis of DeepSeek V4's core architecture: Hybrid Compressed Attention, Manifold-Constrained Hyperconnection, and MUON optimizer—how they cut inference costs by 10x and enable million-token context processing.
Efficient PyTorch Learning: A Source C…
A proven PyTorch learning method: spend 2-3 days on basics, then advance rapidly by reading U-Net and ViT source code line by line. Master PyTorch through source code-driven learning.