#skip connections

2 related articles

June 2, 2026 · 3 min

DeepSeek V4 Deep Technical Breakdown: Million-Token Context and Extreme Cost Efficiency

A deep analysis of DeepSeek V4's core architecture: Hybrid Compressed Attention, Manifold-Constrained Hyperconnection, and the MUON optimizer, and how they cut inference costs by 10x and enable million-token context processing.

Tutorials

May 27, 2026 · 3 min

Efficient PyTorch Learning: A Source Code-Driven Methodology

A proven PyTorch learning method: spend 2-3 days on the basics, then advance rapidly by reading the U-Net and ViT source code line by line.