#residual connections

5 related articles

June 6, 2026 · 1 min

Hyper-Connections: The First Major Improvement to Residual Connections in a Decade

Deep dive into ByteDance's Hyper-Connections, which expand the residual connection from a single pathway to multiple learnable pathways and significantly improve training under the same compute budget.

Deep Dives

June 3, 2026 · 1 min

Demystifying Transformer: A Word-Continuation Function, Deconstructed

Understand the Transformer through the lens of word continuation, with language generation broken down into three modules: Embedding, Transformer Block, and Probability output.

Tutorials

June 2, 2026 · 3 min

Claude Code for Academic Research: 3 Skills to Build a Paper Workflow

How to build a structured paper workflow with Claude Code: three core Skills for material classification, literature evidence matching, and reviewer simulation, plus six reusable AI-assisted research principles.

Deep Dives

June 2, 2026 · 3 min

DeepSeek V4 Deep Technical Breakdown: Million-Token Context and Extreme Cost Efficiency

Deep analysis of DeepSeek V4's core architecture: Hybrid Compressed Attention, Manifold-Constrained Hyperconnection, and the MUON optimizer, and how they cut inference costs by 10x and enable million-token context processing.

Deep Dives

June 2, 2026 · 4 min

Core Principles of the Transformer Architecture: A Deep Dive into Self-Attention Mechanisms and Engineering Optimizations

Deep dive into Transformer architecture covering self-attention QKV mechanics, Encoder-Decoder structure, Flash Attention memory optimization, RoPE positional encoding, and GQA inference acceleration.