30 Apr 2026
1h 33m

163: DeepSeek V4 Explained: An Infra Behemoth, Million-Token Context Becomes Reality, and Extreme Efficiency Optimization

晚点聊 LateTalk

DeepSeek V4 marks a major shift in large-model architecture: it moves away from Multi-head Latent Attention (MLA) toward a hybrid attention mechanism that combines sliding-window attention with long-range attention. The release reflects the industry's turn toward engineering-heavy innovation, shipping four complex features at once: a new attention mechanism, the Muon optimizer, Manifold-Constrained Hyper-Connections (mHC), and FP4 training. With an extremely low activation ratio (the share of parameters active for each token) and token-wise compression, DeepSeek balances massive parameter capacity against computational cost. Its reliance on custom kernels written in TileLang and on training-time pseudo-quantization points to a broader trend: infrastructure mastery, and the ability to manage tightly coupled system complexity, has become the primary differentiator among frontier AI labs. Together, these advances signal a move from simple scaling laws toward highly optimized, cost-effective engineering as the basis of competition in AI.
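
To make "training-time pseudo-quantization" concrete, here is a minimal sketch of the general idea: values are snapped to a low-precision grid and immediately converted back to ordinary floats, so the rest of the computation sees FP4-like rounding error without needing real FP4 storage. The grid is the set of magnitudes representable in the common FP4 E2M1 format; the function name, the per-block scaling rule, and the sample values are assumptions for illustration, not DeepSeek's actual recipe.

```python
# Illustrative sketch only: the block-wise scaling rule and all names here are
# assumptions, not the method used in DeepSeek V4.

# Non-negative magnitudes representable in the FP4 E2M1 format.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quantize_fp4(block):
    """Snap a block of floats to an FP4 grid, then return plain floats again."""
    max_abs = max((abs(x) for x in block), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in block]
    # One shared scale per block maps the largest magnitude onto 6.0,
    # the largest value on the grid.
    scale = max_abs / FP4_E2M1_GRID[-1]
    out = []
    for x in block:
        magnitude = abs(x) / scale
        # Round to the nearest representable magnitude.
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - magnitude))
        out.append(nearest * scale if x >= 0 else -nearest * scale)
    return out


print(fake_quantize_fp4([1.4, -6.0, 12.0, 2.4]))  # [1.0, -6.0, 12.0, 2.0]
```

In a real training loop the rounding step would typically be paired with a gradient approximation such as a straight-through estimator, which this sketch leaves out.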
