[Machine Learning in the Era of Generative AI (2025)] Lecture 4: Is the Transformer Era Coming to an End? Introducing the Transformer's Competitors

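This episode examines potential competitors to the Transformer architecture and analyzes the reasoning behind neural network architecture design. The lecturer first notes that the homework involves training a Transformer to generate images, a reminder that Transformers are not limited to large language models. The lecture then focuses on why each architecture exists: CNNs avoid overfitting by removing unnecessary parameters, and residual connections address the optimization difficulty of training deep networks. It next covers how Self-Attention replaced RNNs and LSTMs for the task of mapping an input sequence of vectors to an output sequence of vectors. Compared with RNNs, Self-Attention is easier to parallelize during training and therefore makes better use of GPUs. However, Self-Attention's memory requirements grow with sequence length, which motivates a second look at whether RNNs can be parallelized. Linear Attention is introduced as a variant of the RNN that becomes parallelizable by removing the Reflection mechanism. Finally, the lecture discusses more advanced versions such as Retention Network and Gated Retention, along with architectures such as Mamba and Delta Net, all of which reflect the field's ongoing search for more efficient and flexible sequence-processing methods.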
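To make the claim about Linear Attention concrete, here is a minimal sketch, assuming the common formulation in which the memory is a matrix updated by adding the outer product of the value and key vectors, with no transformation of the previous memory and no softmax. The function names and the plain-list representation are illustrative choices, not taken from the lecture. The first function runs step by step like an RNN; the second computes each output directly from the inputs, so no step depends on the result of a previous step and the steps could be computed independently.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_attention_recurrent(qs, ks, vs):
    """RNN-style view: keep a memory matrix and update it one step at a time.

    Assumed update rule: state_t = state_{t-1} + outer(v_t, k_t)
    Assumed read rule:   y_t = state_t @ q_t
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_k for _ in range(d_v)]  # memory, shape d_v x d_k
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        # Write: add the outer product of v and k into the memory.
        for i in range(d_v):
            for j in range(d_k):
                state[i][j] += v[i] * k[j]
        # Read: multiply the memory by the query.
        outputs.append([dot(row, q) for row in state])
    return outputs


def linear_attention_parallel(qs, ks, vs):
    """Attention-style view: y_t = sum over i <= t of (k_i . q_t) * v_i.

    Each output uses only the inputs, never an earlier output or state,
    so every time step can be computed independently of the others.
    """
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        weights = [dot(ks[i], q) for i in range(t + 1)]  # unnormalized, no softmax
        outputs.append(
            [sum(w * vs[i][c] for i, w in enumerate(weights)) for c in range(d_v)]
        )
    return outputs
```

Both functions return the same outputs for the same inputs, which is the sense in which this kind of RNN can be trained like attention and run like an RNN at inference time.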

Outline

Part 1: Transformer architecture and a review of RNNs

Part 2: Advantages of the Transformer and limitations of RNNs

Part 3: Linear Attention and memory mechanisms

Part 4: Retention Network and Mamba

Part 5: Applications and outlook

Hung-yi Lee

Part 1: Transformer architecture and a review of RNNs

00:01 The diverse applications of the Transformer and its potential competitors

07:56 RNN versus Self-Attention: the nature of the problem and the solutions

09:24 RNN-style architectures: structure, operation, and memory handling

Part 2: Advantages of the Transformer and limitations of RNNs

15:54 How RNN and Self-Attention operate at inference time

23:32 The Transformer's advantage: parallelized training and GPU friendliness

30:03 GPU performance of the Transformer versus the RNN

Part 3: Linear Attention and memory mechanisms

34:44 Whether RNNs can be parallelized, and the emergence of Linear Attention

45:12 An intuitive explanation of Linear Attention and its memory limits

55:34 The importance of Softmax and mechanisms for changing memory

Part 4: Retention Network and Mamba

1:01:04 Retention Network and Gated Retention: forgetting and controlling memory

1:08:43 Mamba's performance and the gradient descent view of Delta Net

Part 5: Applications and outlook

1:17:05 Applications of Linear Attention and future outlook