YouTube, 17 Mar 2025
1h 45m

Build an LLM from Scratch 4: Implementing a GPT model from Scratch To Generate Text

Sebastian Raschka

In this code-along series, Sebastian Raschka guides viewers through implementing a GPT model from scratch, building on previous chapters that covered data preparation, embeddings, and attention mechanisms. The episode constructs the LLM architecture, starting from a dummy class that serves as a placeholder for the model's components: embedding layers, transformer blocks (containing masked multi-head attention), layer normalization, GELU activations, shortcut connections, and output layers. It then works through layer normalization, feed-forward networks, GELU activations, and shortcut connections in turn, and assembles them into the complete GPT model architecture, ready for pre-training and fine-tuning in subsequent chapters. The episode also touches on generating text with the model, explaining how token IDs are transformed into vectors and back, and previews the next chapter on model training.
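As a rough illustration of three of the components named above, the following is a minimal sketch in plain Python, not the code from the episode (which is not reproduced on this page). The function names `layer_norm`, `gelu`, `toy_block`, and `greedy_next_token` are hypothetical, the learned scale and shift of layer normalization and the linear layers of the feed-forward network are omitted, and picking the largest logit is assumed here as the simplest way to turn model outputs back into a token ID.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def gelu(v):
    """Tanh approximation of the GELU activation."""
    inner = math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)
    return 0.5 * v * (1.0 + math.tanh(inner))


def toy_block(x):
    """Hypothetical toy sub-block: layer norm, GELU, then a shortcut connection.

    A real feed-forward sub-layer would also apply linear layers around the
    activation; they are left out to keep the sketch short.
    """
    h = [gelu(v) for v in layer_norm(x)]
    # Shortcut connection: add the block's input back to its output.
    return [xi + hi for xi, hi in zip(x, h)]


def greedy_next_token(logits):
    """Return the token ID (index) with the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

For example, `toy_block([1.0, 2.0, 3.0])` returns a vector of the same length as its input, which is what allows the shortcut addition, and `greedy_next_token` maps a list of vocabulary scores to a single token ID.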

Outlines

Part 1: GPT Model Architecture

Part 2: Core Components

Part 3: Text Generation
