YouTube · 23 Mar 2025
2h 36m

Build an LLM from Scratch 5: Pretraining on Unlabeled Data


Sebastian Raschka

This chapter covers pretraining a large language model (LLM). It begins by assembling the building blocks from previous chapters: data loading, multi-head attention, and the GPT model architecture. After importing the necessary libraries and modifying the configuration, it implements the GPT model and uses it to generate text. It then explains how to measure the quality of generated text with cross-entropy loss, how to calculate the training and validation set losses, and how to train the LLM. Next it explores text generation strategies that control randomness, including temperature scaling and top-k sampling. It concludes with saving and loading model weights, and with loading pretrained weights from OpenAI into the LLM architecture.
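As a rough illustration of the ideas named above, the sketch below computes a cross-entropy loss for a single prediction and samples a next token using temperature scaling and top-k filtering. It uses only the Python standard library rather than the framework used in the chapter, and every function name here (`softmax`, `cross_entropy`, `top_k_filter`, `sample_next_token`) is a hypothetical helper chosen for this example, not code from the episode.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct token index."""
    probs = softmax(logits)
    return -math.log(probs[target])


def top_k_filter(logits, k):
    """Keep the k largest logits and mask the rest with -inf.

    If several logits tie with the k-th largest, all of them are kept.
    """
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


def sample_next_token(logits, k, temperature, rng):
    """Sample a token index from the top-k logits after temperature scaling."""
    filtered = top_k_filter(logits, k)
    probs = softmax(filtered, temperature)  # masked entries get probability 0
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [1.0, 3.0, 2.0, 0.5]  # made-up scores for a 4-token vocabulary
    print("loss if token 1 is correct:", cross_entropy(toy_logits, 1))
    rng = random.Random(123)
    print("sampled token:", sample_next_token(toy_logits, k=2, temperature=0.7, rng=rng))
```

With `k=1` the sampler always returns the highest-scoring token, which corresponds to greedy decoding; larger `k` and higher temperatures make the output more varied.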

Outlines

Part 1: Introduction and Measuring Text Quality

Part 2: Training and Controlling Randomness

Part 3: Saving and Loading Models
