25 Jun 2024
1h 21m

State of the Art: Training >70B LLMs on 10,000 H100 clusters

Latent Space: The AI Engineer Podcast

This podcast episode covers Databricks' recent text-to-image model, the challenges of building and managing custom clusters for GPU training, the evaluation of datasets and models on natural language understanding and reasoning tasks, and the potential of coding agents and APIs to improve model performance. The speakers discuss why it matters to understand the data used to train AI models, the components released to simplify training foundation models, and the considerations involved in building and operating massive GPU clusters. They also highlight the role of health checks, evaluation metrics, and tool use in the research process. The episode concludes with Imbue's future projects and its commitment to delivering value and innovation.
