24 Dec 2024
28m

2024 in Synthetic Data and Smol Models [LS Live @ NeurIPS]


Latent Space: The AI Engineer Podcast

This recap from NeurIPS 2024 covers the year's key developments in synthetic data and the emergence of smaller on-device models. Synthetic data use has expanded from refining models in post-training to building entire training pipelines from scratch. Overreliance on synthetic data raised concerns about "model collapse," but recent research shows that well-selected synthetic data can improve model performance. Efficient small models are also on the rise: they can compete with larger counterparts while offering lower cost, better efficiency, and greater privacy by running on-device. Together, these trends mark a shift from simply scaling up model size toward building more efficient models and fine-tuning them for specific tasks.

