23 May 2025
39m

[AIEWF Preview] Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect

Latent Space: The AI Engineer Podcast

In this episode of the Latent Space podcast, Alessio and Swyx are joined by Will Brown from Prime Intellect to discuss the newly released Claude 4. The conversation covers Claude 4's emphasis on coding and agentic capabilities, with reasoning played down compared to previous versions. They speculate on how Claude's extended thinking differs from that of older models, touching on model routing and the role of reinforcement learning. The discussion then shifts to the controversy around Claude's safety testing, including its potential to report users for harmful requests, and the broader implications for AI safety and tool use. They also explore the challenges of reward hacking, the utility of thinking budgets, and the role of academia in AI evaluations. The episode concludes with a discussion of multi-turn RL and model-based rewards, and with Will Brown previewing his upcoming talk at the AI Engineer World's Fair.

Outlines

Part 1: Claude 4 and Agentic Models

Part 2: Research and Future Trends
