28 Nov 2024
1h 11m

The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents - with Erik Schluntz, Anthropic

Latent Space: The AI Engineer Podcast

In this podcast, Erik Schluntz from Anthropic discusses his work on SWE-Bench, a benchmark for evaluating coding agents, and on improving the computer-use capabilities of large language models (LLMs). He explains how he built a streamlined agent framework that lets LLMs tackle coding tasks autonomously, stressing the importance of effective tools and prompts. Schluntz also addresses the challenges of achieving high accuracy on SWE-Bench, explores the potential of multi-modal and multi-agent strategies, and shares his views on the current state and future of AI in robotics, highlighting both the exciting possibilities and the hurdles of reliability and cost.
