02 Jun 2025
24m

[AIEWF Preview] Gemini in 2025 and Realtime Voice AI

Latent Space: The AI Engineer Podcast

This episode covers Google's Gemini updates and the future of voice-based AI applications, particularly those built on the Live API. Logan Kilpatrick and Shrestha Basu Mallick highlight features such as thinking budgets for Gemini 2.5 Pro and native audio output, emphasizing developer control and multilingual capabilities. The discussion explores the challenges and infrastructure involved in building real-time voice agents, including voice activity detection and latency reduction, with Kwindla Hultman Kramer offering insights from Daily's partnership with Google. A key topic is the trade-off between componentized models and a single unified Gemini model, with the ultimate goal of integrating diverse capabilities into one model. The speakers also touch on proactive audio and speaker identification as emerging features, and say they hope future Gemini iterations will bring broader language support and more tightly integrated capabilities.

Outline

Part 1: Introduction, Team Roles

Part 2: Gemini API Features, Caching, UI

Part 3: Live API, Audio/Video, Workflows

Part 4: Partnerships, Infrastructure, Voice Agents

Part 5: Future Outlook, Closing
