YouTube, 06 Feb 2025
18m

Lessons from the Trenches: Building LLM Evals That Work IRL: Aparna Dhinakaran

AI Engineer

Aparna Dhinakaran of Arize discusses LLM evaluations and observability. She distinguishes model evals from task evals and focuses on the latter. She explains how LLMs are used as judges in real-world applications, particularly in chat-to-purchase e-commerce setups involving routers and function calls. Aparna emphasizes the importance of evals at each level of an application, especially the router level, and demonstrates this with Phoenix, Arize's open-source product, by tracing and evaluating application performance. She shares best practices, highlighting how evals that include explanations support effective iteration and improvement. She also presents research findings on numeric versus categorical evals and on the impact of context window placement in RAG applications, and closes with an invitation to the Arize Observe event.
