19 Jan 2025
1h 0m

Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang)

Latent Space: The AI Engineer Podcast

In this interview episode, Amir Haghighat and Yineng Zhang of Baseten, a leading large language model (LLM) inference platform, discuss the newly released DeepSeek v3 model. The conversation opens with an overview of DeepSeek v3's capabilities and its ranking on the LM Arena leaderboard, highlighting its standing as the best open-weights model. It then turns to the challenges of serving a model this large, focusing on Baseten's use of H200 clusters and the role of frameworks like SGLang in efficient inference. Finally, the speakers cover the three pillars of mission-critical inference workloads (model-level performance, horizontal scalability, and developer experience), with a detailed explanation of Baseten's approach and the unique features of SGLang. One specific takeaway is Baseten's consumption-based pricing model, which the speakers contrast with per-token pricing and present as a better fit for customers with custom models and strict performance requirements.

Outline

Part 1: Introduction, DeepSeek v3

Part 2: DeepSeek v3, Technical Challenges

Part 3: Baseten's Solutions, Inference

Part 4: Future Trends, Conclusion
