22 Jan 2026
43m

Inferact: Building the Infrastructure That Runs Modern AI

The a16z Show

The discussion centers on the increasing complexity of AI inference, contrasting it with the challenges of training models. Simon Mo and Woosuk Kwon, co-founders of Inferact and creators of vLLM, detail how the open-source inference engine addresses the growing demands of running large language models efficiently. They highlight the shift from the static, standardized inputs of traditional machine learning to the dynamic, unpredictable nature of LLM requests. The conversation covers the importance of scheduling and memory management, particularly with the advent of AI agents that require persistent state and interact with external tools. They emphasize the role of open source in promoting diversity in models and hardware, enabling tailored solutions for specific use cases.

Outlines

Part 1: Context, Origins

Part 2: Community, Governance

Part 3: Architecture, Technical Challenges

Part 4: Open-Source Value, Deployments

Part 5: Inferact, Future Vision
