22 Dec 2024
57m

2024 in Vision [LS Live @ NeurIPS]


Latent Space: The AI Engineer Podcast

The Latent Space LIVE mini-conference at NeurIPS 2024 reviewed the year's developments in computer vision. Speakers from Roboflow and Moondream discussed major trends, such as the shift from per-image to video processing seen in models like Sora and SAM 2. They also noted that DETR models are now outpacing YOLO in real-time object detection. In addition, they addressed the difficulty large language models (LLMs) have in capturing fine visual details, referencing research such as MMVP, Florence 2, PaliGemma 2, and AIMv2. The session included a lively discussion of the shortcomings of current vision-language models (VLMs) on complex visual tasks and explored how synthetic data and chain-of-thought prompting could improve their performance.
