scripod.com

SAM 3: The Eyes for AI — Nikhila & Pengchuan (Meta Superintelligence), ft. Joseph Nelson (Roboflow)

Show Notes

As with all demo-heavy podcasts, and especially vision AI ones, we encourage watching along on our YouTube channel (and tossing us an upvote/subscribe if you like!). From SAM 1's 11-million-image data engine to SAM 2's memory-based video tracking, MSL’s Segment Anything...

Highlights

The latest evolution in computer vision has arrived with SAM 3, a model that transforms how machines understand visual content through natural language. In this discussion, leading researchers and practitioners explore how this technology achieves unprecedented precision and speed in identifying and tracking objects across images and video—without relying on traditional annotation methods.
08:55
SAM 3 runs in 30ms per image on H200, enabling real-time performance
20:04
Just 3–5 negative examples significantly improve SAM 3's fine-tuning performance.
29:37
LLMs correct SAM 3's errors and provide complex reasoning missing in vision-only models
51:42
SAM 3 can correctly count fingers in an image where frontier multimodal models fail.
54:39
SAM 3 benefits from open-source community contributions and real-world testing
1:00:30
SAM 3 enables scalable video analysis for robotics using open text prompts.
1:11:45
Roboflow aims to be the top platform for building with SAM 3 and advancing AI applications.

Chapters

What makes SAM 3 a game-changer in real-time visual understanding?
00:00
How does teaching AI to recognize 200,000+ everyday concepts change segmentation forever?
11:45
Why is combining SAM 3 with large language models the key to smarter vision?
26:42
How did AI cut annotation time from minutes to seconds—and what’s still hard?
39:01
Should vision models do everything themselves, or work with tools?
54:39
Can SAM 3 help robots see and reason like humans?
1:00:30
What tools are needed to turn breakthrough research into real-world impact?
1:11:45

Transcript

Joseph Nelson: Okay, we're here in the remote studio with the grand return of the Roboflow and Latent Space and SAM combo. Welcome to Joseph, my sort of vision co-host, I guess. Thanks. Great to be here. Welcome back. We also have, welcome back, Nikhila Ra...