SAM 3: Segment Anything with Concepts
Summary
SAM 3 introduces a unified model for promptable concept segmentation and tracking, achieving state-of-the-art performance with a decoupled recognition and localization architecture and a scalable data engine.
Cached at: 05/20/26, 02:24 AM
Paper page - SAM 3: Segment Anything with Concepts
Source: https://huggingface.co/papers/2511.16719
Abstract
Segment Anything Model 3 achieves state-of-the-art performance in promptable concept segmentation and tracking by leveraging a unified model architecture with decoupled recognition and localization.
We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., “yellow school bus”), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of an image-level detector and a memory-based video tracker that share a single backbone. Recognition and localization are decoupled with a presence head, which boosts detection accuracy. SAM 3 doubles the accuracy of existing systems in both image and video PCS, and improves previous SAM capabilities on visual segmentation tasks. We open source SAM 3 along with our new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation.
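The abstract says recognition and localization are decoupled by a presence head. Below is a minimal Python sketch of one way to read that. It assumes (our interpretation, to be checked against the paper) that a single per-image presence probability multiplies each object query's match probability, so a confident "concept absent" prediction suppresses every query at once. That is the behavior the hard negatives in the dataset would exercise. `ConceptPrompt`, `Instance`, `presence_gated_scores`, `select_instances`, the box format, and the 0.5 threshold are all hypothetical names and values for illustration, not the SAM 3 API. Instance IDs here are simply per-image indices, whereas in video the memory-based tracker would be responsible for keeping identities consistent across frames.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical exemplar box format: (x0, y0, x1, y1) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class ConceptPrompt:
    """Hypothetical concept prompt: a short noun phrase, image exemplars, or both."""
    noun_phrase: Optional[str] = None
    exemplar_boxes: List[Box] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A PCS prompt must carry at least one of the two modalities.
        if self.noun_phrase is None and not self.exemplar_boxes:
            raise ValueError("concept prompt needs a noun phrase, exemplars, or both")


@dataclass
class Instance:
    instance_id: int   # per-image identity (a tracker would persist it across frames)
    query_index: int   # which detector query produced this instance
    score: float       # presence-gated confidence


def presence_gated_scores(presence_prob: float, match_probs: List[float]) -> List[float]:
    """Assumed decoupling: score = p(concept present) * p(query matches | present)."""
    if not 0.0 <= presence_prob <= 1.0:
        raise ValueError("presence_prob must lie in [0, 1]")
    return [presence_prob * p for p in match_probs]


def select_instances(presence_prob: float, match_probs: List[float],
                     threshold: float = 0.5) -> List[Instance]:
    """Keep queries whose gated score clears the (hypothetical) threshold."""
    scores = presence_gated_scores(presence_prob, match_probs)
    kept = [(q, s) for q, s in enumerate(scores) if s >= threshold]
    # Assign consecutive identities to the surviving instances.
    return [Instance(instance_id=k, query_index=q, score=s)
            for k, (q, s) in enumerate(kept)]


if __name__ == "__main__":
    prompt = ConceptPrompt(noun_phrase="yellow school bus")
    # Concept judged present: queries 0 and 2 localize well enough to survive.
    found = select_instances(presence_prob=0.9, match_probs=[0.95, 0.4, 0.8])
    print(prompt.noun_phrase, [inst.query_index for inst in found])  # yellow school bus [0, 2]
    # Hard negative: concept judged absent, so even strong queries are suppressed.
    print(select_instances(presence_prob=0.05, match_probs=[0.95, 0.9]))  # []
```

Under this reading, each query only has to answer "where", while the single presence probability answers "whether". One confident "absent" prediction therefore removes every candidate for a hard-negative phrase, instead of each query having to reject it independently.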
Get this paper in your agent:
hf papers read 2511.16719
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 3
#### AllanVester/SAM3.1-CoreML-FP16
Mask Generation • Updated about 1 month ago • 97 • 3
#### AllanVester/SAM3.1-CoreML
Mask Generation • Updated about 1 month ago • 58 • 2
#### embedl/sam3
Updated 16 days ago • 54 • 1
Datasets citing this paper: 0
No dataset linking this paper
Cite arxiv.org/abs/2511.16719 in a dataset README.md to link it from this page.
Spaces citing this paper: 1
Collections including this paper: 22
Similar Articles
SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning
Meta AI releases SAM 3.1, an update to the Segment Anything Model that enhances real-time video detection and tracking through multiplexing and global reasoning capabilities.
@skalskip92: there's no catch; SAM3 is open source and really good one of the things it does really well is object tracking, even in…
SAM3 (Segment Anything Model 3) is open source and performs exceptionally well at object tracking, even in complex scenes like basketball, making it a standout computer vision model.
InstructSAM: Segment Any Instance with Any Instructions
InstructSAM presents a unified framework for multi-instance segmentation using instruction-driven queries that bridge vision-language models and SAM3, achieving strong results across complex benchmarks.
@lillyguisnet: WEEE!!! I had not had the opportunity to try SAM3.1 yet, but simply prompting for "worm" perfectly segmented my images!…
A user shares enthusiastic feedback about SAM 3.1's ability to accurately segment images using simple text prompts like 'worm', highlighting significant improvements over SAM 1.
SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild
SAM 3D Animal introduces a promptable framework for multi-animal 3D reconstruction from single images in the wild, built on the SMAL+ model, achieving state-of-the-art results on multiple datasets.