SAM 3: Segment Anything with Concepts

Papers with Code Trending 11/20/25, 06:59 PM Papers

Summary

SAM 3 introduces a unified model for promptable concept segmentation and tracking, achieving state-of-the-art performance with a decoupled recognition and localization architecture and a scalable data engine.

We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., "yellow school bus"), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of an image-level detector and a memory-based video tracker that share a single backbone. Recognition and localization are decoupled with a presence head, which boosts detection accuracy. SAM 3 doubles the accuracy of existing systems in both image and video PCS, and improves previous SAM capabilities on visual segmentation tasks. We open source SAM 3 along with our new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation.
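The inputs and outputs of PCS described above can be pictured with a small data-structure sketch. This is illustrative only and is not the SAM 3 API: the names `ConceptPrompt`, `InstanceResult`, and `assign_identities` are hypothetical, and representing an image exemplar as a bounding box is an assumption that the summary does not confirm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: an image exemplar is given as a box (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]


@dataclass
class ConceptPrompt:
    """Hypothetical container: a short noun phrase, exemplars, or both."""
    noun_phrase: Optional[str] = None
    exemplars: List[Box] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A concept prompt must carry at least one of the two prompt types.
        if self.noun_phrase is None and not self.exemplars:
            raise ValueError("need a noun phrase, exemplars, or both")


@dataclass
class InstanceResult:
    """Hypothetical output record: one mask plus a unique identity."""
    instance_id: int
    mask: List[List[bool]]


def assign_identities(masks: List[List[List[bool]]]) -> List[InstanceResult]:
    # Give every matching instance its own identity, in input order.
    return [InstanceResult(instance_id=i, mask=m) for i, m in enumerate(masks)]


prompt = ConceptPrompt(noun_phrase="yellow school bus")
results = assign_identities([[[True, False]], [[False, True]]])
```

The point of the sketch is only that one prompt maps to a list of instances, each with its own mask and identity, rather than to a single mask.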

Original Article

Cached at: 05/20/26, 02:24 AM

Source: https://huggingface.co/papers/2511.16719

Abstract

Segment Anything Model 3 achieves state-of-the-art performance in promptable concept segmentation and tracking by leveraging a unified model architecture with decoupled recognition and localization.

We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., “yellow school bus”), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of an image-level detector and a memory-based video tracker that share a single backbone. Recognition and localization are decoupled with a presence head, which boosts detection accuracy. SAM 3 doubles the accuracy of existing systems in both image and video PCS, and improves previous SAM capabilities on visual segmentation tasks. We open source SAM 3 along with our new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation.
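The abstract does not say how the presence head is combined with the per-object predictions. One plausible reading, assumed here purely for illustration, is that a single image-level presence score ("is the concept in this image at all?") multiplies each candidate's own localization score. The function names and the 0.5 threshold below are hypothetical and are not taken from the SAM 3 code.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def combine_scores(presence_logit: float, query_logits: List[float]) -> List[float]:
    """Assumed scheme: final score = P(concept present) * P(candidate matches)."""
    p_present = sigmoid(presence_logit)
    return [p_present * sigmoid(q) for q in query_logits]


def keep_detections(scores: List[float], threshold: float = 0.5) -> List[int]:
    # Return the indices of candidates whose combined score passes the threshold.
    return [i for i, s in enumerate(scores) if s >= threshold]


# Concept judged absent: even confident candidates are suppressed.
absent = keep_detections(combine_scores(-10.0, [10.0, 10.0]))

# Concept judged present: only the well-localized candidate survives.
present = keep_detections(combine_scores(10.0, [10.0, -10.0]))
```

Under this assumption, the recognition decision is made once per image and the per-candidate scores only have to rank locations, which is one way to read "decoupled" in the abstract.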

Get this paper in your agent:

hf papers read 2511.16719

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 3

AllanVester/SAM3.1-CoreML-FP16 • Mask Generation • Updated about 1 month ago • 97 • 3
AllanVester/SAM3.1-CoreML • Mask Generation • Updated about 1 month ago • 58 • 2
embedl/sam3 • Updated 16 days ago • 54 • 1

Similar Articles

SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning

@skalskip92: there's no catch; SAM3 is open source and really good one of the things it does really well is object tracking, even in…

InstructSAM: Segment Any Instance with Any Instructions

idea-research/ram-grounded-sam

SAM-MT: Real-Time Interactive Multi-Target Video Segmentation
