Tag
Introduces CAFE, a benchmark that uses counterfactual attribute manipulation to test whether promptable segmentation models truly understand the concepts they are prompted with; it shows that accurate mask prediction does not guarantee faithful semantic grounding.
SAM 3 introduces a unified model for promptable concept segmentation and tracking, achieving state-of-the-art performance with an architecture that decouples recognition from localization and with a scalable data engine.