MARCO: Navigating the Unseen Space of Semantic Correspondence

Hugging Face Daily Papers 04/20/26, 12:00 AM Papers

Summary

MARCO introduces a compact, fast model for semantic correspondence that achieves state-of-the-art accuracy and generalization to unseen keypoints using a coarse-to-fine objective and self-distillation framework with DINOv2.

Recent advances in semantic correspondence rely on dual-encoder architectures, combining DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond training keypoints, revealing a gap between benchmark performance and real-world usability, where queried points rarely match those seen during training. Building upon DINOv2, we introduce MARCO, a unified model for generalizable correspondence driven by a novel training framework that enhances both fine-grained localization and semantic generalization. By coupling a coarse-to-fine objective that refines spatial precision with a self-distillation framework, which expands sparse supervision beyond annotated regions, our approach transforms a handful of keypoints into dense, semantically coherent correspondences. MARCO sets a new state of the art on SPair-71k, AP-10K, and PF-PASCAL, with gains that amplify at fine-grained localization thresholds (+8.9 PCK), strongest generalization to unseen keypoints (+5.1, SPair-U) and categories (+4.7, MP-100), while remaining 3x smaller and 10x faster than diffusion-based approaches. Code is available at https://github.com/visinf/MARCO.
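To make the task concrete: with a feature backbone, a correspondence for a query point is commonly read off by nearest-neighbor search over per-patch features of the target image. The sketch below shows only that generic matching step. It is not MARCO's training objective or inference code; the function names, the nested-list feature grid, and the toy values are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_point(query_feat, target_feats):
    """Return (row, col) of the target patch most similar to query_feat.

    target_feats is a hypothetical H x W grid (nested lists) of feature
    vectors, standing in for a backbone's patch features.
    """
    best_score, best_pos = -2.0, None  # cosine similarity is always >= -1
    for r, row in enumerate(target_feats):
        for c, feat in enumerate(row):
            score = cosine(query_feat, feat)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


# Toy 2 x 2 target grid with 3-dimensional features (made-up values).
target = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.7, 0.7, 0.0]],
]
match = match_point([0.1, 0.9, 0.0], target)
```

Because the match is an argmax over a patch grid, its spatial precision is bounded by the patch size, which is one reason fine-grained localization is treated as a separate concern.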

Cached at: 04/21/26, 07:46 PM

Source: https://huggingface.co/papers/2604.18267

Abstract

MARCO is a compact, fast model that improves semantic correspondence accuracy and generalization beyond training keypoints by using a coarse-to-fine objective and a self-distillation framework built on DINOv2, rather than pairing DINOv2 with a diffusion backbone.

Recent advances in semantic correspondence rely on dual-encoder architectures, combining DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond training keypoints, revealing a gap between benchmark performance and real-world usability, where queried points rarely match those seen during training. Building upon DINOv2, we introduce MARCO, a unified model for generalizable correspondence driven by a novel training framework that enhances both fine-grained localization and semantic generalization. By coupling a coarse-to-fine objective that refines spatial precision with a self-distillation framework, which expands sparse supervision beyond annotated regions, our approach transforms a handful of keypoints into dense, semantically coherent correspondences. MARCO sets a new state of the art on SPair-71k, AP-10K, and PF-PASCAL, with gains that amplify at fine-grained localization thresholds (+8.9 PCK), strongest generalization to unseen keypoints (+5.1, SPair-U) and categories (+4.7, MP-100), while remaining 3x smaller and 10x faster than diffusion-based approaches. Code is available at https://github.com/visinf/MARCO.
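Keypoint benchmarks such as SPair-71k are typically scored with PCK (percentage of correct keypoints): a prediction counts as correct when it lands within a fraction alpha of the object's bounding-box size from the ground-truth point. The sketch below shows why tighter thresholds are harder. It assumes the common convention of scaling by the larger bounding-box side, which may differ from the paper's exact protocol; the function name and sample coordinates are illustrative only.

```python
import math


def pck(pred, gt, bbox_size, alpha):
    """Fraction of predicted keypoints within alpha * max(bbox_size) of ground truth.

    pred, gt: equal-length lists of (x, y) pixel coordinates.
    bbox_size: (width, height) of the object box (assumed convention).
    """
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    threshold = alpha * max(bbox_size)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)


# Made-up predictions with errors of 1, 3, and 30 pixels.
pred = [(10, 10), (50, 53), (120, 80)]
gt = [(10, 11), (50, 50), (90, 80)]
bbox = (200, 100)

loose = pck(pred, gt, bbox, alpha=0.1)    # threshold of about 20 pixels
strict = pck(pred, gt, bbox, alpha=0.01)  # threshold of about 2 pixels
```

In this toy case two of the three predictions pass the loose threshold but only one passes the strict one, so gains reported at fine-grained thresholds reflect sharper localization rather than merely landing on the right object part.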

Get this paper in your agent:

hf papers read 2604.18267

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition

FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models

RemoteZero: Geospatial Reasoning with Zero Human Annotations

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
