This paper introduces a two-stage token selection framework for visual geometry transformers. The framework reduces computational cost by restricting the key/value tokens used in global attention, achieving over 85% acceleration on scenes with 500 images while maintaining baseline performance.
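
To make the idea of restricting key/value tokens concrete, the following is a minimal NumPy sketch, not the paper's implementation. Every query still attends globally, but only over a selected subset of keys and values, so the score matrix shrinks from N x N to N x M. The function names and the selection rule (keep frames, then tokens, by key norm) are hypothetical placeholders chosen for illustration; the summary does not state the actual criteria used in either stage.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_attention(q, k, v, kv_index=None):
    """Single-head scaled dot-product attention over (N, d) arrays.

    If kv_index is given, only those key/value tokens are attended to.
    """
    if kv_index is not None:
        k, v = k[kv_index], v[kv_index]
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M), with M <= N
    return softmax(scores) @ v               # (N, d)


def select_kv_tokens(k, n_frames, frames_kept, tokens_per_frame):
    """Hypothetical two-stage selection (an assumption, not the paper's rule).

    Stage 1 keeps the frames with the largest mean key norm.
    Stage 2 keeps the largest-norm tokens inside each kept frame.
    """
    per_frame = k.reshape(n_frames, -1, k.shape[-1])  # (F, T, d)
    norms = np.linalg.norm(per_frame, axis=-1)        # (F, T)
    frames = np.argsort(-norms.mean(axis=1))[:frames_kept]
    t = per_frame.shape[1]
    idx = [f * t + np.argsort(-norms[f])[:tokens_per_frame] for f in frames]
    return np.sort(np.concatenate(idx))


rng = np.random.default_rng(0)
n_frames, tokens, d = 8, 16, 32  # toy sizes, far below 500 images
q, k, v = (rng.standard_normal((n_frames * tokens, d)) for _ in range(3))

kv_index = select_kv_tokens(k, n_frames, frames_kept=4, tokens_per_frame=4)
out_full = global_attention(q, k, v)              # 128 x 128 score matrix
out_sparse = global_attention(q, k, v, kv_index)  # 128 x 16 score matrix
```

In this toy setting the output keeps one row per query token, while the attention cost scales with the number of selected tokens rather than with the full token count. Whether accuracy is preserved depends entirely on the selection rule, which is the part this sketch only stands in for.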