TrackCraft3R repurposes video diffusion transformers for dense 3D tracking from monocular video. It uses a dual-latent representation and temporal RoPE alignment to achieve state-of-the-art performance while running 1.3x faster and using 4.6x less peak memory than prior methods.