video-tokenisation

Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting [R]

Reddit r/MachineLearning ↗ · 13h ago

This paper introduces an adaptive video tokenisation method that exploits temporal redundancy in latent space to allocate tokens dynamically, compressing efficiently without auxiliary networks. Positions that are temporally redundant are dropped, and the proposed Latent Inpainting Transformer reconstructs them. The summary reports a 31x speedup over ElasticTok-CV and a 2x speedup over InfoTok.
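
To make the masking idea concrete, here is a minimal sketch in plain Python. It is not the paper's method: the summary does not say how redundancy is scored or how positions are selected, so the mean-absolute-difference score, the fixed `threshold`, and the function name `mask_and_reconstruct` are all assumptions made for illustration. The reconstruction step simply copies the last kept value forward, a crude stand-in for the Latent Inpainting Transformer.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one latent frame: positions x channels


def mask_and_reconstruct(
    frames: List[Frame], threshold: float
) -> Tuple[List[List[bool]], List[Frame]]:
    """Hypothetical redundancy masking with copy-forward reconstruction.

    keep[t][p] is True when position p of frame t is emitted as a token.
    A position is dropped when it changed by at most `threshold`
    (mean absolute difference, an assumed criterion) relative to the
    last value that was kept at that position.
    """
    # The first frame has no history, so every position is kept.
    ref = [list(vec) for vec in frames[0]]
    keep = [[True] * len(ref)]
    recon = [[list(vec) for vec in ref]]

    for cur in frames[1:]:
        row = []
        for p, vec in enumerate(cur):
            change = sum(abs(a - b) for a, b in zip(vec, ref[p])) / len(vec)
            kept = change > threshold
            if kept:
                ref[p] = list(vec)  # update the reference for this position
            row.append(kept)
        keep.append(row)
        # Stand-in for learned inpainting: dropped positions reuse `ref`.
        recon.append([list(vec) for vec in ref])
    return keep, recon


if __name__ == "__main__":
    # Toy latents: 3 frames, 2 positions, 2 channels each.
    frames = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[0.0, 0.1], [3.0, 1.0]],
        [[0.0, 0.0], [3.0, 1.0]],
    ]
    keep, recon = mask_and_reconstruct(frames, threshold=0.5)
    kept = sum(flag for row in keep for flag in row)
    total = sum(len(row) for row in keep)
    print(f"kept {kept} of {total} positions")
```

Because each position is compared against its last kept value rather than the previous frame, the copy-forward error at a dropped position never exceeds the threshold. A learned inpainting model would presumably do better than copying, but that part is not something this sketch attempts.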
