audio-visual-reasoning

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Hugging Face Daily Papers · 2026-05-21

LatentOmni proposes a cross-modal reasoning framework that interleaves textual reasoning with audio-visual latent states, outperforming explicit text-based chain-of-thought methods in audio-visual reasoning tasks.
