LatentRAG is a novel framework that shifts reasoning and retrieval for agentic RAG into continuous latent space, reducing inference latency by approximately 90% while maintaining performance comparable to explicit methods.
This paper investigates multilingual latent reasoning in large reasoning models across 11 languages. It finds that latent reasoning capabilities exist but are unevenly distributed: stronger in resource-rich languages and weaker in low-resource ones. Despite surface-level differences across languages, the internal reasoning mechanisms are largely aligned with an English-centered pathway.
OneVL is a unified vision-language-action framework that compresses chain-of-thought (CoT) reasoning into latent tokens supervised by both language and visual world model decoders. It achieves state-of-the-art trajectory prediction accuracy for autonomous driving at the inference latency of answer-only prediction, and it is the first latent CoT method to surpass explicit CoT across four benchmarks.