Tag
This paper presents Latent Context Language Models (LCLMs), a family of encoder-decoder compressors that efficiently handle long contexts through architectural search and large-scale pretraining, outperforming traditional KV cache methods in accuracy, speed, and memory usage.
This paper proposes a task-routed mixture-of-experts model with cognitive appraisal theory for implicit sentiment analysis, introducing auxiliary tasks to improve reasoning about sentiment from context and outperforming existing approaches.
This paper presents a physics-informed convolutional encoder–decoder network to predict pore-scale velocity fields from porous media geometry, and demonstrates that using network predictions to initialize Lattice-Boltzmann simulations accelerates convergence in over 90% of cases.
This paper proposes block-based double decoders, a novel transformer architecture that uses doubly-causal block-based attention masks to combine decoder-only training efficiency with encoder-decoder inference efficiency, achieving strong scaling performance and reduced KV-cache memory.
This paper applies Group Relative Policy Optimization (GRPO) to encoder-decoder Seq2Seq models for machine translation fine-tuning, using reference-free rewards (LaBSE and COMET-Kiwi) that require no parallel data, and achieves consistent improvements across 13 languages.
NVIDIA uses late interaction, a form of sparse attention, to let an attention-based encoder-decoder retrieve directly from its internal representations.
INTRA demonstrates that attention-based models can perform retrieval directly from internal representations, unifying retrieval and generation while improving evidence recall and answer quality.
SAM 3D Body is a promptable 3D human mesh recovery model that uses a novel parametric representation (MHR) and an encoder-decoder architecture, achieving state-of-the-art performance with strong generalization. The model supports auxiliary prompts and is open-source.
Google introduces T5Gemma, a new collection of encoder-decoder models adapted from the Gemma 2 decoder-only architecture, offering improved quality-efficiency trade-offs for tasks like summarization and translation.