Tag
Proposes the Mamba-Assisted Closure (MAC) framework, a Mamba-based sequence model for non-Markovian closure in reduced-order modeling of high-dimensional dynamical systems, outperforming GRU-based and Markovian methods on Burgers' equation and Lorenz '96 systems.
This paper introduces generic triple-latent recurrent models that compress token-pair interactions into a latent state, and a gated associative retrieval variant that improves exact recall. The hybrid model outperforms Transformers on byte-level WikiText-2 and a tokenized language benchmark, achieving up to 41.9% associative recall versus 25%.
This paper proposes Q-align DT, a framework that aligns return-to-go with Q-values to improve controllability and performance in offline reinforcement learning, achieving superior results on D4RL benchmarks.
Proposes Interdomain Attention, a method that integrates state space models into attention via kernel methods, achieving efficient long-context modeling with a fixed-size state and outperforming both SSMs and softmax attention in language modeling experiments at scales up to 1.3B parameters.
The author presents SM1, a variant of Mamba1 with d_state=1 that replaces the selective scan with two native PyTorch ops and reduces memory by 16x compared to d_state=16. The closed-form solution eliminates the state dimension, enabling efficient inference with constant memory per token.