sequence-modeling

#sequence-modeling

Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling

arXiv cs.LG · 3d ago

Proposes the Mamba-Assisted Closure (MAC) framework, which uses a Mamba-based sequence model for non-Markovian closure in reduced-order models of high-dimensional dynamical systems. MAC outperforms GRU-based and Markovian closures on Burgers' equation and the Lorenz '96 system.
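
For context on the second benchmark: the single-scale Lorenz '96 system evolves n variables on a ring as dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F. The minimal sketch below generates the kind of full-order trajectory data that a learned closure could be fit to. It does not implement MAC itself. The function names, the classical RK4 integrator and the commonly used chaotic forcing F = 8 are illustrative assumptions and are not taken from the paper.

```python
def lorenz96_rhs(x, forcing=8.0):
    """Time derivative of the single-scale Lorenz '96 system with periodic indices."""
    n = len(x)
    return [
        (x[(k + 1) % n] - x[(k - 2) % n]) * x[(k - 1) % n] - x[k] + forcing
        for k in range(n)
    ]


def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], forcing)
    k3 = lorenz96_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], forcing)
    k4 = lorenz96_rhs([xi + dt * ki for xi, ki in zip(x, k3)], forcing)
    return [
        xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


if __name__ == "__main__":
    state = [8.0] * 40
    state[0] += 0.01  # small perturbation to leave the uniform fixed point
    for _ in range(500):
        state = rk4_step(state, dt=0.01)
    print(state[:4])
```

The uniform state x_k = F is a fixed point of the equations, which is why the demo perturbs one site before integrating.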

Generic Triple-Latent Compression with Gated Associative Retrieval

arXiv cs.CL · 3d ago

This paper introduces generic triple-latent recurrent models, which compress token-pair interactions into a latent state. It also introduces a gated associative retrieval variant that improves exact recall. The hybrid model outperforms Transformers on byte-level WikiText-2 and on a tokenized language benchmark, and it reaches up to 41.9% associative recall versus 25%.
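
The summary does not describe the triple-latent parameterization, so the sketch below shows a simpler, well-known relative of the idea instead. It is a gated outer-product memory in the style of fast weights or linear attention. Each write folds a (key, value) pair into a fixed-size matrix, and each read retrieves with a query. The scalar gate and all function names are hypothetical choices for illustration. This is not the paper's model.

```python
def gated_write(state, key, value, gate):
    """Decay the memory by `gate`, then add the outer product of value and key.

    `state` is a d_v x d_k matrix stored as a list of rows; `gate` in [0, 1]
    controls how much of the previous contents survives this write.
    """
    return [
        [gate * s + v * k for s, k in zip(row, key)]
        for row, v in zip(state, value)
    ]


def read(state, query):
    """Retrieve state @ query; with orthonormal keys this returns the stored value."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]


def store_all(keys, values, gate):
    """Write each (key, value) pair in order, starting from an empty memory."""
    memory = [[0.0] * len(keys[0]) for _ in range(len(values[0]))]
    for key, value in zip(keys, values):
        memory = gated_write(memory, key, value, gate)
    return memory


if __name__ == "__main__":
    keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    memory = store_all(keys, values, gate=0.5)
    print([read(memory, k) for k in keys])
```

With orthonormal (here one-hot) keys, retrieval of the most recent pair is exact. A gate below 1 makes older associations fade. Non-orthogonal keys make stored values interfere, which illustrates why exact recall from a fixed-size state is difficult.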

Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning

arXiv cs.LG · 2026-05-29

This paper proposes Q-align DT, a framework that aligns return-to-go targets with Q-values to improve controllability and performance in offline reinforcement learning. The authors report superior results on the D4RL benchmarks.
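
Return-conditioned methods such as Decision Transformer condition each action on the return-to-go. The return-to-go is the (usually undiscounted) sum of rewards from the current step to the end of the trajectory. The sketch below computes it from a reward sequence and shows the usual test-time update, which reduces the target by each observed reward. It covers only this baseline conditioning signal and does not reproduce Q-align DT's alignment with Q-values. The function names are hypothetical.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return-to-go for every step: rtg[t] = sum_{t' >= t} gamma**(t' - t) * rewards[t']."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step reuses the suffix sum computed after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def next_target(target, reward):
    """Test-time update: after receiving `reward`, condition on what is still to be earned."""
    return target - reward


if __name__ == "__main__":
    print(returns_to_go([1.0, 0.0, 2.0]))
```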

Interdomain Attention: Beyond Token-Level Key-Value Memory

arXiv cs.LG · 2026-05-26

Proposes Interdomain Attention, a method that integrates state space models into attention via kernel methods. It achieves efficient long-context modeling with a fixed-size state. It also outperforms SSMs and softmax attention in language modeling experiments at scales up to 1.3B parameters.
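
The link between kernels and a fixed-size state is easiest to see in plain kernelized (linear) attention. The sketch below uses the positive feature map phi(x) = elu(x) + 1 from Katharopoulos et al. (2020). It computes causal attention twice: once explicitly over every (query, key) pair, and once with a running state whose size does not grow with sequence length. This illustrates the general kernel trick only, not the Interdomain Attention construction. The function names are hypothetical.

```python
import math


def feature_map(x):
    """elu(x) + 1: a strictly positive feature map, as used in linear attention."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_attention_quadratic(queries, keys, values):
    """Causal kernel attention computed explicitly over every (query, key) pair."""
    outputs = []
    for t, q in enumerate(queries):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(keys[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append([
            sum(w * values[s][i] for s, w in enumerate(weights)) / total
            for i in range(len(values[0]))
        ])
    return outputs


def linear_attention_recurrent(queries, keys, values):
    """Same outputs, but with a fixed-size state instead of a growing KV cache."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_k for _ in range(d_v)]  # running sum of outer(v, phi(k))
    norm = [0.0] * d_k  # running sum of phi(k), used as the normalizer
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = feature_map(k)
        for i in range(d_v):
            for j in range(d_k):
                state[i][j] += v[i] * fk[j]
        for j in range(d_k):
            norm[j] += fk[j]
        fq = feature_map(q)
        denom = dot(fq, norm)
        outputs.append([dot(row, fq) / denom for row in state])
    return outputs
```

The recurrent form carries only a d_v x d_k matrix and a d_k vector between tokens. A softmax KV cache, by contrast, grows by one key and one value per token.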

I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

Reddit r/MachineLearning · 2026-05-23

The author presents SM1, a Mamba1 variant with d_state=1 that replaces the selective scan with two native PyTorch ops, reducing memory by 16x compared with d_state=16. The closed-form solution of the scan eliminates the state dimension, enabling efficient inference with constant memory per token.
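
A closed form of this kind can be sketched for a single channel. The sketch assumes the discretization used in the reference Mamba1 selective scan: h_t = exp(Δ_t A) h_{t-1} + Δ_t B_t x_t and y_t = C_t h_t, with the skip term omitted and the state reduced to a scalar. Under that assumption the recurrence unrolls into two prefix sums. The post does not say which two PyTorch ops SM1 uses. Prefix sums like these would map naturally to torch.cumsum, but that pairing and the function names below are assumptions. The sketch uses plain Python so it can be checked against a sequential loop.

```python
import math
from itertools import accumulate


def scan_sequential(x, delta, A, B, C):
    """Reference d_state=1 selective scan for one channel, one step at a time.

    h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t,   y_t = C_t * h_t
    """
    h = 0.0
    ys = []
    for xt, dt, bt, ct in zip(x, delta, B, C):
        h = math.exp(dt * A) * h + dt * bt * xt
        ys.append(ct * h)
    return ys


def scan_closed_form(x, delta, A, B, C):
    """Same outputs via two prefix sums instead of a sequential loop.

    With L_t = A * sum_{s<=t} delta_s, unrolling the recurrence gives
    h_t = exp(L_t) * sum_{s<=t} exp(-L_s) * delta_s * B_s * x_s.
    Caveat: A < 0, so exp(-L_s) grows with sequence length; a practical
    version would need chunking or rescaling to avoid overflow.
    """
    L = list(accumulate(dt * A for dt in delta))  # prefix sum no. 1
    prefix = accumulate(  # prefix sum no. 2
        math.exp(-l) * dt * bt * xt for l, dt, bt, xt in zip(L, delta, B, x)
    )
    return [ct * math.exp(l) * p for ct, l, p in zip(C, L, prefix)]
```

Only one float of state per channel is carried between tokens, versus d_state floats in the general case. This is consistent with the reported 16x reduction against d_state=16.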
