sequence-models

#sequence-models

Black-Mamba: Biologically-Inspired Leaky Accumulation for Conceptual Knowledge under Distribution Drift

arXiv cs.AI · 2026-07-22

Black-Mamba introduces a test-time adaptive forecasting architecture that accumulates surprisal and updates its memory only when there is evidence of distribution drift, achieving efficient adaptation on non-stationary time series.
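
The gating idea in this summary can be illustrated with a leaky accumulator. This is a hypothetical sketch, not Black-Mamba's actual mechanism. It assumes a Gaussian predictive model and measures excess surprisal, meaning the negative log-likelihood minus its expected value. The accumulated total is multiplied by a factor `leak` at each step, and the memory (here just a running mean) is updated only when the total crosses `threshold`. The class `LeakySurprisalGate` and all of its parameters are illustrative assumptions.

```python
import math


def gaussian_surprisal(x, mean, std):
    """Negative log-likelihood of x under N(mean, std**2)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (x - mean) ** 2 / (2 * std ** 2)


def expected_gaussian_surprisal(std):
    """Expected surprisal (differential entropy) of N(mean, std**2)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + 0.5


class LeakySurprisalGate:
    """Hypothetical drift gate: memory changes only when leaky-accumulated surprisal crosses a threshold."""

    def __init__(self, mean, std, leak=0.9, threshold=10.0, lr=0.5):
        self.mean = mean            # the "memory": a single running estimate (assumption)
        self.std = std              # fixed predictive spread (assumption)
        self.leak = leak            # decay applied to old evidence each step
        self.threshold = threshold  # accumulated evidence needed to declare drift
        self.lr = lr                # step size of the memory update
        self.acc = 0.0              # leaky accumulator of excess surprisal

    def step(self, x):
        """Consume one observation; return True if the memory was updated."""
        excess = gaussian_surprisal(x, self.mean, self.std) - expected_gaussian_surprisal(self.std)
        # Only worse-than-expected observations count as evidence of drift.
        self.acc = self.leak * self.acc + max(0.0, excess)
        if self.acc > self.threshold:
            self.mean += self.lr * (x - self.mean)  # adapt the memory
            self.acc = 0.0                          # reset after an update
            return True
        return False


if __name__ == "__main__":
    gate = LeakySurprisalGate(mean=0.0, std=1.0)
    # In-distribution noise: excess surprisal is clipped to 0, so nothing accumulates.
    print([gate.step(x) for x in [0.1, -0.2, 0.0, 0.3, -0.1]])
    # A persistent moderate shift accumulates until it triggers an update.
    print([gate.step(3.0) for _ in range(3)], gate.mean)
```

The leak lets a single outlier fade, while a sustained shift keeps adding evidence. With the defaults above, one observation at 3.0 is not enough to trigger an update. Three consecutive observations at 3.0 trigger exactly one update, which moves the mean to 1.5.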

The KV-cache wall: why fixed-size memory sequence models keep coming back

Reddit r/ArtificialInteligence · 2026-06-25

Explores the growing memory bottleneck posed by the KV cache in transformer inference and explains why alternative architectures with fixed-size memory, such as Mamba and RWKV, are attracting renewed attention.
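
A back-of-the-envelope calculation shows why the KV cache becomes a wall. The sketch assumes standard multi-head attention, which stores one key vector and one value vector per KV head, per layer, per token. The example configuration (32 layers, 32 KV heads, head dimension 128, 16-bit values) is an illustrative assumption, as is the per-layer recurrent state size. Neither figure comes from the post.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Memory for cached keys and values; grows linearly with seq_len."""
    # Factor 2: one key vector and one value vector per KV head, per layer, per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


def recurrent_state_bytes(n_layers, state_elems_per_layer, batch=1, bytes_per_elem=2):
    """Memory for a fixed-size recurrent state; independent of sequence length."""
    return n_layers * state_elems_per_layer * batch * bytes_per_elem


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    for seq_len in (4_096, 32_768, 131_072):
        kv = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=seq_len)
        print(f"{seq_len:>7} tokens: KV cache {kv / GIB:.1f} GiB")
    # Hypothetical recurrent state of 4096 x 16 values per layer (illustrative only).
    state = recurrent_state_bytes(n_layers=32, state_elems_per_layer=4096 * 16)
    print(f"fixed-size state at any length: {state / MIB:.1f} MiB")
```

Under these assumptions the cache costs 0.5 MiB per token. That comes to 2 GiB at 4,096 tokens and 64 GiB at 131,072 tokens, while the hypothetical recurrent state stays at 4 MiB regardless of length.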

An Update on Matrix Recurrent Units, an Attention Alternative [R]

Reddit r/MachineLearning · 2026-06-21

An update on Matrix Recurrent Units (MRU), a linear-time alternative to attention. The author explores methods for stabilizing training and finds that orthogonal matrices underperform while an LDU factorization works best. The update also shows that MRU underperforms transformers on larger datasets such as TinyStories.

Recurrent Reasoning on Symbolic Puzzles with Sequence Models

arXiv cs.AI · 2026-06-16

This paper introduces RecurrReason, a difficulty-controlled benchmark of four symbolic logic puzzles to evaluate multi-step reasoning in sequence models. Fine-tuning experiments on T5 and GPT-2 show that architecture determines success more than scale, and that pre-training transfer depends on local transition structure.

The Dark Regulome: Disentangling Predictability from Regulation in Genomic Foundation Models

arXiv cs.CL · 2026-06-08

This paper introduces a residualization-and-permutation diagnostic that separates predictability-driven from regulation-driven variance in the regulatory importance scores produced by genomic foundation models. It applies the diagnostic to dark-genome elements at glioma-relevant loci.

SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition

arXiv cs.AI · 2026-06-02

SHARP introduces a bio-inspired framework that separates memory accumulation from pattern recognition, using accelerated replay during offline sleep phases to learn long-range non-stationary temporal patterns in streaming settings. It improves context retention on text8 and PG-19 while maintaining computational efficiency.

The Need for an External Observer Formalizing the Sufficiency Gap: A Mathematical Extension of Mixture Identifiability and Contextual Grounding in Sequence Models

arXiv cs.CL · 2026-05-27

This paper formalizes the sufficiency gap in next-token prediction. It shows that even ideal sequence models can become overconfident when textual prefixes are not sufficient statistics for the latent circumstances that generated them. It also proposes an external observer mechanism that reduces, but does not eliminate, this gap.

Conditional Attribute Estimation with Autoregressive Sequence Models

arXiv cs.AI · 2026-05-15

This paper introduces Conditional Attribute Transformers, a method that jointly estimates next-token probabilities and conditional attribute values. This enables credit assignment, counterfactual analysis, and steerable generation in a single forward pass.

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

arXiv cs.LG · 2026-05-11

This paper introduces Toeplitz MLP Mixers (TMM), a novel architecture that replaces attention with Toeplitz matrix multiplication to achieve lower computational complexity while maintaining high information retention and training efficiency.
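
A Toeplitz matrix is constant along each diagonal, so a token-mixing matrix for n positions is determined by O(n) weights rather than n² independent entries. The sketch below assumes a causal (lower-triangular) variant in which `coeffs[k]` weights the token k positions back. The function names are hypothetical, and the sketch does not claim to match TMM's exact parameterization. It builds the matrix explicitly and also computes the same mixing as a causal convolution.

```python
def causal_toeplitz(coeffs, n):
    """n x n lower-triangular Toeplitz matrix with T[i][j] = coeffs[i - j] for j <= i.

    Requires len(coeffs) >= n.
    """
    return [[coeffs[i - j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of nested lists a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def toeplitz_mix(coeffs, x):
    """Mix tokens of x (n tokens, each a feature list) along the sequence axis.

    Equivalent to causal_toeplitz(coeffs, n) @ x, computed as a causal convolution
    so the n x n matrix is never materialized.
    """
    n, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):           # k = distance back from token i
            w = coeffs[k]
            for c in range(d):
                out[i][c] += w * x[i - k][c]
    return out


if __name__ == "__main__":
    coeffs = [1.0, 0.5, 0.25]            # one weight per relative offset (hypothetical values)
    x = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
    print(causal_toeplitz(coeffs, 3))
    print(toeplitz_mix(coeffs, x))
    print(matmul(causal_toeplitz(coeffs, 3), x) == toeplitz_mix(coeffs, x))
```

Unlike attention weights, the mixing weights here do not depend on the input. The explicit loop costs O(n²) per feature channel. Because a Toeplitz product is a convolution, the same result can also be computed with FFTs in O(n log n) time per channel.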
