sequence-modeling

#sequence-modeling

HantaWatch: Federated Learning for Hantavirus Genomic Surveillance

arXiv cs.LG · 2026-07-21

HantaWatch is a federated learning framework for hantavirus genomic surveillance. It enables collaborative training of sequence-based models without sharing raw genomic data, and it combines k-mer feature extraction with adaptive optimization to support risk screening and expert prioritization.
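
The two ingredients named above can be sketched as follows. This is a minimal illustration under stated assumptions, not HantaWatch's code: `kmer_features` builds a normalized k-mer frequency vector (the default k=3 and the ACGT alphabet are arbitrary choices), and `fedavg` is plain FedAvg, i.e. size-weighted averaging of client parameter vectors, standing in for whatever adaptive optimizer the framework actually uses. Both function names are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=3, alphabet="ACGT"):
    """Normalized k-mer frequencies over all len(alphabet)**k possible k-mers.

    k-mers containing symbols outside `alphabet` (e.g. ambiguous 'N') are ignored.
    """
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocab) or 1  # avoid division by zero
    return [counts[kmer] / total for kmer in vocab]


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its local sample count.

    Only these parameter vectors reach the server; raw sequences stay on site.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]
```

In one round, each site would fit a local model on its own `kmer_features` vectors and send only the resulting weights; `fedavg` then produces the next global model.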

Flexformer: Flexible Linear Transformer with Learnable Attention Kernel

arXiv cs.LG · 2026-06-29

Flexformer is a flexible linear Transformer whose attention kernel is fully learnable and built on random Fourier features. It achieves complexity linear in sequence length while matching or exceeding softmax attention on language modeling and sequence classification tasks.
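
Linear complexity comes from replacing the softmax kernel with a feature map phi, so attention can be reassociated as phi(Q)(phi(K)^T V). The sketch below is a generic random-Fourier-feature linear attention in pure Python, assuming a Gaussian kernel with fixed random frequencies `W` and phases `b`; Flexformer instead learns its kernel, so treat the fixed draws as placeholders for trained parameters. The helper names are hypothetical.

```python
import math
import random


def make_rff(dim, num_features, seed=0):
    """Return phi with phi(x) . phi(y) ~= exp(-||x - y||^2 / 2) (Gaussian kernel).

    W ~ N(0, I) and b ~ U[0, 2*pi] are fixed random draws here; a learnable-kernel
    model would treat them (or a richer spectral distribution) as trainable.
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                for w, bj in zip(W, b)]

    return phi


def linear_attention(Q, K, V, phi):
    """Non-causal kernelized attention, linear in the number of keys.

    out_i = phi(q_i)^T S / phi(q_i)^T z, with S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j).
    """
    fk = [phi(k) for k in K]
    n_feat, d_v = len(fk[0]), len(V[0])
    S = [[sum(f[d] * v[c] for f, v in zip(fk, V)) for c in range(d_v)] for d in range(n_feat)]
    z = [sum(f[d] for f in fk) for d in range(n_feat)]
    out = []
    for q in Q:
        fq = phi(q)
        num = [sum(fq[d] * S[d][c] for d in range(n_feat)) for c in range(d_v)]
        den = sum(fq[d] * z[d] for d in range(n_feat))  # assumes den != 0
        out.append([x / den for x in num])
    return out
```

Because `S` and `z` are summed once over the keys, cost grows linearly with sequence length instead of quadratically. Cosine features can be negative, so the normalizer is not guaranteed to stay positive; positive feature maps are a common remedy.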

On Subquadratic Architectures: From Applications to Principles

Hugging Face Daily Papers · 2026-06-10

This paper compares xLSTM, Mamba-2, and Gated DeltaNet on complex sequence modeling tasks and finds that xLSTM performs best, which it attributes to xLSTM's stronger state tracking and memory dynamics; the finding is validated on synthetic length-generalization tasks.
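
The summary does not name the synthetic tasks, so as an assumed, illustrative example: running parity is a standard state-tracking probe, and a length-generalization split trains on short sequences and evaluates only on strictly longer ones. The function names and length ranges below are hypothetical.

```python
import random


def parity_example(length, rng):
    """Random bit string plus the running parity (XOR of all bits so far) at each step."""
    bits = [rng.randint(0, 1) for _ in range(length)]
    labels, state = [], 0
    for b in bits:
        state ^= b  # the single bit of state a model must carry forward
        labels.append(state)
    return bits, labels


def length_split(n_train, n_test, train_max=20, test_min=40, test_max=80, seed=0):
    """Train on lengths 1..train_max; test only on unseen lengths test_min..test_max."""
    rng = random.Random(seed)
    train = [parity_example(rng.randint(1, train_max), rng) for _ in range(n_train)]
    test = [parity_example(rng.randint(test_min, test_max), rng) for _ in range(n_test)]
    return train, test
```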

Between Amnesia and Chaos: A Memory Stability Expressivity Trilemma for Trainable Dissipative Oscillator Networks

arXiv cs.LG · 2026-06-10

This paper presents a memory–stability–expressivity trilemma for trainable dissipative oscillator networks, showing that damping governs all three properties and also limits trainability. Experiments on a 20-oscillator network confirm the theoretical bounds.
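
To see why damping controls memory, consider a single linear damped oscillator x'' + 2*gamma*x' + omega^2*x = u(t), a generic textbook model rather than the paper's trainable network. After an impulse, the response envelope decays roughly like exp(-gamma*t), so the memory horizon scales as about 1/gamma. This sketch illustrates only that memory side of the trade-off (the stability and expressivity bounds are not reproduced); the integrator, step size, and parameter values are arbitrary choices.

```python
def impulse_response(gamma, omega, dt=0.01, steps=2000):
    """Free response of x'' + 2*gamma*x' + omega**2 * x = 0 after a unit velocity kick.

    Integrated with semi-implicit Euler (update velocity first, then position).
    """
    x, v = 0.0, 1.0
    xs = []
    for _ in range(steps):
        v += dt * (-2.0 * gamma * v - omega ** 2 * x)
        x += dt * v
        xs.append(x)
    return xs


def envelope(xs, window):
    """Largest |x| over the last `window` samples: what is left of the initial input."""
    return max(abs(x) for x in xs[-window:])
```

Comparing `envelope(impulse_response(0.05, 2.0), 400)` with `envelope(impulse_response(0.5, 2.0), 400)` shows the weakly damped oscillator still carrying a visible trace of the kick near t = 20, while the strongly damped one has essentially forgotten it.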

Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling

arXiv cs.LG · 2026-06-05

Proposes the Mamba-Assisted Closure (MAC) framework, which uses a Mamba-based sequence model to learn non-Markovian closures for reduced-order models of high-dimensional dynamical systems. MAC outperforms GRU-based and Markovian methods on Burgers' equation and the Lorenz '96 system.
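
Two pieces of this setup can be sketched without the model itself. The Lorenz '96 right-hand side below is the standard one, dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F with periodic indices. `history_windows` is a hypothetical helper showing the data layout a non-Markovian closure consumes (a window of past resolved states per target), whereas a Markovian closure would see only the current state. What the closure target is (for example, the unresolved tendency) is an assumption left to the caller, and the Mamba model is not shown.

```python
def lorenz96_rhs(x, forcing=8.0):
    """Lorenz '96 tendency with periodic boundary conditions."""
    n = len(x)
    return [
        (x[(i + 1) % n] - x[(i - 2) % n]) * x[(i - 1) % n] - x[i] + forcing
        for i in range(n)
    ]


def rk4_step(x, dt, forcing=8.0):
    """One classical fourth-order Runge-Kutta step of the Lorenz '96 system."""
    def shifted(base, k, scale):
        return [bi + scale * ki for bi, ki in zip(base, k)]

    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(shifted(x, k1, dt / 2), forcing)
    k3 = lorenz96_rhs(shifted(x, k2, dt / 2), forcing)
    k4 = lorenz96_rhs(shifted(x, k3, dt), forcing)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def history_windows(states, targets, window):
    """Pair each closure target with the `window` most recent resolved states."""
    return [(states[t - window + 1:t + 1], targets[t])
            for t in range(window - 1, len(states))]
```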

Generic Triple-Latent Compression with Gated Associative Retrieval

arXiv cs.CL · 2026-06-05

This paper introduces generic triple-latent recurrent models, which compress token-pair interactions into a latent state, and a gated associative retrieval variant that improves exact recall. The hybrid model outperforms Transformers on byte-level WikiText-2 and on a tokenized language benchmark, reaching up to 41.9% associative recall versus 25%.
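
For intuition about associative recall with a gated, fixed-size state, here is a generic gated outer-product memory, a well-known stand-in and not the paper's triple-latent model: writing stores value v under key k as S <- g*S + v k^T, and reading with query q returns S q. With orthonormal keys and g = 1 retrieval is exact, while a gate g < 1 fades older associations. The function names are hypothetical.

```python
def empty_memory(d_value, d_key):
    """Zero-initialized d_value x d_key associative memory."""
    return [[0.0] * d_key for _ in range(d_value)]


def write(S, key, value, gate):
    """S <- gate * S + value key^T (S has shape len(value) x len(key))."""
    return [[gate * S[i][j] + value[i] * key[j] for j in range(len(key))]
            for i in range(len(value))]


def read(S, query):
    """Return S @ query, the value associated with `query`."""
    return [sum(s_ij * q_j for s_ij, q_j in zip(row, query)) for row in S]
```

Once more pairs are stored than there are key dimensions, keys can no longer all be orthogonal and reads pick up interference, which is one reason exact recall is hard for fixed-size states.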

Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning

arXiv cs.LG · 2026-05-29

This paper proposes Q-align DT, a framework that aligns return-to-go with Q-values to improve controllability and performance in offline reinforcement learning, achieving superior results on D4RL benchmarks.
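
Return-to-go is the undiscounted sum of future rewards, R_t = r_t + r_{t+1} + ... + r_T. A return-conditioned model is trained with R_t as an input, and at inference the conditioning target is typically decremented by each observed reward. The sketch shows only these two standard computations; how Q-align DT aligns return-to-go with Q-values is not described in the summary and is not reproduced. Function names are hypothetical.

```python
def returns_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... + r_T (undiscounted), computed right to left."""
    rtg, running = [0.0] * len(rewards), 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def update_target(target_return, reward):
    """At inference, subtract the reward just received from the conditioning target."""
    return target_return - reward
```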

Interdomain Attention: Beyond Token-Level Key-Value Memory

arXiv cs.LG · 2026-05-26

Proposes Interdomain Attention, a method that uses kernel methods to integrate state space models into attention. It supports efficient long-context modeling with a fixed-size state and outperforms SSMs and softmax attention in language modeling experiments at scales up to 1.3B parameters.
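
One standard fact behind kernel views of state space models is that a linear time-invariant SSM with a fixed-size state is equivalent to a causal convolution with kernel K(tau) = sum_n c_n * a_n^tau * b_n. The sketch checks this for a diagonal SSM; it illustrates the fixed-size-state property mentioned above, not the Interdomain Attention construction itself, and the parameter values in any usage are arbitrary.

```python
def ssm_recurrent(x, a, b, c):
    """Diagonal linear SSM: h_t = a * h_{t-1} + b * x_t (elementwise), y_t = c . h_t.

    The state h has len(a) entries no matter how long x is.
    """
    h = [0.0] * len(a)
    ys = []
    for xt in x:
        h = [an * hn + bn * xt for an, hn, bn in zip(a, h, b)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)))
    return ys


def ssm_kernel_view(x, a, b, c):
    """The same map as a causal convolution y_t = sum_{s<=t} K(t - s) * x_s."""
    def kernel(tau):
        return sum(cn * an ** tau * bn for an, bn, cn in zip(a, b, c))

    return [sum(kernel(t - s) * x[s] for s in range(t + 1)) for t in range(len(x))]
```

The recurrent form needs constant memory per step, while the convolutional form exposes the whole history through an explicit kernel; the two produce the same outputs.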

I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

Reddit r/MachineLearning · 2026-05-23

The author presents SM1, a Mamba1 variant with d_state=1 that replaces the selective scan with two native PyTorch ops and uses 16x less memory than d_state=16. Because the state is a scalar, the recurrence has a closed-form solution that eliminates the state dimension, enabling efficient inference with constant memory per token.
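
With d_state=1 the recurrence is scalar per channel, h_t = exp(-delta_t) * h_{t-1} + u_t, and it unrolls to h_t = exp(-L_t) * sum_{s<=t} exp(L_s) * u_s with L_t = delta_1 + ... + delta_t. The post's two ops are not named in the summary; this pure-Python sketch assumes prefix sums plus elementwise exp (in PyTorch these would map onto `torch.cumsum` and `torch.exp`), with delta_t >= 0 standing in for the discretized decay and u_t for the input term. The exp(L_s) factor overflows on long sequences, so a practical version would need chunking or a log-space formulation.

```python
import math
from itertools import accumulate


def scan_sequential(delta, u):
    """Reference recurrence: h_t = exp(-delta_t) * h_{t-1} + u_t, with h_{-1} = 0."""
    h, out = 0.0, []
    for d, ut in zip(delta, u):
        h = math.exp(-d) * h + ut
        out.append(h)
    return out


def scan_closed_form(delta, u):
    """h_t = exp(-L_t) * cumsum(exp(L_s) * u_s)_t, where L = cumsum(delta).

    Only prefix sums and elementwise exp are used; there is no step-by-step loop
    over the recurrence itself.
    """
    L = list(accumulate(delta))
    weighted = list(accumulate(math.exp(l) * ut for l, ut in zip(L, u)))
    return [math.exp(-l) * w for l, w in zip(L, weighted)]
```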

Next-Latent Prediction Transformers Learn Compact World Models

Papers with Code Trending · 2025-11-08

Introduces Next-Latent Prediction (NextLat), a self-supervised objective that trains transformers to predict their next latent state, encouraging compact internal world models and improving generalization across sequence modeling tasks.
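
A next-latent objective can be sketched as an auxiliary regression loss: a predictor maps each hidden state h_t to a guess for h_{t+1}, and the mean squared error is added to the usual next-token loss. This is an assumed simplification; the summary does not specify NextLat's predictor, its inputs, the distance function, or the loss weight, so `predictor` and `weight` below are hypothetical. In an autodiff framework the target h_{t+1} would usually be detached (stop-gradient), a common choice in self-predictive objectives that is also assumed here.

```python
def next_latent_loss(hidden, predictor):
    """Average over t of the mean squared error between predictor(h_t) and h_{t+1}.

    `hidden` is a list of per-position latent vectors h_0..h_T from one sequence.
    """
    pairs = list(zip(hidden[:-1], hidden[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for h_t, h_next in pairs:
        pred = predictor(h_t)
        total += sum((p - h) ** 2 for p, h in zip(pred, h_next)) / len(h_next)
    return total / len(pairs)


def combined_loss(next_token_loss, latent_loss, weight=1.0):
    """Auxiliary latent objective added to the standard next-token loss."""
    return next_token_loss + weight * latent_loss
```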
