Tag
This paper presents DTVEM-RE, a hierarchical random-effects extension of the Differential Time-Varying Effect Model that estimates person-specific multi-lag coefficients via Hamiltonian Monte Carlo in Stan, addressing a limitation of the original DTVEM, which assumed a single group-level lag structure. Simulation and empirical results demonstrate recovery of between-person variance and improvements over hierarchical and non-hierarchical baselines.
This paper proposes a query-based cross-modal projector that compresses visual tokens via cross-attention to improve Mamba-based multimodal LLMs, boosting both performance and throughput on vision-language benchmarks while eliminating the need for manual 2D scan order design.
LDARNet is a 120M-parameter hierarchical genomic foundation model that introduces learnable adaptive tokenization (inspired by H-Net's dynamic chunking) for masked language modeling on DNA sequences. It achieves state-of-the-art results on 5 histone modification tasks and outperforms models up to 20× larger on several genomic benchmarks, with learned token boundaries aligning with biological features like promoter motifs and splice junctions.
EnergyMamba is a spatiotemporal framework that combines a graph-enhanced selective state space model with adaptive conformalized quantile regression to produce accurate and reliable energy consumption predictions with uncertainty estimates. It reports improvements on real-world datasets from Florida, New York, and California.
This paper proposes a sleep-like consolidation mechanism for transformer models that uses fast weights and recurrent passes to improve long-context processing while maintaining inference speed.
MVCHead is a novel method for generating 3D Gaussian head avatars from single 2D images without multi-view data, using hierarchical state space models and multi-view consistency enforcement.
This paper proposes Physics-Informed Multi-Scale Mamba (PIMSM), a state-space architecture that aligns model memory with physical timescales to improve robustness under distribution shift in scientific time series, demonstrating improvements on fMRI and weather forecasting tasks.
Raven is a sequence model that combines the efficiency of state space models with a selective slot-updating mechanism inspired by sliding window attention to improve long-context retrieval. The approach is presented as a more principled alternative to existing linear-time models.