Tag
This paper introduces a residualization-and-permutation diagnostic that separates predictability-driven from regulation-driven variance in the regulatory importance scores produced by genomic foundation models, and applies it to dark-genome elements at glioma-relevant loci.
SHARP introduces a bio-inspired framework that separates memory accumulation from pattern recognition, using accelerated replay during offline sleep phases to learn long-range non-stationary temporal patterns in streaming settings. It improves context retention on text8 and PG-19 while maintaining computational efficiency.
This paper formalizes the sufficiency gap in next-token prediction, demonstrating that even ideal sequence models can become overconfident when textual prefixes are not sufficient statistics for latent circumstances. It proposes an external observer mechanism that reduces, but does not eliminate, this gap.
This paper introduces Conditional Attribute Transformers, a method that jointly estimates next-token probabilities and conditional attribute values, enabling credit assignment, counterfactual analysis, and steerable generation in a single forward pass.
This paper introduces Toeplitz MLP Mixers (TMM), an architecture that replaces attention with Toeplitz matrix multiplication to achieve lower computational complexity while maintaining high information retention and training efficiency.
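The Toeplitz token mixing named in the last summary can be illustrated with a minimal NumPy sketch. It assumes a causal (lower-triangular) Toeplitz matrix built from a single weight vector `w`; the function names, the causal masking, and the FFT shortcut are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def causal_toeplitz(w):
    """Build an n x n lower-triangular Toeplitz matrix with T[i, j] = w[i - j] for i >= j."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]          # i - j for every (row, column) pair
    # Entries above the diagonal (i < j) are zeroed, so position i never sees the future.
    return np.where(diff >= 0, w[np.clip(diff, 0, n - 1)], 0.0)

def toeplitz_mix(x, w):
    """Mix tokens along the sequence axis with a dense product: x is (n, d), w is (n,)."""
    return causal_toeplitz(w) @ x

def toeplitz_mix_fft(x, w):
    """Same result in O(n log n) per channel, since a Toeplitz product is a convolution."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.shape[0]
    size = 2 * n                                 # zero-pad so circular conv equals linear conv
    xf = np.fft.rfft(x, n=size, axis=0)
    wf = np.fft.rfft(w, n=size)
    return np.fft.irfft(xf * wf[:, None], n=size, axis=0)[:n]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 4))              # 8 tokens, 4 channels (hypothetical sizes)
    w = rng.standard_normal(8)                   # one weight per relative offset
    print(np.allclose(toeplitz_mix(x, w), toeplitz_mix_fft(x, w)))
```

Because every diagonal of the matrix shares one weight, the mixing layer needs only n parameters per sequence of length n rather than n², and the product reduces to a convolution. That structure is one plausible source of a complexity advantage over attention, though the summary does not say which computation the paper actually uses.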