Tag
This paper introduces AIRA-Compose and AIRA-Design, two frameworks in which AI agents autonomously discover neural architectures that outperform standard Transformers and scale efficiently.
This paper challenges the claim that prediction bottlenecks in models like Mamba recover causal structure, demonstrating through a new benchmark that gains are largely due to confounds and robustness artifacts rather than true causal discovery.
This paper introduces triattention v3, a new attention mechanism that enables safe eviction without recall loss during long-context inference, demonstrated on a hybrid Mamba+attention model at up to 256k tokens.