Tag
Proposes the Bag of Dims framework, showing that the standard basis of transformer hidden states provides a training-free, architecture-general feature representation in which dimensions encode semantic content via sign patterns; validated across language, vision, and audio models, it achieves high accuracy with no learned rotations.
Proposes MechRL, a reinforcement learning approach to automate circuit discovery in transformer language models. A PPO agent trained on multiple tasks discovers attention head circuits that match known canonical circuits and generalizes to a held-out task.
This paper identifies a Möbius attractor and Cascade Supervision as key mechanisms for the emergence of superposition reasoning in transformers, closing a theoretical gap concerning gradient descent convergence on graph reachability tasks.
A new preprint titled 'Mathematics is All You Need 2' presents the 'Two-Channel theorem,' demonstrating that behavioral fibers in transformer residual streams are sign-stabilized and causally steerable across different architectures (Qwen to Llama). The study claims high reproducibility and shows that the behavioral substrate is near-one-dimensional, separating generation from latent structure.
This research paper analyzes the internal mechanics of Large Vision-Language Models (LVLMs) using information theory, revealing that attention mechanisms may be redundant while Feed-Forward Networks drive semantic innovation. The authors demonstrate that replacing learned attention weights with random values can yield comparable performance, suggesting current models 'get lost in attention'.