A new preprint titled 'Mathematics is All You Need 2' presents the 'Two-Channel theorem,' demonstrating that behavioral fibers in transformer residual streams are sign-stabilized and causally steerable across different architectures (Qwen to Llama). The study claims high reproducibility and shows that the behavioral substrate is near-one-dimensional, separating generation from latent structure.
New preprint: Mathematics is All You Need 2: Sign-Stabilized Behavioral Fibers in Transformer Residual Streams.

Headline result: a linear probe direction trained on Qwen-2.5-7B causally steers Hermes-3-Llama-3.1-8B with strictly monotonic response on 29 of 29 held-out prompts, median Spearman ρ = 1.000. The probe direction discovered on one architecture is causally relevant to the behavior of a structurally distinct architecture, not merely correlated with it.

The methodology is pre-registered, with decision rules committed to disk before any test ran: four kill tests, six tier-0 lockdown experiments, BCa bootstrap CIs from 10,000 resamples, permutation tests at p < 10⁻⁴, and Bonferroni correction across all 75 probe-layer pairs. A single RTX 5090, roughly nine hours of wall time, full reproducibility manifest.

The four pre-registered tests:
T1 cross-architecture retention: mean 0.749 across 75 probe-layer pairs over 10 seeds, 95% CI [0.747, 0.758]. PASS.
T2 basis specificity: FAILS productively. Canonical Killing basis, random orthogonal rotation, and identity projection retentions all agree within 0.006, so the architecture-invariant object is the SVD subspace itself, not any specific basis within it. 100 random rotations give σ = 0.0096.
T3 raw-residual baseline: the K1 substrate beats raw-residual dimension truncation by +0.215. PASS.
T4 causal steering: PASS at maximum strength.

The most underrated result is the rank sweep. A single direction retains 89.7% of the cross-architecture signal for the majority of behavioral traits: 40 of 75 probe-layer pairs achieve their best retention at r = 1. The behavioral substrate is intrinsically near-one-dimensional, which collapses the per-prompt API surface from 9 floats to 1–3.

The synthesis is the Two-Channel theorem. The residual stream of a frozen transformer decomposes into a high-variance, rank-1-dominant output channel read by the unembedding head, and a low-rank, near-orthogonal behavioral channel that supports both readout and causal steering. The angle between them at proportional depth is 85.59° on Qwen-2.5-7B at L13. The model knows before it speaks, and what it knows lives in a channel geometrically routed away from the speaking channel.

The convergence with LLM-JEPA from @jbhuang0604, @ylecun & @randall_balestr (arXiv:2509.14252, September 2025) is precise. They show that adding an embedding-space training objective to LLMs improves performance "without altering generative capabilities", establishing on the training side that latent structure and generation are functionally separable. The Two-Channel theorem supplies the geometric mechanism that makes this possible: the channels are near-orthogonal in the residual stream itself. Their result and ours describe the same underlying property of transformer geometry, reached from two independent methodologies. LLM-JEPA shapes the behavioral channel via training; this work measures it geometrically, shows it transfers across architectures, and causally steers it in frozen models without retraining.

Four contributions sit on top of what LLM-JEPA establishes: a measured geometric reason embedding-space objectives don't degrade generation (the 85.59° angle), cross-architecture transfer of the behavioral channel at 0.749 retention, cross-architecture causal steering with strictly monotonic response, and frozen-model applicability requiring no training-time intervention. The training-side and measurement-side accounts of transformer latent structure now point at the same object.
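To make the T4 setup concrete, here is a minimal sketch of a steering-and-monotonicity loop in PyTorch, assuming the probe direction from the source model has already been mapped into the target model's hidden size. The `direction` placeholder, `score_fn`, layer index, and strength grid are illustrative stand-ins, not the paper's actual settings.

```python
# Sketch: steer a target model along a probe direction found on a source model,
# then check that the behavioral score responds monotonically to steering strength.
# Hypothetical pieces: `direction` (placeholder for the transferred probe direction)
# and `score_fn` (a scalar behavioral readout on the generated text).
import torch
from scipy.stats import spearmanr
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Hermes-3-Llama-3.1-8B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

layer_idx = 13                                       # illustrative proportional-depth layer
direction = torch.randn(model.config.hidden_size)    # placeholder for the transferred probe direction
direction = (direction / direction.norm()).to(model.device, torch.bfloat16)

def make_hook(alpha):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction           # push every position along the behavioral direction
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

def steered_score(prompt, alpha, score_fn):
    handle = model.model.layers[layer_idx].register_forward_hook(make_hook(alpha))
    try:
        ids = tok(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**ids, max_new_tokens=64, do_sample=False)
        text = tok.decode(out[0, ids.input_ids.shape[1]:], skip_special_tokens=True)
        return score_fn(text)
    finally:
        handle.remove()

def monotonic_rho(prompt, score_fn, strengths=(-8.0, -4.0, -2.0, 0.0, 2.0, 4.0, 8.0)):
    scores = [steered_score(prompt, a, score_fn) for a in strengths]
    rho, _ = spearmanr(strengths, scores)             # ρ = 1.0 means strictly monotonic response
    return rho
```

The T4 criterion then amounts to requiring ρ = 1.0 across the held-out prompt set, with the random placeholder replaced by the direction transferred from the source model's SVD subspace.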
Honest scope: the Part I empirical foundation rests on one architecture pair (Qwen-2.5-7B-Instruct → Hermes-3-Llama-3.1-8B) and one probe at the headline T4 setting. The cluster validation pipeline (15 pre-built experiments across Mistral, Phi, Gemma, Yi, and Llama variants) is queued and will determine whether T4 generalizes at breadth. Part VI is an explicit "what this volume does not claim" section, including retraction of several overclaims from the original Mathematics Is All You Need. Less mythology, more measurement.

Why alignment researchers should care: cross-architecture causal steering, if it generalizes, means interpretability tooling derived from one model can causally affect another. That has direct implications for monitoring infrastructure, deception detection, and whether internal-state telemetry can be built once and deployed across the model zoo.

@grok what are your thoughts? https://zenodo.org/records/20102939…
A comprehensive spectral analysis across 11 LLMs reveals that transformers exhibit phase transitions in hidden-activation space during reasoning versus factual recall, along with seven fundamental phenomena including spectral compression, instruction-tuning reversal, and perfect correctness prediction (AUC = 1.0) from spectral properties alone.
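A small sketch of the kind of spectral features such an analysis computes from hidden activations; the specific features here (spectral entropy, effective rank, top-singular-value share) and the downstream classifier are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: spectral features of a layer's hidden activations, of the kind spectral
# analyses of LLMs typically use. Feature choice and classifier are illustrative.
import numpy as np

def spectral_features(hidden):
    """hidden: (tokens, d_model) activations for one prompt at one layer."""
    h = hidden - hidden.mean(axis=0, keepdims=True)
    s = np.linalg.svd(h, compute_uv=False)             # singular values
    p = (s ** 2) / np.sum(s ** 2)                       # normalized spectral mass
    spectral_entropy = -np.sum(p * np.log(p + 1e-12))
    effective_rank = np.exp(spectral_entropy)           # participation-ratio-style rank
    top1_share = p[0]                                    # spectral-compression indicator
    return np.array([spectral_entropy, effective_rank, top1_share])

# Features from many prompts can then be fed to any classifier (e.g. logistic
# regression) and scored with ROC AUC against per-prompt correctness labels.
```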
This interactive tool visualizes the mathematical underpinnings of transformer models through dataflow graphs, covering architectures from GPT-2 to Qwen 3.6 and various attention mechanisms.
This paper performs full Jacobian eigendecomposition across production-scale LLMs, revealing a learned spectral gradient from rotation-dominated early layers to symmetric late layers, along with a low-rank bottleneck that compresses perturbations. The results link perturbation propagation and compression to network functional topology.
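As a toy illustration of the rotational-versus-symmetric distinction, one can take the Jacobian of a small residual block and compare the norms of its antisymmetric and symmetric parts; `layer_fn` below is a hypothetical stand-in, and the paper's full eigendecomposition at production scale goes well beyond this sketch.

```python
# Sketch: gauge whether a layer's local Jacobian is rotation-dominated (large
# antisymmetric part) or symmetric-dominated, using torch autograd on a toy block.
import torch

def symmetry_profile(layer_fn, x):
    J = torch.autograd.functional.jacobian(layer_fn, x)   # (d, d) Jacobian at x
    sym = 0.5 * (J + J.T)                                  # symmetric part
    antisym = 0.5 * (J - J.T)                              # antisymmetric (rotational) part
    return antisym.norm() / sym.norm()                     # > 1 suggests rotation-dominated

d = 64
W1, W2 = torch.randn(d, d) / d**0.5, torch.randn(d, d) / d**0.5
layer_fn = lambda x: x + W2 @ torch.tanh(W1 @ x)           # toy residual block, not a real LLM layer
print(symmetry_profile(layer_fn, torch.randn(d)))
```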
The article identifies a spectral ratio between MLP and attention norms that predicts geometric stability in transformer models, with an optimal range of 0.5–2 for avoiding rank collapse.
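One plausible reading of that ratio, measured on sublayer outputs rather than weights (the article may define the quantity differently), is sketched below for a Llama-style model.

```python
# Sketch: per-layer ratio of MLP-to-attention output norms as a stability indicator.
# This is one interpretation of the "spectral ratio" described above, not the
# article's exact definition.
import torch

@torch.no_grad()
def mlp_attn_ratio(model, input_ids):
    """Return per-layer ||MLP output|| / ||attention output|| for a Llama-style model."""
    attn_norms, mlp_norms, hooks = [], [], []
    for layer in model.model.layers:
        hooks.append(layer.self_attn.register_forward_hook(
            lambda m, i, o: attn_norms.append((o[0] if isinstance(o, tuple) else o).norm().item())))
        hooks.append(layer.mlp.register_forward_hook(
            lambda m, i, o: mlp_norms.append(o.norm().item())))
    try:
        model(input_ids)
    finally:
        for h in hooks:
            h.remove()
    return [m / a for m, a in zip(mlp_norms, attn_norms)]

# Ratios far outside roughly 0.5-2 would flag layers at risk of rank collapse
# under the article's criterion.
```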
This paper presents a unified geometric framework for understanding transformer memory failures, distinguishing conflict arbitration from hallucination through hidden-state attractor basins. It demonstrates that geometric margin is a better diagnostic for these failures than output entropy, particularly as model scale increases.
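A hedged sketch of the two diagnostics side by side; the attractor centroids here are hypothetical cluster centers standing in for the paper's basin construction.

```python
# Sketch: output-entropy diagnostic vs. a hidden-state geometric-margin diagnostic.
# `centroids` are hypothetical attractor centers in hidden space; the paper's
# basin construction is more involved than this toy version.
import numpy as np

def output_entropy(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.sum(p * np.log(p + 1e-12))

def geometric_margin(hidden, centroids):
    """Gap between distances to the two nearest attractor centroids."""
    d = np.sort(np.linalg.norm(centroids - hidden, axis=1))
    return d[1] - d[0]            # small margin -> near a basin boundary -> failure-prone

# Either score can be thresholded as a failure detector and the two compared via
# ROC AUC against labeled failure cases (conflict arbitration vs. hallucination).
```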