# Representation Alignment Rests on Linear Structure
## Summary
This paper investigates the Platonic Representation Hypothesis, proposing that alignment arises from linear structure in representations, and introduces a statistical framework of signal, bias, and noise.
Source: [https://arxiv.org/abs/2605.28870](https://arxiv.org/abs/2605.28870)
[View PDF](https://arxiv.org/pdf/2605.28870)
> Abstract: We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, and noise. **(1) Signal:** We propose that Platonic alignment arises from the universal relationship between objects and attributes, which is encoded linearly in representations according to the Linear Representation Hypothesis (LRH). We provide evidence that LRH helps explain PRH by extracting linear object-attribute features with sparse autoencoders and showing that these sparse representations often exhibit stronger cross-modal alignment than their dense counterparts. **(2) Bias:** Models have different implicit biases due to the diverse architectures and training procedures used. We show that this difference can be partially mitigated. Centering and normalization consistently improve cross-model alignment. **(3) Noise:** Finite-sample training leads to noise in representations. We provide evidence that representational noise is driven by data scarcity by revealing a strong and consistent positive correlation between word frequency and alignment in LLMs and text embedding models. Synthesizing signal, bias, and noise, we propose a statistical model that refines the Linear Representation Hypothesis and explains further phenomena related to the alignment of representations emerging from diverse modern AI architectures.
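
The abstract does not say which alignment metric the authors use or exactly how they apply centering and normalization. The Python below is a minimal sketch under those gaps. It assumes a mutual k-nearest-neighbour overlap score, which is one metric commonly used in PRH work; whether this paper uses it is an assumption. The sketch scores two models' representations of the same items and can optionally center each representation and rescale its rows to unit length first.

`toy_model` is a hypothetical generator that only loosely mirrors the signal/bias/noise decomposition. It passes a shared latent (the "signal") through a model-specific random linear map, then adds a constant per-model offset (a stand-in for "bias") and Gaussian "noise". All function names and parameters are illustrative and do not come from the paper.

```python
import math
import random
from typing import List, Set

Matrix = List[List[float]]  # n rows (items) x d columns (features)


def center(X: Matrix) -> Matrix:
    """Subtract each feature's mean over items, removing any constant offset."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def normalize(X: Matrix) -> Matrix:
    """Rescale every row to unit L2 norm; all-zero rows are left unchanged."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out


def knn_sets(X: Matrix, k: int) -> List[Set[int]]:
    """For each row, the indices of its k most similar other rows by dot product."""
    n = len(X)
    result = []
    for i in range(n):
        scored = [(sum(a * b for a, b in zip(X[i], X[j])), j) for j in range(n) if j != i]
        scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first; ties by index
        result.append({j for _, j in scored[:k]})
    return result


def mutual_knn_alignment(X: Matrix, Y: Matrix, k: int = 5,
                         do_center: bool = True, do_normalize: bool = True) -> List[float]:
    """Per-item share of k-NN neighbours that two representations agree on.

    Row i of X and row i of Y must describe the same item (e.g. the same word)
    as embedded by two different models; their feature dimensions may differ.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same items in the same order")
    if not 0 < k < len(X):
        raise ValueError("k must be between 1 and n - 1")
    if do_center:
        X, Y = center(X), center(Y)
    if do_normalize:
        X, Y = normalize(X), normalize(Y)
    nx, ny = knn_sets(X, k), knn_sets(Y, k)
    return [len(a & b) / k for a, b in zip(nx, ny)]


def toy_model(latents: Matrix, dim: int, offset_scale: float,
              noise: float, rng: random.Random) -> Matrix:
    """Hypothetical 'model': random linear map of shared latents + constant offset + noise."""
    k = len(latents[0])
    W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(dim)]
    offset = [rng.gauss(0, offset_scale) for _ in range(dim)]
    return [[sum(W[r][c] * z[c] for c in range(k)) + offset[r] + rng.gauss(0, noise)
             for r in range(dim)] for z in latents]


if __name__ == "__main__":
    rng = random.Random(0)
    latents = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]  # shared "signal"
    X = toy_model(latents, dim=32, offset_scale=20.0, noise=0.5, rng=rng)
    Y = toy_model(latents, dim=48, offset_scale=20.0, noise=0.5, rng=rng)
    for use_center, use_norm in [(False, False), (True, False), (False, True), (True, True)]:
        scores = mutual_knn_alignment(X, Y, k=5, do_center=use_center, do_normalize=use_norm)
        print(f"center={use_center!s:5} normalize={use_norm!s:5} "
              f"mean alignment={sum(scores) / len(scores):.3f}")
```

Centering runs before row normalization. This order means a large shared offset is removed before it can dominate the dot products. The score is computed per item, so the returned list could be paired with word frequencies and summarized with a rank correlation to probe the frequency effect the abstract reports. That analysis is not reproduced here.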
## Submission history
From: Kiril Bangachev

**[v1]** Fri, 22 May 2026 12:59:01 UTC (4,726 KB)

## Similar Articles
- **Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning**: This paper investigates the Platonic Representation Hypothesis by examining 16 language models across 8 families on 800 reasoning problems. It finds that while models converge in internal representations, they diverge in reasoning processes, especially post-decision, and shared representations have minimal causal influence on predictions.
- **GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation**: This arXiv preprint introduces GRALIS, a unified mathematical framework that uses the Riesz representation theorem to formalize and compare linear attribution methods such as SHAP, LIME, and Integrated Gradients.
- **Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry**: This paper investigates whether fMRI representations from different subjects' visual cortices can be aligned using unsupervised geometric methods. It finds evidence for approximately isometric structure across individuals, extending the Platonic Representation Hypothesis to human brains.
- **Mechanistic Analysis of Alignment Algorithms in Language Models**: This paper presents a systematic mechanistic analysis of six preference optimization methods (PPO, DPO, SimPO, ORPO, GRPO, KTO) across three open-weight model families. Using probing and sparse autoencoders, it reveals how alignment algorithms reshape internal representations in qualitatively distinct ways.
- **Graph Alignment Topology as an Inductive Bias for Grounding Detection**: This paper introduces Graph Alignment Topology as an inductive bias for grounding detection, using a graph neural network to model alignment structure between reference information and LLM outputs. The method achieves state-of-the-art results on multiple hallucination and question-answering datasets, outperforming GPT-4o.