Representation Alignment Rests on Linear Structure

arXiv cs.LG 05/29/26, 04:00 AM Papers

representation-alignment linear-representation platonic-representation sparse-autoencoders cross-modal-alignment statistical-framework

Summary

This paper investigates the Platonic Representation Hypothesis (PRH). It proposes that alignment arises from object-attribute structure that is encoded linearly in representations, and it introduces a statistical framework that separates representations into signal, bias, and noise.

arXiv:2605.28870v1 Announce Type: new Abstract: We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, and noise. (1) Signal: We propose that Platonic alignment arises from the universal relationship between objects and attributes, which is encoded linearly in representations according to the Linear Representation Hypothesis (LRH). We provide evidence that LRH helps explain PRH by extracting linear object-attribute features with sparse autoencoders and showing that these sparse representations often exhibit stronger cross-modal alignment than their dense counterparts. (2) Bias: Models have different implicit biases due to the diverse architectures and training procedures used. We show that this difference can be partially mitigated. Centering and normalization consistently improve cross-model alignment. (3) Noise: Finite-sample training leads to noise in representations. We provide evidence that representational noise is driven by data scarcity by revealing a strong and consistent positive correlation between word frequency and alignment in LLMs and text embedding models. Synthesizing signal, bias, and noise, we propose a statistical model that refines the Linear Representation Hypothesis and explains further phenomena related to the alignment of representations emerging from diverse modern AI architectures.

Cached at: 05/29/26, 09:12 AM

# Representation Alignment Rests on Linear Structure
Source: [https://arxiv.org/abs/2605.28870](https://arxiv.org/abs/2605.28870)
[View PDF](https://arxiv.org/pdf/2605.28870)

> Abstract: We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, and noise. (1) Signal: We propose that Platonic alignment arises from the universal relationship between objects and attributes, which is encoded linearly in representations according to the Linear Representation Hypothesis (LRH). We provide evidence that LRH helps explain PRH by extracting linear object-attribute features with sparse autoencoders and showing that these sparse representations often exhibit stronger cross-modal alignment than their dense counterparts. (2) Bias: Models have different implicit biases due to the diverse architectures and training procedures used. We show that this difference can be partially mitigated. Centering and normalization consistently improve cross-model alignment. (3) Noise: Finite-sample training leads to noise in representations. We provide evidence that representational noise is driven by data scarcity by revealing a strong and consistent positive correlation between word frequency and alignment in LLMs and text embedding models. Synthesizing signal, bias, and noise, we propose a statistical model that refines the Linear Representation Hypothesis and explains further phenomena related to the alignment of representations emerging from diverse modern AI architectures.
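
The bias claim in the abstract (centering and normalization improve cross-model alignment) can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration, not the paper's method: the abstract does not name its alignment metric, so the sketch uses a mutual k-nearest-neighbor overlap, and the helper names `center_and_normalize`, `knn_indices`, and `mutual_knn_alignment`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def center_and_normalize(X):
    """Subtract the per-dimension mean, then scale each row to unit L2 norm."""
    Xc = X - X.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    return Xc / np.maximum(norms, 1e-12)  # guard against zero-norm rows


def knn_indices(X, k):
    """Indices of each row's k nearest neighbors by inner product, excluding itself."""
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)  # an item is never its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]


def mutual_knn_alignment(A, B, k=10):
    """Mean fraction of shared k-nearest neighbors across the same n items."""
    na, nb = knn_indices(A, k), knn_indices(B, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(na, nb)]
    return float(np.mean(overlap)) / k


# Toy setup (assumed): two "models" see the same latent items Z through
# different rotations, and each adds its own large constant offset (the bias).
rng = np.random.default_rng(0)
n, d, k = 200, 32, 10
Z = rng.normal(size=(n, d))
Wa = np.linalg.qr(rng.normal(size=(d, d)))[0]  # random orthogonal map for model A
Wb = np.linalg.qr(rng.normal(size=(d, d)))[0]  # random orthogonal map for model B
A = Z @ Wa + 20.0 * rng.normal(size=d)
B = Z @ Wb + 20.0 * rng.normal(size=d)

before = mutual_knn_alignment(A, B, k)
after = mutual_knn_alignment(center_and_normalize(A), center_and_normalize(B), k)
print(f"raw: {before:.3f}  centered+normalized: {after:.3f}")
```

In this toy setup the constant offsets dominate the raw inner products, so removing them should raise the neighbor overlap. Real models differ in more than an offset, which fits the abstract's wording that the difference is only partially mitigated.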

## Submission history

From: Kiril Bangachev **[v1]** Fri, 22 May 2026 12:59:01 UTC (4,726 KB)

Similar Articles

Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation

Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry

Mechanistic Analysis of Alignment Algorithms in Language Models

Graph Alignment Topology as an Inductive Bias for Grounding Detection
