Representation Alignment Rests on Linear Structure

arXiv cs.LG Papers

Summary

This paper investigates the Platonic Representation Hypothesis, proposing that alignment arises from linear structure in representations, and introduces a statistical framework of signal, bias, and noise.

arXiv:2605.28870v1 Announce Type: new Abstract: We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, and noise. (1) Signal: We propose that Platonic alignment arises from the universal relationship between objects and attributes, which is encoded linearly in representations according to the Linear Representation Hypothesis (LRH). We provide evidence that LRH helps explain PRH by extracting linear object-attribute features with sparse autoencoders and showing that these sparse representations often exhibit stronger cross-modal alignment than their dense counterparts. (2) Bias: Models have different implicit biases due to the diverse architectures and training procedures used. We show that this difference can be partially mitigated. Centering and normalization consistently improve cross-model alignment. (3) Noise: Finite-sample training leads to noise in representations. We provide evidence that representational noise is driven by data scarcity by revealing a strong and consistent positive correlation between word frequency and alignment in LLMs and text embedding models. Synthesizing signal, bias, and noise, we propose a statistical model that refines the Linear Representation Hypothesis and explains further phenomena related to the alignment of representations emerging from diverse modern AI architectures.

# Representation Alignment Rests on Linear Structure
Source: [https://arxiv.org/abs/2605.28870](https://arxiv.org/abs/2605.28870)
[View PDF](https://arxiv.org/pdf/2605.28870)

> Abstract: We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, and noise. (1) Signal: We propose that Platonic alignment arises from the universal relationship between objects and attributes, which is encoded linearly in representations according to the Linear Representation Hypothesis (LRH). We provide evidence that LRH helps explain PRH by extracting linear object-attribute features with sparse autoencoders and showing that these sparse representations often exhibit stronger cross-modal alignment than their dense counterparts. (2) Bias: Models have different implicit biases due to the diverse architectures and training procedures used. We show that this difference can be partially mitigated. Centering and normalization consistently improve cross-model alignment. (3) Noise: Finite-sample training leads to noise in representations. We provide evidence that representational noise is driven by data scarcity by revealing a strong and consistent positive correlation between word frequency and alignment in LLMs and text embedding models. Synthesizing signal, bias, and noise, we propose a statistical model that refines the Linear Representation Hypothesis and explains further phenomena related to the alignment of representations emerging from diverse modern AI architectures.
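
The abstract's "bias" claim (centering and normalization improve cross-model alignment) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, the mutual k-nearest-neighbour overlap used as the alignment score, and the helper names `center_and_normalize`, `knn_indices`, `mutual_knn_alignment`, and `random_orthogonal`. Two hypothetical "models" see the same latent signal through different orthogonal maps and different constant offsets; the offsets play the role of model-specific bias.

```python
import numpy as np

def center_and_normalize(X):
    """Subtract the per-feature mean, then scale each row to unit L2 norm."""
    Xc = X - X.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    return Xc / np.clip(norms, 1e-12, None)

def knn_indices(X, k):
    """Indices of each row's k most similar rows by inner product, excluding itself."""
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)
    return np.argsort(-sims, axis=1)[:, :k]

def mutual_knn_alignment(A, B, k=10):
    """Mean fraction of k-nearest neighbours shared between two representation spaces."""
    na, nb = knn_indices(A, k), knn_indices(B, k)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(na, nb)]))

def random_orthogonal(rng, d):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

# Synthetic setup (an assumption, not the paper's data): a shared latent signal Z,
# seen through two different orthogonal maps plus large model-specific offsets.
rng = np.random.default_rng(0)
n, d = 200, 16
Z = rng.standard_normal((n, d))
A = Z @ random_orthogonal(rng, d) + 5.0 * rng.standard_normal(d)
B = Z @ random_orthogonal(rng, d) + 5.0 * rng.standard_normal(d)

raw_score = mutual_knn_alignment(A, B)
aligned_score = mutual_knn_alignment(center_and_normalize(A), center_and_normalize(B))
print(f"raw: {raw_score:.3f}  centered+normalized: {aligned_score:.3f}")
```

In this toy setting the offsets dominate the raw inner products, so each model's neighbour ranking follows its own offset direction. Centering removes the offsets and recovers the shared geometry. Real models differ by more than a rotation and a shift, so the gain there would presumably be smaller than in this sketch.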

## Submission history

From: Kiril Bangachev **[v1]** Fri, 22 May 2026 12:59:01 UTC (4,726 KB)
