Learning Coherent Representations: A Topological Approach to Interpretability

arXiv cs.LG 06/03/26, 04:00 AM Papers
Summary
This paper introduces coherence, a geometric constraint for neural representations inspired by grid cells and head direction cells in the brain. Coherence ensures that features respond to geometrically connected regions of the data manifold, improving interpretability; the authors propose a differentiable objective (Coh) and validate it on synthetic data, rotated MNIST, and BERT token embeddings.
arXiv:2606.02841v1 Announce Type: new
Cached at: 06/03/26, 09:40 AM
# Learning Coherent Representations: A Topological Approach to Interpretability
Source: [https://arxiv.org/html/2606.02841](https://arxiv.org/html/2606.02841)
Melvin Vaupel, Valdemar Kargård Olsen, Erik Hermansen, Benjamin A\. Dunn

###### Abstract

Deep neural networks learn representations where individual features often lack interpretable meaning; a single neuron may activate for scattered, unrelated inputs\. We introduce coherence, a geometric property inspired by neural coding in the brain, where neurons like grid cells and head direction cells respond to contiguous regions of state space\. A non\-negative matrix is coherent if each row \(sample\) attends to geometrically clustered columns \(features\) and vice versa, and in addition every sample is well described by some feature and every feature is needed by some sample\. We prove that coherent matrices induce a bounded interleaving between the Vietoris\-Rips filtrations of samples and features, guaranteeing that both spaces share compatible topological structure\. This geometric constraint facilitates interpretability\. For example, if data lies on a circle, coherent features must tile that circle into contiguous arcs\. We introduce Coh, a differentiable objective function based on Fréchet variance that enforces coherence during training\. Unlike sparsity, which bounds how many samples a feature activates on, coherence bounds *which* samples, requiring geometric connectivity rather than only rarity\. This yields not just interpretable features but an interpretable feature space\. We validate Coh in an auto\-encoder using synthetic and rotated MNIST datasets and in a token embedding of BERT using language data\.

## 1 Introduction

Deep neural networks \(DNNs\) trained on classification tasks progressively transform data representations such that class manifolds become geometrically separable \(Cohen et al\., [2020](https://arxiv.org/html/2606.02841#bib.bib12)\)\. This property, that samples from the same class cluster together in latent space, emerges naturally from the classification objective and underlies the success of transfer learning and feature visualization\. However, this says nothing about the features themselves: individual latent dimensions may respond to scattered, incoherent subsets of the data \(Elhage et al\., [2022](https://arxiv.org/html/2606.02841#bib.bib28)\), limiting interpretability\. For unsupervised approaches such as auto\-encoders, the situation is even worse\. Without class labels to guide separation, networks can learn representations where semantically related samples are spread across the latent space, and where individual features lack any coherent meaning\. Sparsity regularization \($L^1$\) encourages features to activate on few samples, but does not ensure those samples are geometrically related: a sparse feature may fire on disconnected regions of the data manifold\.

Strikingly, biological neural circuits achieve interpretable representations without explicit supervision\. Grid cells in the entorhinal cortex tile physical space with periodic firing fields \(Hafting et al\., [2005](https://arxiv.org/html/2606.02841#bib.bib24); Gardner et al\., [2022](https://arxiv.org/html/2606.02841#bib.bib1)\)\. Head direction cells activate for specific orientations, with each neuron covering a contiguous arc of angles \(Taube et al\., [1990](https://arxiv.org/html/2606.02841#bib.bib25); Rybakken et al\., [2019](https://arxiv.org/html/2606.02841#bib.bib13)\)\. These neural codes exhibit locality: each neuron’s activity is concentrated on a coherent region of the underlying state space\. This locality is precisely what makes these cells interpretable: one can read off an animal’s position or heading from the concurrent neural activity because the mapping between state and response is geometrically organized\. This observation suggests that locality is not merely a byproduct of evolutionary optimization, but a design principle for interpretable neural codes\. Can we formalize this principle and impose it on artificial neural networks?

We focus primarily on unsupervised settings, namely autoencoder bottlenecks and transformer token embeddings, where coherence regularization must induce structure without label guidance\. In supervised classifiers, cross\-entropy loss naturally collapses within\-class variation, leaving little structure for coherence to preserve, so the resulting topology is approximately discrete\. Applying the method developed in this work to supervised settings where within\-class structure matters is therefore a direction for future work\. Aside from the observations of neural activity in the brain, this work is greatly inspired by Dowker duality \(Dowker, [1952](https://arxiv.org/html/2606.02841#bib.bib3)\), a famous duality between the topology of the rows and the columns of a binary matrix\. We consider our work a geometric analogue: we do not get Dowker duality for free, but rather have to define a class of matrices for which the geometric Vietoris\-Rips row and column filtrations, as defined in [Appendix C](https://arxiv.org/html/2606.02841#A3), are similarly interleaved\.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x1.png)

Figure 1: Given an auto\-encoder with a non\-negative activation function, we can treat the encoded latent space as a matrix $M$, whose rows are samples and columns are features\. Most often, the *topologies* of the samples and features are vastly different\. We regularize these spaces to be topologically similar by creating an explicit interleaving between the filtered simplicial complexes induced by the latent samples and latent features by using the barycentric maps $\phi$ and $\psi$\. Our definition of a matrix to be $\epsilon$\-*local* implies that any sample $r_i$ is mapped $\epsilon^{1/2}$\-close to some feature $c_k$\. For instance, the position of their barycentric images in the weight space of all rows shows that the column $c_1$ is more local than the column $c_2$\. Moreover, our definition of $\epsilon$\-covering implies that around any sample $r_k$ there is some feature that maps $\epsilon^{1/2}$\-close to it under the barycentric map\. When a matrix $M$ \(latent space\) has both these properties we call it $\epsilon$\-coherent, and together with a non\-expanding assumption on the barycentric maps, this implies that the latent samples and latent features are topologically $\epsilon^{1/2}$\-similar\.

### 1\.1 Contributions

1. Definition of coherence\. We define a non\-negative matrix to be *$\epsilon$\-coherent* if it is both *$\epsilon$\-local* \(each row attends to a geometrically clustered set of columns, and vice versa\) and an *$\epsilon$\-covering* \(each row is matched by some column’s barycenter, and vice versa\)\. We prove that coherent matrices induce a bounded interleaving between the Vietoris–Rips filtrations of samples and features, so the two spaces share compatible topology\.
2. Differentiable objective function\. We derive Coh, a differentiable loss based on Fréchet variance, addable to any architecture with non\-negative activations\. Its two terms penalize the locality and covering quantities from the definition above, driving representations toward $\epsilon$\-coherence\.
3. Empirical validation\. We show that Coh produces interpretable feature spaces across distinct settings: it recovers the expected topology in synthetic and rotated\-MNIST autoencoders, and in a BERT token embedding it yields features aligned with human\-readable categories \(e\.g\., years, kinship terms, and place names, but also units of measurement, hedging adverbs, and directional prepositions\), whereas non\-negativity alone yields essentially none\.

Our work connects topological data analysis, neuroscience, and representation learning, providing both theoretical foundations and a practical objective function for learning interpretable latent spaces\.

### 1\.2 Related Work

Interpretability\.

As models scale, understanding learned representations becomes critical for safety and debugging \(Olah et al\., [2020](https://arxiv.org/html/2606.02841#bib.bib5)\)\. A representation is interpretable if individual features correspond to human\-understandable concepts\.

Cohen et al\. \(Cohen et al\., [2020](https://arxiv.org/html/2606.02841#bib.bib12)\) showed that classification training progressively untangles class manifolds, making them linearly separable in later layers; Mamou et al\. \(Mamou et al\., [2020](https://arxiv.org/html/2606.02841#bib.bib21)\) observed similar separation in language models\. These results characterize *sample* geometry but do not address whether individual features are interpretable\. Sparse coding \(Olshausen and Field, [1996](https://arxiv.org/html/2606.02841#bib.bib23)\) and recent sparse auto\-encoders for mechanistic interpretability \(Bricken et al\., [2023](https://arxiv.org/html/2606.02841#bib.bib19); Cunningham et al\., [2024](https://arxiv.org/html/2606.02841#bib.bib22)\) address this by encouraging features to activate rarely, reducing polysemanticity\. However, sparsity constrains *how many* samples a feature activates on, not *which*: a sparse feature may fire on geometrically scattered inputs\. Coherence explicitly requires that active samples be spatially clustered, providing a geometric guarantee that features align with data structure\.

Geometric deep learning\. Geometric deep learning \(Bronstein et al\., [2021](https://arxiv.org/html/2606.02841#bib.bib6)\) incorporates known symmetries \(translation, rotation, permutation\) into network architectures, reducing the hypothesis space and improving generalization\. Our approach is complementary: rather than encoding symmetries architecturally, we regularize the learned representation to exhibit geometric structure, specifically requiring that feature and sample spaces share compatible topology\.

Topological auto\-encoders\. Topological data analysis \(TDA\) provides robust tools for characterizing data shape \(Carlsson, [2009](https://arxiv.org/html/2606.02841#bib.bib7)\), with stability results ensuring that small perturbations in data yield small changes in topological descriptors \(Chazal et al\., [2009](https://arxiv.org/html/2606.02841#bib.bib4)\)\. Several lines of work use tools from TDA to regularize neural networks \(Moor et al\., [2020](https://arxiv.org/html/2606.02841#bib.bib18); Hofer et al\., [2019](https://arxiv.org/html/2606.02841#bib.bib10); Hu et al\., [2019](https://arxiv.org/html/2606.02841#bib.bib8)\)\. These approaches aim to *preserve* input topology in the latent representation or to give the output space a target topology\. Our goal differs fundamentally: we do not preserve topology but *mirror* it between latent samples and latent features\. Another difference is that we operate at the level of simplicial filtrations, via explicit interleaving maps, which avoids the choice of a homology degree that these methods are bound by\.

Neural coding\. Our work draws inspiration from neuroscience, where grid cells \(Hafting et al\., [2005](https://arxiv.org/html/2606.02841#bib.bib24)\) and head direction cells \(Taube et al\., [1990](https://arxiv.org/html/2606.02841#bib.bib25)\) exhibit local receptive fields\. These local fields allow us to treat every row or column of the data matrix as a “point”, revealing the topological space formed by the data, as shown for grid cells \(Gardner et al\., [2022](https://arxiv.org/html/2606.02841#bib.bib1)\) and head direction cells \(Rybakken et al\., [2019](https://arxiv.org/html/2606.02841#bib.bib13)\)\.

Similarity\-preserving networks\. Sengupta et al\. \(Sengupta et al\., [2018](https://arxiv.org/html/2606.02841#bib.bib9)\) showed that non\-negative similarity\-preserving objectives yield localized receptive fields tiling input manifolds\. Our work differs in requiring bidirectional coherence: the feature space must share the sample space’s topology\. This provides explicit interleaving guarantees rather than a characterization of receptive field shapes\.

## 2 Background

We briefly establish notation and point to Appendix [C](https://arxiv.org/html/2606.02841#A3) for full definitions\. Given a finite set $P$ in a metric space, the *Vietoris\-Rips filtration* $\{VR_t(P)\}_{t \geq 0}$ is a nested family of simplicial complexes capturing the topology of $P$ at increasing scales\. Two filtrations are *$\delta$\-interleaved* if there exist maps between them that approximately commute with the inclusions, up to a $\delta$\-shift in scale \(see Definition [C\.6](https://arxiv.org/html/2606.02841#A3.Thmtheorem6)\)\. The *interleaving distance* bounds the bottleneck distance between persistence diagrams, ensuring similar homological features\.

Intuitively, two spaces that are $\delta$\-interleaved are 'topologically $\delta$\-similar', i\.e\., they have the same large\-scale shape and differ only at scales below $\delta$\.
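To make "topology at increasing scales" concrete, the following minimal sketch counts the connected components (the degree-zero information) of the graph that joins two points whenever their distance is at most `t`. The helper name `n_components` is hypothetical, and the convention "edge if distance is at most `t`" is an assumption; the precise filtration used in the paper is the one defined in its Appendix C.

```python
import numpy as np

def n_components(P, t):
    """Connected components of the graph joining points at distance <= t."""
    P = np.asarray(P, dtype=float)
    n = len(P)
    # Pairwise Euclidean distances, shape (n, n).
    D = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(-1))
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(n):
        for b in range(a + 1, n):
            if D[a, b] <= t:
                parent[find(a)] = find(b)
    return len({find(a) for a in range(n)})

# Two well-separated pairs of points on a line.
P = [[0, 0], [1, 0], [10, 0], [11, 0]]
print([n_components(P, t) for t in (0.5, 1.0, 9.0)])  # -> [4, 2, 1]
```

Two point clouds whose component counts (and higher-dimensional features) agree except for a bounded shift in `t` are what the interleaving notion above formalizes.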

![Refer to caption](https://arxiv.org/html/2606.02841v1/x2.png)

Figure 2: Coherent vs non\-coherent matrices derived from circular state space\. Left: coherent, $\varepsilon = 0.18$\. Right: non\-coherent, $\varepsilon = 1.46$\. In rows one and three, we show PCA projections of rows and columns colored by the activation of a column and row\. In rows two and four we show persistence diagrams of rows and columns, where we highlight the most persistent $H_1$ feature\. Note that only the coherent matrix exhibits matching circular topology in both rows and columns\.

## 3 Coherent Matrices

The goal of this section is to introduce the concept of coherent matrices and show how it induces a natural interleaving between metric spaces associated to the rows and columns of the matrix $M$\. The pairwise distances between row vectors and between column vectors can lie on vastly different scales for non\-square matrices\. In comparing their induced topologies we want to be agnostic to this difference\. For practical purposes we remedy it by normalizing the row and column metrics, scaling each by one over its mean pairwise distance\. Many of the following results hold for any $L^p$\-norm, but we stick to the Euclidean norm, as this makes the barycentric maps linear with a simple, well\-defined closed form that is easy to compute in training\. We note that the Euclidean norm may suffer in high dimensions, but is a natural choice for our method\.

Throughout this section, we assume we have a non\-negative matrix $M \in \mathbb{R}^{m \times n}_{+}$ with no zero rows or zero columns\. We will denote by

$$\mathcal{R} = \{r_1, \dots, r_m\} \subset \mathbb{R}^{n} \quad \text{and} \quad \mathcal{C} = \{c_1, \dots, c_n\} \subset \mathbb{R}^{m}$$

the set of rows and columns of $M$\. We want the non\-negative rows and columns in the matrix to act like probability distributions over the columns and rows respectively, hence we define the notion of normalization kernels\.

###### Definition 3\.1\.

Let $\sigma_R : \mathbb{R}_{+}^{n} \setminus \{0\} \to \Delta^{n-1}$ and $\sigma_C : \mathbb{R}^{m} \setminus \{0\} \to \Delta^{m-1}$ be continuous functions that map a non\-zero vector to a probability vector\. We call $\sigma_R$ and $\sigma_C$ *normalization kernels*\. Let $M \in \mathbb{R}_{+}^{m \times n}$\. We define the *generalized weight spaces* as follows:

1. The *row\-weight space* $\mathcal{W}$ is defined by the matrix $W$, where the $i$\-th row is $W_{i,\cdot} = w^{(i)} = \sigma_R(r_i)$\.
2. The *column\-weight space* $\mathcal{V}$ is defined by the matrix $V$, where the $j$\-th row is $V_{j,\cdot} = v^{(j)} = \sigma_C(c_j)$\.

###### Example 3\.2\.

Examples of normalization kernels are $L^1$ normalization, softmax, and squared $L^1$ normalization; the last is the one we use in our experiments\.
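The three kernels named in this example can be sketched in a few lines. The function names below are hypothetical, and the squared variant follows the form written in Algorithm 1 (each entry squared, then divided by the sum of squares); this is an illustration under those assumptions, not the authors' code.

```python
import numpy as np

def l1_normalize(x):
    """Divide a non-negative, non-zero vector by the sum of its entries."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def softmax(x):
    """Exponentiate (shifted by the max for stability) and normalize."""
    x = np.asarray(x, dtype=float)
    z = np.exp(x - x.max())
    return z / z.sum()

def squared_normalize(x):
    """x_j^2 / sum_k x_k^2, the weight formula written in Algorithm 1."""
    sq = np.asarray(x, dtype=float) ** 2
    return sq / sq.sum()

r = np.array([0.0, 1.0, 3.0])  # one non-negative row of M
for kernel in (l1_normalize, softmax, squared_normalize):
    w = kernel(r)
    print(kernel.__name__, np.round(w, 3), "sum =", round(float(w.sum()), 6))
```

Each kernel returns a probability vector; squaring sharpens the weights toward the largest activations relative to plain $L^1$ normalization, while softmax assigns positive weight even to zero entries.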

We now have a natural choice of maps between the row\-weight space and column\-weight space using the*Fréchet mean*\.

###### Definition 3\.3\.

Let $\mathcal{A} \subset \mathbb{R}^{n}$ be a set of vectors\. We denote by $\operatorname{\mathsf{conv}}(\mathcal{A})$ the *convex hull* of the vectors in $\mathcal{A}$\.

###### Definition 3\.4\.

Let $M \in \mathbb{R}_{+}^{m \times n}$ and let $\{w^{(i)}\}_{i \in \{1,\dots,m\}}$ and $\{v^{(j)}\}_{j \in \{1,\dots,n\}}$ be row and column weights induced by some normalization kernels $\sigma_R$ and $\sigma_C$\. We have the barycenters

$$\begin{align}
\tilde{\phi} : \mathcal{R} &\to \operatorname{\mathsf{conv}}(\mathcal{C}) \quad \text{defined by} \tag{1} \\
r_i &\mapsto \operatorname*{\mathsf{argmin}}_{\mu \in \operatorname{\mathsf{conv}}(\mathcal{C})} \sum_{j=1}^{n} w^{(i)}_{j} \|\mu - c_j\|_2^2 \tag{2}
\end{align}$$

and

$$\begin{align}
\tilde{\psi} : \mathcal{C} &\to \operatorname{\mathsf{conv}}(\mathcal{R}) \quad \text{defined by} \tag{3} \\
c_j &\mapsto \operatorname*{\mathsf{argmin}}_{\mu \in \operatorname{\mathsf{conv}}(\mathcal{R})} \sum_{i=1}^{m} v^{(j)}_{i} \|\mu - r_i\|_2^2. \tag{4}
\end{align}$$

We extend these maps linearly to $\phi : \operatorname{\mathsf{conv}}(\mathcal{R}) \to \operatorname{\mathsf{conv}}(\mathcal{C})$ and $\psi : \operatorname{\mathsf{conv}}(\mathcal{C}) \to \operatorname{\mathsf{conv}}(\mathcal{R})$\.
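For the Euclidean norm the minimization in this definition has a closed form, which is what makes the maps cheap to compute. A short derivation, using only the definition above and the fact that the weights sum to one (the symbol $F_i$ is introduced here purely for illustration):

```latex
\begin{aligned}
F_i(\mu) &= \sum_{j=1}^{n} w^{(i)}_{j} \,\|\mu - c_j\|_2^2, \\
\nabla F_i(\mu) &= 2\sum_{j=1}^{n} w^{(i)}_{j}\,(\mu - c_j)
  = 2\Big(\mu - \sum_{j=1}^{n} w^{(i)}_{j} c_j\Big) = 0
  \;\Longrightarrow\;
  \tilde{\phi}(r_i) = \sum_{j=1}^{n} w^{(i)}_{j}\, c_j .
\end{aligned}
```

Since $F_i$ is strictly convex, this stationary point is the unique unconstrained minimizer, and because it is a convex combination of the columns it already lies in $\operatorname{\mathsf{conv}}(\mathcal{C})$, so the constraint is inactive. The same argument gives $\tilde{\psi}(c_j) = \sum_{i=1}^{m} v^{(j)}_{i} r_i$, in agreement with the matrix products $W_{i,:} M^{T}$ and $V_{j,:} M$ used in Algorithm 1.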

The barycentric maps give a natural way of going between the convex hull of the rows and the convex hull of the columns\. The next definition, *Fréchet variance*, will capture how stable these maps are\. This allows us to define the notion of an $\epsilon$\-local matrix, where $\epsilon$ is an upper bound on the variance of all rows and columns:

###### Definition 3\.6\.

Let $M \in \mathbb{R}_{+}^{m \times n}$\. We define the *Fréchet variance* of a row $r_i$ of $M$ as

$$\text{Var}_{\mathcal{R}}(r_i) \coloneqq \sum_{j=1}^{n} w^{(i)}_{j} \|\phi(r_i) - c_j\|_2^2,$$

and the variance of a column $c_j$ of $M$ as

$$\text{Var}_{\mathcal{C}}(c_j) \coloneqq \sum_{i=1}^{m} v^{(j)}_{i} \|\psi(c_j) - r_i\|_2^2.$$

We say that $M$ is *$\epsilon$\-local* if for any row index $i$ and column index $j$ we have that

$$\text{Var}_{\mathcal{R}}(r_i) \leq \epsilon \quad \text{and} \quad \text{Var}_{\mathcal{C}}(c_j) \leq \epsilon.$$

The next question is whether, given a row \(or column\), there exists a column \(or row\) that maps close to it under the barycentric map\. This corresponds to asking whether, for a given sample, there exists a feature that describes it well, or dually, whether for a given feature there exists a sample that needs it\.

###### Definition 3\.7\.

Let $M \in \mathbb{R}_{+}^{m \times n}$\. We define the *covering* of a row $r_i$ of $M$ as

$$\text{Cov}_{\mathcal{R}}(r_i) \coloneqq \sum_{j=1}^{n} w^{(i)}_{j} \|r_i - \psi(c_j)\|_2^2,$$

and the covering of a column $c_j$ of $M$ as

$$\text{Cov}_{\mathcal{C}}(c_j) \coloneqq \sum_{i=1}^{m} v^{(j)}_{i} \|c_j - \phi(r_i)\|_2^2.$$

We say that $M$ is *$\epsilon$\-covered* if for any row index $i$ and column index $j$ we have that

$$\text{Cov}_{\mathcal{R}}(r_i) \leq \epsilon \quad \text{and} \quad \text{Cov}_{\mathcal{C}}(c_j) \leq \epsilon.$$

We can now show that coherence implies a bound for the interleaving distance between the rows and columns of $M$\. To have an interleaving, we need a matching between the set of rows and the set of columns\. Since our barycentric maps map into the convex hull, not onto a particular row or column, we define a hard barycentric map by snapping to the closest row or column\.

###### Definition 3\.8\.

Let $M \in \mathbb{R}_{+}^{m \times n}$\. We define the *snapping barycentric maps* as $\Phi : \mathcal{R} \rightarrow \mathcal{C}$ by

$$r_i \mapsto \operatorname*{\mathsf{argmin}}_{c_j \in \mathcal{C}} \|\phi(r_i) - c_j\|_2$$

and $\Psi : \mathcal{C} \rightarrow \mathcal{R}$ by

$$c_j \mapsto \operatorname*{\mathsf{argmin}}_{r_i \in \mathcal{R}} \|\psi(c_j) - r_i\|_2.$$

We note that there might be several closest columns or rows; in that case one makes an arbitrary choice\.

Locality of a matrix $M$ gives a bound on how much the barycentric map and the barycentric snapping map can disagree\. In practice this allows us to work with the differentiable barycentric maps, as we know how much is lost when we pass to the non\-differentiable snapping maps that we use to make the matching\.

###### Proposition 3\.9\.

Let $M \in \mathbb{R}_{+}^{m \times n}$\. If $M$ is $\epsilon$\-local, then

$$\|\phi(r_i) - \Phi(r_i)\|_2 \leq \epsilon^{1/2} \quad \text{and} \quad \|\psi(c_j) - \Psi(c_j)\|_2 \leq \epsilon^{1/2}.$$

Proof sketch\. The variance bound implies that the weighted columns $c_j$ concentrate around $\phi(r_i)$\. Since $\Phi(r_i)$ is the column closest to $\phi(r_i)$, its squared distance to $\phi(r_i)$ is at most the weighted average of the squared distances, so it lies within the variance ball\. Full proof in Appendix [D](https://arxiv.org/html/2606.02841#A4)\.
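The inequality behind this proposition (a minimum is never larger than a weighted average) can be checked numerically. The sketch below assumes the squared normalization written in Algorithm 1 as the kernel and the closed form $\phi(r_i) = \sum_j w^{(i)}_j c_j$; the function name `snap_rows` is hypothetical.

```python
import numpy as np

def snap_rows(M):
    """Return, per row: snapped column index, squared snapping distance,
    and Frechet variance (unnormalized, Euclidean)."""
    M = np.asarray(M, dtype=float)
    sq = M ** 2
    W = sq / sq.sum(axis=1, keepdims=True)   # row weights, shape (m, n)
    C = M.T                                  # columns as points, shape (n, m)
    phi = W @ C                              # barycenters phi(r_i), shape (m, m)
    # D[i, j] = ||phi(r_i) - c_j||^2
    D = ((phi[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    var_R = (W * D).sum(axis=1)              # Var_R(r_i)
    Phi = D.argmin(axis=1)                   # ties: first index (arbitrary choice)
    return Phi, D.min(axis=1), var_R

rng = np.random.default_rng(0)
M = rng.random((6, 4)) + 1e-3                # strictly positive: no zero rows/columns
Phi, snap_sq, var_R = snap_rows(M)
print("snapped column per row:", Phi)
print("bound holds:", bool(np.all(snap_sq <= var_R + 1e-12)))
```

The column case is symmetric: swap the roles of rows and columns by calling `snap_rows(M.T)`.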

###### Proposition 3\.10\.

If $M$ is $\epsilon$\-covered, then for any row $r_i$ and column $c_j$ the round trips are bounded by $\epsilon^{1/2}$:

$$\|r_i - \psi \circ \phi(r_i)\|_2 \leq \epsilon^{1/2} \quad \text{and} \quad \|c_j - \phi \circ \psi(c_j)\|_2 \leq \epsilon^{1/2}.$$

Proof sketch\. By linearity, $\psi \circ \phi(r_i)$ is a weighted average of the barycenters $\psi(c_j)$\. The covering bound ensures $r_i$ is close to each $\psi(c_j)$ in a weighted sense\. Together with Jensen’s inequality this implies that $r_i$ lies within $\epsilon^{1/2}$ of their weighted average\. Full proof in Appendix [D](https://arxiv.org/html/2606.02841#A4)\.
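Written out for the row case, the sketch amounts to the following chain. It assumes the linear extension from Definition 3.4, so that $\psi\big(\sum_j w^{(i)}_j c_j\big) = \sum_j w^{(i)}_j \psi(c_j)$, together with $\phi(r_i) = \sum_j w^{(i)}_j c_j$ and $\sum_j w^{(i)}_j = 1$; the full argument is the one in the paper's Appendix D.

```latex
\begin{aligned}
\|r_i - \psi\circ\phi(r_i)\|_2^2
  &= \Big\| \sum_{j=1}^{n} w^{(i)}_{j}\,\big(r_i - \psi(c_j)\big) \Big\|_2^2
  && \text{(linearity, weights sum to one)} \\
  &\le \sum_{j=1}^{n} w^{(i)}_{j}\, \|r_i - \psi(c_j)\|_2^2
  && \text{(Jensen, convexity of } \|\cdot\|_2^2\text{)} \\
  &= \text{Cov}_{\mathcal{R}}(r_i) \;\le\; \epsilon .
\end{aligned}
```

Taking square roots gives the stated $\epsilon^{1/2}$ bound; the column case follows by exchanging the roles of $\phi$, $w$ and $\psi$, $v$.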

###### Definition 3\.11\.

Let $M \in \mathbb{R}_{+}^{m \times n}$\. We say $M$ is *$\epsilon$\-coherent* if $M$ is $\epsilon$\-local and $\epsilon$\-covered\.

See [Figure 2](https://arxiv.org/html/2606.02841#S2.F2) for an illustration of a coherent matrix versus a non\-coherent matrix\.
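Given a concrete matrix, the smallest $\epsilon$ for which it is $\epsilon$-coherent is simply the largest of the four families of quantities defined above. The sketch below computes it directly from the definitions; it assumes the squared normalization of Algorithm 1 as kernel and omits the mean-pairwise-distance rescaling mentioned at the start of this section, so its values are not directly comparable to those reported in the figures. The names `sqdist` and `coherence_epsilon` are hypothetical.

```python
import numpy as np

def sqdist(A, B):
    """Squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)

def coherence_epsilon(M):
    """Smallest eps such that M is eps-local and eps-covered (unnormalized)."""
    M = np.asarray(M, dtype=float)
    R, C = M, M.T                                # rows in R^n, columns in R^m
    sq = M ** 2
    W = sq / sq.sum(axis=1, keepdims=True)       # row weights, (m, n)
    V = (sq / sq.sum(axis=0, keepdims=True)).T   # column weights, (n, m)
    phi = W @ C                                  # phi(r_i), (m, m)
    psi = V @ R                                  # psi(c_j), (n, n)
    D_phi = sqdist(phi, C)                       # ||phi(r_i) - c_j||^2, (m, n)
    D_psi = sqdist(R, psi)                       # ||r_i - psi(c_j)||^2, (m, n)
    var_R = (W * D_phi).sum(axis=1)
    cov_R = (W * D_psi).sum(axis=1)
    var_C = (V * D_psi.T).sum(axis=1)
    cov_C = (V * D_phi.T).sum(axis=1)
    return float(max(var_R.max(), var_C.max(), cov_R.max(), cov_C.max()))

print(coherence_epsilon(np.eye(5)))              # each row uses exactly one column
rng = np.random.default_rng(0)
print(coherence_epsilon(rng.random((8, 5)) + 1e-3))
```

For the identity matrix every row attends to a single column and every barycenter coincides with an actual row or column, so all four quantities vanish.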

###### Theorem 3\.12\.

If $M \in \mathbb{R}^{m \times n}_{+}$ is $\epsilon$\-coherent and the barycentric maps $\phi$ and $\psi$ are $1$\-Lipschitz, then the barycentric snapping maps $\Phi$ and $\Psi$ induce an $\epsilon^{1/2}$\-interleaving between $VR(\mathcal{R}, d_2)$ and $VR(\mathcal{C}, d_2)$\.

Proof sketch\. We verify the four conditions in [C\.9](https://arxiv.org/html/2606.02841#A3.Thmtheorem9)\. We want to use the maps $\psi$ and $\phi$ between the rows and columns, but these land in the convex hulls rather than on actual rows or columns\. Therefore, we must use the hard snapping maps $\Psi$ and $\Phi$ and pay the cost of $\epsilon^{1/2}$ given by Propositions [3\.9](https://arxiv.org/html/2606.02841#S3.Thmtheorem9) and [3\.10](https://arxiv.org/html/2606.02841#S3.Thmtheorem10)\.

## 4 Algorithm

See[Appendix A](https://arxiv.org/html/2606.02841#A1)for more details of implementation\.

Algorithm 1: Coh

Input: Non\-negative matrix $M \in \mathbb{R}^{B \times L}_{\geq 0}$ \(Batch $\times$ Latent\), top\-$k$ parameters $k_R, k_C$, target variance $\tau$

Output: Coherence loss $\mathcal{L}_{\text{Coh}}$

1. Compute scale factors \(sampled for efficiency\):
   - $\bar{d}_R \leftarrow \text{MeanPairwiseDist}(\{r_i\})$
   - $\bar{d}_C \leftarrow \text{MeanPairwiseDist}(\{c_j\})$
2. Compute weights \(squared $L^1$ normalization\):
   - $W_{ij} \leftarrow M_{ij}^{2} / \sum_{k} M_{ik}^{2}$ \(row weights\)
   - $V_{ji} \leftarrow M_{ij}^{2} / \sum_{k} M_{kj}^{2}$ \(column weights\)
3. Compute barycenters:
   - $\phi(r_i) \leftarrow W_{i,:} M^{T}$ for all $i$
   - $\psi(c_j) \leftarrow V_{j,:} M$ for all $j$
4. Row variance and covering \(normalized\), for $i = 1$ to $B$:
   - $\text{Var}_R(r_i) \leftarrow \sum_{j} W_{ij} \|c_j - \phi(r_i)\|^{2} / \bar{d}_C^{2}$
   - $\text{Cov}_R(r_i) \leftarrow \sum_{j} W_{ij} \|r_i - \psi(c_j)\|^{2} / \bar{d}_R^{2}$
5. Column variance and covering \(normalized\), for $j = 1$ to $L$:
   - $\text{Var}_C(c_j) \leftarrow \left(\sum_{i} V_{ji} \|r_i - \psi(c_j)\|^{2}\right) / \bar{d}_R^{2}$
   - $\text{Cov}_C(c_j) \leftarrow \left(\sum_{i} V_{ji} \|c_j - \phi(r_i)\|^{2}\right) / \bar{d}_C^{2}$
6. Top\-$k$ aggregation with threshold:
   - $\mathcal{L}_{\text{var}} \leftarrow \text{TopK}([\text{Var}_R - \tau]_{+}, k_R) + \text{TopK}([\text{Var}_C - \tau]_{+}, k_C)$
   - $\mathcal{L}_{\text{cov}} \leftarrow \text{TopK}([\text{Cov}_R - \tau]_{+}, k_R) + \text{TopK}([\text{Cov}_C - \tau]_{+}, k_C)$
7. Return $\mathcal{L}_{\text{var}} + \mathcal{L}_{\text{cov}}$
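The steps of Algorithm 1 translate almost line by line into array code. The NumPy sketch below is a forward pass only, intended to make the shapes and normalizations explicit; it is not the authors' implementation. Two points are assumptions: the scale factors are computed over all pairs rather than sampled, and `TopK` is taken to mean the average of the `k` largest entries, since the aggregation is not spelled out here. The function names are hypothetical. For training, the same expressions would have to be written in an automatic-differentiation framework.

```python
import numpy as np

def mean_pairwise_dist(X):
    """Mean Euclidean distance over all distinct pairs of rows of X."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return D[np.triu_indices(len(X), k=1)].mean()

def topk_mean(x, k):
    """ASSUMPTION: TopK aggregates by averaging the k largest entries."""
    k = min(k, x.size)
    return np.sort(x)[-k:].mean()

def coh_loss(M, k_R, k_C, tau):
    M = np.asarray(M, dtype=float)               # (B, L), non-negative
    R, C = M, M.T                                # rows (B, L), columns (L, B)
    d_R, d_C = mean_pairwise_dist(R), mean_pairwise_dist(C)
    sq = M ** 2
    W = sq / sq.sum(axis=1, keepdims=True)       # row weights (B, L)
    V = (sq / sq.sum(axis=0, keepdims=True)).T   # column weights (L, B)
    phi = W @ C                                  # phi(r_i), (B, B)
    psi = V @ R                                  # psi(c_j), (L, L)
    # Normalized squared distances, both of shape (B, L).
    D_phi = ((phi[:, None, :] - C[None, :, :]) ** 2).sum(-1) / d_C ** 2
    D_psi = ((R[:, None, :] - psi[None, :, :]) ** 2).sum(-1) / d_R ** 2
    var_R, cov_R = (W * D_phi).sum(1), (W * D_psi).sum(1)
    var_C, cov_C = (V * D_psi.T).sum(1), (V * D_phi.T).sum(1)
    hinge = lambda x: np.maximum(x - tau, 0.0)   # [x - tau]_+
    l_var = topk_mean(hinge(var_R), k_R) + topk_mean(hinge(var_C), k_C)
    l_cov = topk_mean(hinge(cov_R), k_R) + topk_mean(hinge(cov_C), k_C)
    return float(l_var + l_cov)

rng = np.random.default_rng(0)
M = rng.random((16, 8)) + 1e-3
print(coh_loss(M, k_R=4, k_C=2, tau=0.05))
```

The threshold `tau` means rows and columns that are already local enough contribute nothing, and the top-k selection concentrates the penalty on the worst offenders in each batch.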

## 5 Auto\-Encoder Experiments

We evaluate Coh on synthetic and real datasets with known topological structure, measuring whether learned features align with interpretable data attributes\.

We compare against a plain auto\-encoder without any regularizing objective and against $L^1$\-regularization, the most canonical and simple route to interpretability\. We use Ripser \(Bauer, [2021](https://arxiv.org/html/2606.02841#bib.bib27)\) for persistent homology and UMAP \(McInnes et al\., [2018](https://arxiv.org/html/2606.02841#bib.bib26)\) for visualization\.

#### Datasets

For the toy data set we sample $20{,}000$ points from the disjoint union of two circles embedded in $\mathbb{R}^{512}$\. We train on half of the samples and create latent spaces from the other half\. We use rotated MNIST, sampling each digit at $72$ uniformly spaced angles with $250$ samples per angle\. We use $90\%$ of the data for training and do all analysis on the latent space given by the latent test samples\. In the single digit experiment we look at the digit $6$, as it should easily give a circular latent space\. For the two digit experiments, we pick $3$ and $7$, as those are dissimilar and without non\-trivial symmetries under rotation\.
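As an illustration of the toy setting, the sketch below samples points from two disjoint unit circles in $\mathbb{R}^{512}$. How the circles are embedded is not specified here, so the construction is an assumption made purely for illustration: each circle lies in its own random 2-plane around a random Gaussian center. The function name `two_circles` and all parameter defaults are hypothetical.

```python
import numpy as np

def two_circles(n, dim=512, seed=0):
    """Sample n points from two unit circles in R^dim.

    ASSUMPTION: each circle sits in a random 2-plane with a random center;
    the embedding used in the paper may differ.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, n)            # which circle each point is on
    theta = rng.uniform(0.0, 2.0 * np.pi, n)  # angle along the circle
    X = np.empty((n, dim))
    for k in range(2):
        # Orthonormal basis of a random plane via reduced QR.
        Q, _ = np.linalg.qr(rng.standard_normal((dim, 2)))
        center = rng.standard_normal(dim)
        idx = labels == k
        X[idx] = (center
                  + np.cos(theta[idx, None]) * Q[:, 0]
                  + np.sin(theta[idx, None]) * Q[:, 1])
    return X, labels, theta

X, labels, theta = two_circles(200)
print(X.shape, np.bincount(labels))
```

The returned `labels` and `theta` are the ground-truth component and angle of each sample, which is the kind of known structure the interpretability metrics in Section 5.1 compare features against.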

#### Hyperparameters\.

We select $\lambda_{\text{Coh}} = 10^{-3}$ and $\lambda_{L^1} = 10^{-3}$ for both MNIST experiments, balancing reconstruction loss with regularization strength\. For the toy example, we set $\lambda_{\text{Coh}} = 10^{-5}$ and $\lambda_{L^1} = 10^{-4}$ without systematic tuning\. See [Table 4](https://arxiv.org/html/2606.02841#A1.T4) for parameter sweeps\.

### 5\.1Metrics for Interpretability

To evaluate whether learned features are interpretable, we measure how well each feature’s activation aligns with known structure in the data\. We introduce three metrics that yield directly human\-readable feature descriptions in experiments\.

#### Component score\.

For data with $K$ discrete components/labels, we measure whether features concentrate on a single component/label:

$$\operatorname{\mathsf{compscore}}(c) \coloneqq \frac{K}{K-1}\left(\frac{\max_{k} \sum_{i \in \text{component } k} c_i}{\sum_{i} c_i} - \frac{1}{K}\right).$$

Scores range from $0$ \(uniform\) to $1$ \(within one single component\)\. We report the fraction of features with $\operatorname{\mathsf{compscore}} > 0.5$ \(Pure\)\.
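A direct transcription of the component score, as a minimal sketch: `c` is one feature's non-negative activations over the samples and `labels` holds each sample's component index in `0..K-1`. The function name and the integer-label encoding are assumptions for illustration.

```python
import numpy as np

def compscore(c, labels, K):
    """Component score of one feature: 0 = uniform over components, 1 = pure."""
    c = np.asarray(c, dtype=float)
    labels = np.asarray(labels)
    # Total activation mass that falls in each component.
    mass = np.bincount(labels, weights=c, minlength=K)
    return K / (K - 1) * (mass.max() / c.sum() - 1.0 / K)

labels = [0, 0, 1, 1]
print(compscore([1, 1, 0, 0], labels, K=2))  # all mass in component 0
print(compscore([1, 1, 1, 1], labels, K=2))  # mass split evenly
```

The "Pure" statistic is then the fraction of features whose score exceeds 0.5.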

#### MRL\.

When data has circular structure \(e\.g\., rotations\), an interpretable feature should activate on a coherent arc\. We use the Mean Resultant Length \(MRL\):

$$\operatorname{\mathsf{MRL}}(c) \coloneqq \frac{\left\| \sum_{i} c_i e^{i\theta_i} \right\|}{\sum_{i} c_i},$$

where $\theta_i$ is the angle associated with sample $i$\. A score of $1$ indicates all activation at a single angle; a score near $0$ indicates activation spread uniformly around the circle\. We report the fraction of features with $\text{MRL} > 0.5$, indicating meaningful angular selectivity \(Tuned\)\.

#### MRL180\.

Digits 3, 6 and 7 lack $180$\-degree symmetry, yet networks often encode antipodal angles together\. We compute MRL with doubled angles:

$$\operatorname{\mathsf{MRL}}_{180}(c) \coloneqq \frac{\left\| \sum_{i} c_i e^{2i\theta_i} \right\|}{\sum_{i} c_i},$$

so features firing at $\theta$ and $\theta + \pi$ receive high scores rather than canceling\. We report the fraction of features with $\text{MRL}_{180} > 0.5$, indicating meaningful angular selectivity \($\text{Tuned}_{180}$\)\.
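Both circular metrics can be computed with one function that takes the angle multiplier as a parameter. This is a minimal sketch; the function name `mrl` and the `harmonic` argument are hypothetical (`harmonic=1` gives MRL, `harmonic=2` gives the doubled-angle variant).

```python
import numpy as np

def mrl(c, theta, harmonic=1):
    """Mean resultant length of activations c at angles theta (radians)."""
    c = np.asarray(c, dtype=float)
    theta = np.asarray(theta, dtype=float)
    resultant = np.sum(c * np.exp(1j * harmonic * theta))
    return float(np.abs(resultant) / c.sum())

# A feature firing equally at two antipodal angles.
theta = np.array([0.0, np.pi])
c = np.array([1.0, 1.0])
print(mrl(c, theta))              # antipodal contributions cancel
print(mrl(c, theta, harmonic=2))  # doubling the angles aligns them
```

This makes the motivation for the doubled-angle score visible: the antipodal feature scores essentially zero under plain MRL but one under the doubled-angle version.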

In addition to these metrics, we track how sparse the features are. A feature is considered active if its activation exceeds $1\%$ of the maximum activation. We record the mean percentage of active features per sample.
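A small sketch of this sparsity statistic follows. The text does not say whether "maximum activation" is taken per feature, per sample, or over the whole matrix; the code below assumes the global maximum of the sample-by-feature matrix `M`, and the function name is hypothetical.

```python
import numpy as np

def mean_active_percentage(M, rel_threshold=0.01):
    """Mean percentage of active features per sample for an (N samples x L features) matrix."""
    M = np.asarray(M, dtype=float)
    # Assumption: the reference maximum is the global maximum of M.
    active = M > rel_threshold * M.max()
    return float(100.0 * active.mean(axis=1).mean())

M = np.array([[1.0, 0.0, 0.0, 0.0],    # 1 of 4 features active
              [0.5, 0.5, 0.0, 0.0]])   # 2 of 4 features active
sparsity = mean_active_percentage(M)
```

Swapping `M.max()` for `M.max(axis=0, keepdims=True)` would give the per-feature variant if that is the intended reading.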

#### Toy experiment

The learned barycentric maps in the Coh model satisfy the 1-Lipschitz assumption (checked by sampling $1000\times 256$ pairs), yielding a $(0.14)^{1/2}$-interleaving by [Theorem 3.12](https://arxiv.org/html/2606.02841#S3.Thmtheorem12). See [Figure 3](https://arxiv.org/html/2606.02841#S5.F3) for a visualization of the resulting latent spaces and [Table 1](https://arxiv.org/html/2606.02841#S5.T1) for the results from the experiment. All three models encode a seemingly similar latent sample space. The latent features tell a different story: except for our Coh model, it is not clear how the feature space relates to the original data. See [Figure 5](https://arxiv.org/html/2606.02841#A1.F5) and [Figure 6](https://arxiv.org/html/2606.02841#A1.F6) for similar results on other spaces.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x3.png)

Figure 3: Two Circles Toy experiment. UMAP projections of latent samples and latent features, with persistence diagrams for each. We highlight the two most persistent $H_{1}$ features, representing the two circles. Samples are colored by activation of a single feature; features are colored by activation on a single sample. Only the Coh model gives the expected disjoint circular topology in both spaces.

Table 1: Two Circles Toy Experiment. We showcase the results from a single run. Coh achieves 100% tuned features and 90% purity with $\varepsilon=0.14$ coherence.

#### Single digit experiment

With a single rotated class (6) of digits, we observe in [Table 2](https://arxiv.org/html/2606.02841#S5.T2) that all Coh features are tuned to the angles of the data, while being much less sparse than the features for $L^{1}$. This improvement comes at very little cost in terms of MSE. In [Figure 4](https://arxiv.org/html/2606.02841#S5.F4) we see the duality of the circular space, and in [Figure 9](https://arxiv.org/html/2606.02841#A1.F9) we observe that features and samples define activations on each other's space. The average sample plotted over a feature clearly shows the expected rotated digits. In [Figure 10](https://arxiv.org/html/2606.02841#A1.F10) we decode the persistent $H_{1}$ classes against the true angles, confirming that the found circles are given by the angles generating the data. We justify the $1$-Lipschitz assumption by sampling $1000\times 256$ pairs for $\phi$ and $\psi$ across five seeds on the Coh latent space. Here $\psi$ satisfies the condition nearly exactly (violation rate $<0.002\%$) and $\phi$ violates it on $4.2\pm 0.9\%$ of pairs with mean expansion $1.045$; hence the interleaving is almost a $0.15^{1/2}$-interleaving. The high values for $\text{MRL}_{180}$ in the Coh latent space are due to wide fields.

Table 2: Single Digit Experiment. Results averaged over 5 random seeds. Coh achieves 100% tuned features (MRL $>0.5$) compared to $63\%$ for L1, with lower variance across seeds.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x4.png)

Figure 4: Single Digit Experiment. For a representative seed we show the UMAP projections of latent samples and latent features, with persistence diagrams for each. We highlight the single most persistent $H_{1}$ feature, representing the expected circle. Samples are colored by activation of a single feature and features are colored by activation on a single sample. Only the Coh model gives the expected circular topology in both spaces.

#### Double digits experiment

This setting is more challenging as the latent space must represent two classes $\times\,72$ angles. Our model shows high variance across seeds ([Table 5](https://arxiv.org/html/2606.02841#A1.T5)) due to two qualitatively different solutions. Crucially, Coh achieves what it is designed to do: the coherence metrics (Loc and Cov) are stable across all seeds with negligible variance, so the algorithm reliably produces coherent latent spaces. The high variance in MRL and Purity does not reflect failure of the method; it reflects the fact that coherence admits multiple valid solutions. From [Table 5](https://arxiv.org/html/2606.02841#A1.T5) one can see that Coh beats the other models in all interpretability metrics with the exception of MRL and $\%$TUN score, where $L^{1}$-sparsity wins. This is most likely due to sparseness.

The high standard deviation for MRL, $\operatorname{\mathsf{MRL}}_{180}$ and Purity comes from the existence of two types of solutions. In some seeds the two digit classes stay separated. In others they merge into a single circle, losing labeling information but with excellent angular tuning. Both solutions are coherent. Both have interpretable features. They simply organize the data differently. In [Figure 7](https://arxiv.org/html/2606.02841#A1.F7), [Figure 12](https://arxiv.org/html/2606.02841#A1.F12) and [Figure 11](https://arxiv.org/html/2606.02841#A1.F11) we plot the UMAP embeddings, persistence diagrams and average samples over features for two of these different solutions. In [Figure 7](https://arxiv.org/html/2606.02841#A1.F7) one should pay attention to the persistent $H_{0}$ feature being present in the top latent space but not the bottom. This confirms the visual difference from UMAP. The top solution embeds the two digits in separate components ([Figure 15](https://arxiv.org/html/2606.02841#A1.F15)) with angles looped around twice. The bottom solution merges the two circles ([Figure 16](https://arxiv.org/html/2606.02841#A1.F16)) with no double loop. Both latent spaces are coherent, as the UMAP projections and persistence diagrams confirm. The features in both cases attend to localized regions of the data manifold as intended.

To guide Coh toward a specific coherent solution, we combine it with $L^{1}$ sparsity, using $\lambda_{L^{1}}=2\times 10^{-2}$ and $\lambda_{\text{Coh}}=10^{-3}$. This biases optimization toward a disentangled solution, namely the sparsest coherent representation, yielding reliable class separation and angle tuning across seeds while preserving coherence guarantees (see [Figure 8](https://arxiv.org/html/2606.02841#A1.F8), [Figure 13](https://arxiv.org/html/2606.02841#A1.F13) and [Figure 14](https://arxiv.org/html/2606.02841#A1.F14)).

We again justify the 1-Lipschitz assumption by sampling $1000\times 256$ pairs across ten seeds: $\psi$ satisfies the condition nearly exactly (violation rate $<0.2\%$ in $9/10$ seeds, with the last seed yielding $1.46\%$); $\phi$ violates it on $3.4\pm 2.0\%$ of pairs with mean expansion $1.054\pm 0.008$.
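An empirical Lipschitz check of this kind can be sketched as follows. This is a hedged illustration rather than the authors' procedure: we assume a map is represented by paired arrays `X` (inputs) and `Y` (images), use Euclidean distances without any rescaling, and define "mean expansion" as the mean distance ratio over violating pairs. The function name and the tolerance `tol` are hypothetical.

```python
import numpy as np

def lipschitz_violations(X, Y, n_pairs=1000, tol=1e-9, seed=0):
    """Sample pairs and report (violation rate, mean expansion) for the map X[i] -> Y[i]."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    dx = np.linalg.norm(X[i] - X[j], axis=1)
    dy = np.linalg.norm(Y[i] - Y[j], axis=1)
    keep = dx > 0                      # drop degenerate pairs (i == j or duplicate points)
    ratio = dy[keep] / dx[keep]
    bad = ratio > 1.0 + tol            # pairs where the map expands distances
    rate = float(bad.mean()) if ratio.size else 0.0
    expansion = float(ratio[bad].mean()) if bad.any() else 1.0
    return rate, expansion

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
rate_id, exp_id = lipschitz_violations(X, X)        # identity map: never expands
rate_2x, exp_2x = lipschitz_violations(X, 2.0 * X)  # doubles every distance
```

A sampled check like this can only report how often and how strongly the condition fails on the sampled pairs; it does not certify the assumption globally.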

## 6 Token Embedding Experiment

In Large Language Models, words, parts of words and symbols are represented as token embeddings that are learned during training. While these embeddings are typically signed, they can be made non-negative for the sake of interpretability. This is achieved through a non-negative activation function such as Softplus, which guarantees a non-negative matrix $M$. It is well known that the token embedding space carries a meaningful geometry: e.g., “mother” is close to “father” and “March” is close to “February”. Features, on the other hand, are often poly-semantic, and their space carries no clear geometry. We show that applying Coh to the token embeddings yields interpretable features with a meaningful geometry. We do not claim, or hope, that *all* features become interpretable, as some are used to “merge” concepts. To demonstrate that the method works in this setting, we train a small BERT model (Devlin et al., [2019](https://arxiv.org/html/2606.02841#bib.bib29)) on the WikiText-2 dataset (Merity et al., [2017](https://arxiv.org/html/2606.02841#bib.bib30)).

#### Model\.

We use a compact BERT-style encoder with a token embedding of dimension $256$, $2$ transformer blocks, and $4$ transformer heads, trained on word-level WikiText-2 with a vocabulary capped at the $2{,}000$ most frequent tokens and a sequence length of $128$. Inputs are tokenized at the word level (lowercased, with punctuation split off as separate tokens) rather than the more common subwords. This makes it easier to tell whether a feature is interpretable. Training follows the standard masked-language-modeling objective with a $15\%$ masking rate and a label smoothing of $0.1$, optimized with Adam using a learning rate of $3\cdot 10^{-4}$ and a batch size of $128$. We train for $300$ epochs under a constant learning rate. The only non-standard choice is the token embedding, which we parametrize as a non-negative matrix using Softplus with $\beta=20$.
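To illustrate what the sharpness parameter does, here is a minimal NumPy sketch of the standard Softplus function $\frac{1}{\beta}\log(1+e^{\beta x})$ with $\beta=20$. The names `softplus` and `raw` are illustrative, and how the activation is wired into the embedding layer is not specified in this section, so this only demonstrates the non-negativity and the closeness to ReLU.

```python
import numpy as np

def softplus(x, beta=20.0):
    """Softplus with sharpness beta: log(1 + exp(beta * x)) / beta, computed stably."""
    x = np.asarray(x, dtype=float)
    # logaddexp(0, z) = log(exp(0) + exp(z)) avoids overflow for large z.
    return np.logaddexp(0.0, beta * x) / beta

raw = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])  # hypothetical signed embedding weights
M = softplus(raw)                            # strictly positive entries
relu_gap = np.abs(M - np.maximum(raw, 0.0)).max()
```

The largest deviation from ReLU occurs at zero and equals $\log(2)/\beta\approx 0.035$ for $\beta=20$, so a high $\beta$ keeps entries near zero small while retaining a non-vanishing gradient.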

We train three models with identical hyperparameters: one with the Vanilla (signed) token embedding, one with Softplus, and one with Softplus and Coh. See [Appendix B](https://arxiv.org/html/2606.02841#A2) for further details on the Coh model.

#### Scoring interpretability and results\.

We evaluate feature interpretability using three complementary methods\. As a baseline, we treat the token geometry of the Vanilla model’s embeddings as ground truth, since all models exhibit well\-structured token geometry\.

For each feature, we examine its top $20$ activating tokens and identify, within the Vanilla embedding, the token whose $20$ nearest neighbors best match that activation set; we record the resulting overlap as Mean Overlap. Since overlap varies considerably across features, we additionally report the number of features achieving greater than $50\%$ overlap, which we record as Overlap $>.5$. See [Figure 17](https://arxiv.org/html/2606.02841#A2.F17) and [Figure 18](https://arxiv.org/html/2606.02841#A2.F18) for results.
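The overlap computation described above can be sketched as follows. Several details are assumptions on our part: nearest neighbors are taken in Euclidean distance, a token counts as its own neighbor, and the score of a feature is the best overlap fraction over all candidate tokens. The names `overlap_scores`, `M` (token-by-feature activations) and `E_ref` (reference embedding) are hypothetical.

```python
import numpy as np

def overlap_scores(M, E_ref, k=20):
    """For each feature, best fraction of its top-k tokens found among some token's k nearest neighbors."""
    M = np.asarray(M, dtype=float)          # (V tokens, L features)
    E_ref = np.asarray(E_ref, dtype=float)  # (V tokens, d) reference embedding
    V, L = M.shape
    k = min(k, V)
    # Squared Euclidean distances via the Gram matrix.
    sq = (E_ref ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * E_ref @ E_ref.T
    nn = np.argsort(d2, axis=1)[:, :k]               # k nearest tokens (self included)
    NN = np.zeros((V, V))
    NN[np.arange(V)[:, None], nn] = 1.0              # NN[t, u] = 1 if u is a neighbor of t
    top = np.argsort(-M, axis=0)[:k, :]              # top-k tokens per feature
    TOP = np.zeros((V, L))
    TOP[top, np.arange(L)[None, :]] = 1.0            # TOP[u, j] = 1 if u is in the top-k of feature j
    counts = NN @ TOP                                # counts[t, j] = size of the intersection
    return counts.max(axis=0) / k

# Toy example: six tokens in two tight clusters on a line, two features.
E_ref = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
M = np.array([[1.0, 0.9], [0.9, 0.0], [0.8, 0.0],
              [0.0, 0.8], [0.0, 0.0], [0.0, 0.7]])
scores = overlap_scores(M, E_ref, k=3)
mean_overlap = scores.mean()
n_above_half = int((scores > 0.5).sum())
```

In the toy example the first feature matches one cluster exactly, while the second mixes tokens from both clusters and therefore reaches only two thirds overlap.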

We feed the top $10$ activating tokens per feature into Claude Opus 4.7 (Anthropic, [2026](https://arxiv.org/html/2606.02841#bib.bib42)), prompting it to make a binary judgment of whether the feature is interpretable, i.e., whether all $10$ tokens share a common category; we record this as Claude scoring. See LABEL:tab:interpdict1 for all interpretable features from the first seed and an explanation from Claude.

See[Table 3](https://arxiv.org/html/2606.02841#S6.T3)for a brief summary and[Appendix B](https://arxiv.org/html/2606.02841#A2)for more in\-depth results\. Model performance is negligibly impacted in our experiments; see[Table 7](https://arxiv.org/html/2606.02841#A2.T7)\.

Table 3: Mean and standard deviation for interpretability scores across $5$ seeds.

As with the auto-encoders, coherence yields not only more interpretable features but a more interpretable feature *space*; see [Figure 19](https://arxiv.org/html/2606.02841#A2.F19) and [Figure 20](https://arxiv.org/html/2606.02841#A2.F20).

## 7Discussion

Many approaches to interpretability focus on individual features\. We instead focus on the interpretability of the feature space as a whole\. We have given a rigorous framework for coupling latent samples and latent features so that the two share compatible topology, and we have shown that our objective reliably produces coherent representations: locality and covering are stable across seeds, and features inherit meaning from the samples they attend to\.

The two settings validate this in complementary ways. In the autoencoders, where the data has known topology, coherence lets us verify the geometric guarantee directly, recovering the expected circles in both the sample and feature spaces. In the token embeddings no ground-truth topology exists; coherence instead transfers the embedding’s known semantic geometry to the feature space. That the Softplus-only baseline yields almost no interpretable features (on average $87.6$ of $256$ for Coh versus none) shows that non-negativity alone is not enough, and the full feature listing lets the reader audit every judgment directly.

Our objective is simple and only requires a non-negative activation function, yet it carries across architecturally distinct setups. As a limitation, the Lipschitz assumption on the barycentric maps is verified empirically rather than enforced. A further observation, from the Double Digit Experiment, is that the same task can admit quite different coherent latent spaces, and our method does not control which one results. As Coh $+\,L^{1}$ demonstrates, coherence is complementary to other objectives: sparsity can select among coherent solutions without sacrificing the geometric guarantee.

For future work, we would like to scale to larger networks and datasets, apply coherence to tasks such as classification and to multiple layers rather than only the bottleneck or embedding layer, and explore disentanglement by applying the objective on blocks of the latent space or across multi\-head architectures\.

## Acknowledgments

We want to thank the anonymous reviewers, Chad Giusti and Daniela Egas Santander for constructive feedback\. The work was supported by a grant from the Research Council of Norway \(iMOD, NFR grant 325114\); a Centre of Excellence grant \(Centre for Algorithms in the Cortex, grant 332640\) from the Research Council of Norway; and the Department of Mathematics at NTNU\.

## Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning, specifically interpretability of learned representations\. We do not foresee direct negative societal consequences from this foundational research\. If anything, improved interpretability may contribute positively to AI safety and transparency\.

## References

- Anthropic (2026). Claude opus 4.7. Note: Large language model. Accessed via claude.ai, May 2026. External Links: [Link](https://www.anthropic.com/claude). Cited by: [§6](https://arxiv.org/html/2606.02841#S6.SS0.SSS0.Px2.p3.2).
- U. Bauer (2021). Ripser: efficient computation of Vietoris–Rips persistence barcodes. Journal of Applied and Computational Topology 5(3), pp. 391–423. External Links: [Document](https://dx.doi.org/10.1007/s41468-021-00071-5). Cited by: [§5](https://arxiv.org/html/2606.02841#S5.p2.1).
- A. Björner (1995). Topological methods. In Handbook of Combinatorics, R. Graham, M. Grötschel, and L. Lovász (Eds.), Vol. 2, pp. 1819–1872. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p3.1).
- T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell, R. Lasenby, Y. Wu, S. Kravec, N. Schiefer, T. Maxwell, N. Joseph, Z. Hatfield-Dodds, A. Tamkin, K. Nguyen, B. McLean, J. E. Burke, T. Hume, S. Carter, T. Henighan, and C. Olah (2023). Towards monosemanticity: decomposing language models with dictionary learning. Transformer Circuits Thread. Note: https://transformer-circuits.pub/2023/monosemantic-features/index.html Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p3.1).
- M. M. Bronstein, J. Bruna, T. Cohen, and P. Velickovic (2021). Geometric deep learning: grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478. External Links: [Link](https://arxiv.org/abs/2104.13478), 2104.13478. Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p4.1).
- M. Brun and D. Grinberg (2024). The Dowker theorem via discrete Morse theory. External Links: 2407.15454. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p3.1).
- M. Brun and L. Salbu (2023). The rectangle complex of a relation. Mediterranean Journal of Mathematics 20(7). Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p3.1).
- G. E. Carlsson (2009). Topology and data. Bulletin of the American Mathematical Society 46(2), pp. 255–308. External Links: [Document](https://dx.doi.org/10.1090/S0273-0979-09-01249-X), [Link](https://doi.org/10.1090/S0273-0979-09-01249-X). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p5.1).
- F. Chazal, D. Cohen-Steiner, M. Glisse, L. Guibas, and S. Y. Oudot (2009). Proximity of persistence modules and their diagrams. In Proceedings of the 25th Annual Symposium on Computational Geometry, SoCG ’09, pp. 237–246. Note: hal-02292996. External Links: [Document](https://dx.doi.org/10.1145/1542362.1542407). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p5.1).
- F. Chazal, V. de Silva, and S. Oudot (2014). Persistence stability for geometric complexes. Geometriae Dedicata 173, pp. 193–214. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p9.1).
- S. Chowdhury and F. Mémoli (2018). A functorial Dowker theorem and persistent homology of asymmetric networks. Journal of Applied and Computational Topology 2(1), pp. 115–175. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p5.2).
- U. Cohen, S. Chung, D. D. Lee, and H. Sompolinsky (2020). Separability and geometry of object manifolds in deep neural networks. Nature Communications 11(1), pp. 746. External Links: [Document](https://dx.doi.org/10.1038/s41467-020-14578-5). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p3.1), [§1](https://arxiv.org/html/2606.02841#S1.p1.1).
- H. Cunningham, A. Ewart, L. Riggs, R. Huben, and L. Sharkey (2024). Sparse autoencoders find highly interpretable features in language models. In Proceedings of the 12th International Conference on Learning Representations (ICLR). Note: arXiv:2309.08600. Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p3.1).
- V. de Silva, D. Morozov, and M. Vejdemo-Johansson (2011). Persistent cohomology and circular coordinates. Discrete & Computational Geometry 45(4), pp. 737–759. External Links: [Document](https://dx.doi.org/10.1007/s00454-011-9344-x), [Link](https://doi.org/10.1007/s00454-011-9344-x). Cited by: [Appendix A](https://arxiv.org/html/2606.02841#A1.SS0.SSS0.Px7.p1.1).
- J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019). BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio (Eds.), Minneapolis, Minnesota, pp. 4171–4186. External Links: [Link](https://aclanthology.org/N19-1423/), [Document](https://dx.doi.org/10.18653/v1/N19-1423). Cited by: [§6](https://arxiv.org/html/2606.02841#S6.p1.1).
- C. H. Dowker (1952). Homology groups of relations. Annals of Mathematics 56(1), pp. 84–95. External Links: ISSN 0003486X, 19398980, [Link](http://www.jstor.org/stable/1969768). Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p2.6), [§1](https://arxiv.org/html/2606.02841#S1.p2.1).
- D. Dugger and D. C. Isaksen (2004). Topological hypercovers and $\mathbb{A}^{1}$-realizations. Mathematische Zeitschrift 246(4), pp. 667–689. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p10.3).
- N. Elhage, T. Hume, C. Olsson, N. Schiefer, T. Henighan, S. Kravec, Z. Hatfield-Dodds, R. Lasenby, D. Drain, C. Chen, et al. (2022). Toy models of superposition. arXiv preprint arXiv:2209.10652. Cited by: [§1](https://arxiv.org/html/2606.02841#S1.p1.1).
- R. J. Gardner, E. Hermansen, M. Pachitariu, Y. Burak, N. A. Baas, B. A. Dunn, M. Moser, and E. I. Moser (2022). Toroidal topology of population activity in grid cells. Nature 602(7895), pp. 123–128. External Links: [Document](https://dx.doi.org/10.1038/s41586-021-04268-7). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p6.1), [§1](https://arxiv.org/html/2606.02841#S1.p1.1).
- T. Hafting, M. Fyhn, S. Molden, M. Moser, and E. I. Moser (2005). Microstructure of a spatial map in the entorhinal cortex. Nature 436(7052), pp. 801–806. External Links: [Document](https://dx.doi.org/10.1038/nature03721). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p6.1), [§1](https://arxiv.org/html/2606.02841#S1.p1.1).
- C. Hofer, R. Kwitt, M. Niethammer, and M. Dixit (2019). Connectivity-optimized representation learning via persistent homology. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, pp. 2751–2760. External Links: [Link](https://proceedings.mlr.press/v97/hofer19a.html). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p5.1).
- X. Hu, F. Li, D. Samaras, and C. Chen (2019). Topology-preserving deep image segmentation. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2019/file/2d95666e2649fcfc6e3af75e09f5adb9-Paper.pdf). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p5.1).
- J. Mamou, H. Le, et al. (2020). Emergence of separable manifolds in deep language representations. In ICML. Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p3.1).
- L. McInnes, J. Healy, N. Saul, and L. Großberger (2018). UMAP: uniform manifold approximation and projection. Journal of Open Source Software 3(29), pp. 861. External Links: [Document](https://dx.doi.org/10.21105/joss.00861), [Link](https://doi.org/10.21105/joss.00861). Cited by: [§5](https://arxiv.org/html/2606.02841#S5.p2.1).
- S. Merity, C. Xiong, J. Bradbury, and R. Socher (2017). Pointer sentinel mixture models. In International Conference on Learning Representations. External Links: [Link](https://openreview.net/forum?id=Byj72udxe). Cited by: [§6](https://arxiv.org/html/2606.02841#S6.p1.1).
- M. Moor, M. Horn, B. Rieck, and K. Borgwardt (2020). Topological autoencoders. In ICML. Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p5.1).
- C. Olah, N. Cammarata, L. Schubert, G. Goh, M. Petrov, and S. Carter (2020). Zoom in: an introduction to circuits. Distill. Note: https://distill.pub/2020/circuits/zoom-in External Links: [Document](https://dx.doi.org/10.23915/distill.00024.001). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p2.1).
- B. A. Olshausen and D. J. Field (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583), pp. 607–609. External Links: [Document](https://dx.doi.org/10.1038/381607a0). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p3.1).
- J. A. Perea, L. Scoccola, and C. J. Tralie (2023). DREiMac: dimensionality reduction with Eilenberg-MacLane coordinates. Journal of Open Source Software 8(91), pp. 5791. Cited by: [Appendix A](https://arxiv.org/html/2606.02841#A1.SS0.SSS0.Px7.p1.1).
- M. Robinson (2022). Cosheaf representations of relations and Dowker complexes. Journal of Applied and Computational Topology 6(1), pp. 27–63. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p5.2).
- E. Rybakken, N. Baas, and B. Dunn (2019). Decoding of neural data using cohomological feature extraction. Neural Computation 31(1), pp. 68–93. External Links: [Document](https://dx.doi.org/10.1162/neco%5Fa%5F01150). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p6.1), [§1](https://arxiv.org/html/2606.02841#S1.p1.1).
- G. Segal (1968). Classifying spaces and spectral sequences. Publications Mathématiques de l’IHÉS 34, pp. 105–112. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p10.3).
- A. Sengupta, C. Pehlevan, M. Tepper, A. Genkin, and D. Chklovskii (2018). Manifold-tiling localized receptive fields are optimal in similarity-preserving neural networks. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2018/file/ee14c41e92ec5c97b54cf9b74e25bd99-Paper.pdf). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p7.1).
- E. H. Spanier (1994). Algebraic topology. 1 edition, Springer, New York, NY. Note: Originally published by McGraw-Hill, 1966. External Links: ISBN 978-0-387-94426-5, [Document](https://dx.doi.org/10.1007/978-1-4684-9322-1). Cited by: [Lemma C.8](https://arxiv.org/html/2606.02841#A3.Thmtheorem8.p1.1).
- J. S. Taube, R. U. Muller, and J. B. Ranck Jr. (1990). Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. Journal of Neuroscience 10(2), pp. 420–435. External Links: [Document](https://dx.doi.org/10.1523/JNEUROSCI.10-02-00420.1990). Cited by: [§1.2](https://arxiv.org/html/2606.02841#S1.SS2.p6.1), [§1](https://arxiv.org/html/2606.02841#S1.p1.1).
- M. Vaupel and B. Dunn (2023). The bifiltration of a relation and extended Dowker duality. External Links: 2310.11529. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p10.3).
- Ž. Virk (2021). Rips complexes as nerves and a functorial Dowker-nerve diagram. Mediterranean Journal of Mathematics 18(58). Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p5.2).
- I. Yoon (2024). Dowker duality, profunctors, and spectral sequences. External Links: 2408.13136. Cited by: [Appendix F](https://arxiv.org/html/2606.02841#A6.p3.1), [Appendix F](https://arxiv.org/html/2606.02841#A6.p9.1).

## Appendix A Auto Encoder: Additional Plots and Tables

#### Algorithm

Algorithm [1](https://arxiv.org/html/2606.02841#alg1) presents the idealized coherence computation. In practice, several modifications are necessary for stable training:

#### Scale normalization\.

When $M\in\mathbb{R}^{N\times L}$ with $N\neq L$, row and column distances live at different scales. We normalize by the mean pairwise distance (approximated by sampling) within each space:

$$\bar{d}_{R}=\frac{1}{\binom{N}{2}}\sum_{i<j}\|r_{i}-r_{j}\|_{2},\quad\bar{d}_{C}=\frac{1}{\binom{L}{2}}\sum_{j<k}\|c_{j}-c_{k}\|_{2}.$$

The normalized variances become $\text{Var}_{R}(r_{i})/\bar{d}_{C}^{2}$ and $\text{Var}_{C}(c_{j})/\bar{d}_{R}^{2}$, and the normalized coverings become $\operatorname{\mathsf{Cov}}_{R}(r_{i})/\bar{d}_{R}^{2}$ and $\operatorname{\mathsf{Cov}}_{C}(c_{j})/\bar{d}_{C}^{2}$.
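The sampled estimate of the mean pairwise distance can be sketched as below. The number of sampled pairs, the seeding, and the function name are illustrative assumptions; the text only states that the mean is approximated by sampling.

```python
import numpy as np

def sampled_mean_pairwise_distance(X, n_pairs=10_000, seed=0):
    """Monte Carlo estimate of the mean Euclidean distance between distinct rows of X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    i = rng.integers(0, n, size=n_pairs)
    # Adding an offset in 1..n-1 (mod n) guarantees j != i.
    j = (i + rng.integers(1, n, size=n_pairs)) % n
    return float(np.linalg.norm(X[i] - X[j], axis=1).mean())

M = np.eye(4)                                   # toy matrix whose rows are e_1..e_4
d_R = sampled_mean_pairwise_distance(M)         # scale of the row space
d_C = sampled_mean_pairwise_distance(M.T)       # scale of the column space
```

In this toy case every pair of distinct rows is at distance $\sqrt{2}$, so the estimate is exact; in general it fluctuates with the sample size.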

#### Top-$k$ aggregation.

Rather than penalizing mean variance, we penalize the $k$ worst offenders, making the loss robust to outliers; moreover, the $\epsilon$-coherence is based on the worst offender:

$$\mathcal{L}_{\text{var}}=\frac{1}{k}\sum_{i\in\text{top-}k}\text{Var}_{R}(r_{i})+\frac{1}{k}\sum_{j\in\text{top-}k}\text{Var}_{C}(c_{j}).$$
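A minimal sketch of the top-$k$ aggregation, assuming the per-row and per-column variances have already been computed (the arrays `var_R` and `var_C` below are made-up numbers, and `topk_mean` is a hypothetical helper):

```python
import numpy as np

def topk_mean(values, k):
    """Mean of the k largest entries of a 1-D array."""
    values = np.asarray(values, dtype=float)
    k = min(k, values.size)
    # np.partition puts the k largest entries (unordered) in the last k slots.
    return float(np.partition(values, values.size - k)[values.size - k:].mean())

var_R = np.array([0.1, 0.5, 0.2, 0.9])   # hypothetical row variances
var_C = np.array([0.3, 0.4, 0.0])        # hypothetical column variances
loss_var = topk_mean(var_R, k=2) + topk_mean(var_C, k=2)
```

In an autodiff framework the same selection would be done with a differentiable top-$k$ operation, so that gradients flow only through the selected offenders.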

#### Target variance/covered threshold\.

Perfect coherence ($\epsilon=0$) is unnecessarily restrictive, as it is only possible for a matrix of constant orthogonal blocks. We thus only penalize variance exceeding a target parameter $\tau$:

$$[\text{Var}(x)-\tau]_{+}=\max(0,\text{Var}(x)-\tau)$$

and

$$[\text{Cov}(x)-\tau]_{+}=\max(0,\text{Cov}(x)-\tau).$$

#### Normalization kernel\.

We choose squared $L^{1}$ normalization as our normalization kernel, that is

$$w^{(i)}_{j}=\frac{M_{ij}^{2}}{\sum_{k}M_{ik}^{2}}\quad\text{and}\quad v^{(j)}_{i}=\frac{M_{ij}^{2}}{\sum_{k}M_{kj}^{2}}.$$

This concentrates weight on dominant activations without introducing a temperature hyperparameter.
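As a small worked example of this kernel, the sketch below computes the row weights `W` and column weights `V` for a toy matrix. The guard `eps` against all-zero rows or columns and the function name are our additions.

```python
import numpy as np

def squared_l1_kernel(M, eps=1e-12):
    """Row weights W and column weights V obtained by normalizing the squared entries of M."""
    M2 = np.asarray(M, dtype=float) ** 2
    W = M2 / np.maximum(M2.sum(axis=1, keepdims=True), eps)  # each row of W sums to 1
    V = M2 / np.maximum(M2.sum(axis=0, keepdims=True), eps)  # each column of V sums to 1
    return W, V

M = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 2.0]])
W, V = squared_l1_kernel(M)
```

For the first row, the activations $(3, 1, 0)$ become weights $(0.9, 0.1, 0)$, showing how squaring pushes weight toward the dominant activation compared with the plain proportions $(0.75, 0.25, 0)$.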

See [Algorithm 1](https://arxiv.org/html/2606.02841#alg1) for pseudocode. Variance simplifies via $\text{Var}(X)=\mathbb{E}[X^{2}]-\mathbb{E}[X]^{2}$:

$$\text{Var}_{R}(r_{i})=\sum_{j}W_{ij}\|c_{j}\|^{2}-\|\phi(r_{i})\|^{2}.$$

Covering terms expand similarly. Complexity is $\mathcal{O}(B^{2}L+BL^{2})$ per batch, dominated by barycenter computation.
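The identity can be checked numerically. The sketch below assumes, as the formula suggests, that the barycentric map is $\phi(r_{i})=\sum_{j}W_{ij}c_{j}$ with $c_{j}$ the columns of $M$ and $W$ the row weights of the squared $L^{1}$ kernel; the toy matrix is random and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((5, 3))                          # toy non-negative matrix: N=5 samples, L=3 features
W = M**2 / (M**2).sum(axis=1, keepdims=True)    # row weights, each row sums to 1
C = M.T                                         # row j of C is the column vector c_j in R^N

# Fast form: E[|c|^2] - |E[c]|^2 under the weights W[i, :].
phi = W @ C                                     # barycenters phi(r_i), shape (N, N)
var_fast = W @ (C**2).sum(axis=1) - (phi**2).sum(axis=1)

# Direct form: weighted mean squared distance from each c_j to the barycenter.
diff = C[None, :, :] - phi[:, None, :]          # shape (N, L, N)
var_direct = (W * (diff**2).sum(axis=2)).sum(axis=1)
```

The fast form avoids materializing the three-dimensional `diff` array, and its dominant cost is the product `W @ C`, which matches the $B^{2}L$ term in the stated complexity.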

Table 4: Hyperparameter sweep on rotated MNIST. Sp.% = mean active features per sample (active if $>1\%$ of max). T% = fraction of features with MRL $>0.5$. P% = fraction of features with component score $>0.5$. Bold rows indicate selected hyperparameters. The lower MSE for the two digit experiment in general is due to doubling the sample set.

Table 5: Double Digit Experiment. Results averaged over 10 random seeds. Both digits lack $180$-degree symmetry, but high $\text{MRL}_{180}$ scores reflect the tendency of networks to collapse antipodal angles. Coh achieves better angular tuning (180), with almost $2$ times higher purity and at modest reconstruction cost. Although the interpretability metrics vary strongly across seeds, Coh achieves consistent coherence (Loc, Cov) across all seeds. Variance in purity reflects distinct but equally coherent solutions (see [Figure 7](https://arxiv.org/html/2606.02841#A1.F7), [Figure 11](https://arxiv.org/html/2606.02841#A1.F11), [Figure 12](https://arxiv.org/html/2606.02841#A1.F12)). Coh + L1 combines both objective functions, guides Coh towards a sparse solution and reliably achieves class separation and angle tuning while maintaining coherence, at the cost of higher MSE.

#### Hyperparameter selection.

For the single-digit experiment, Coh performs well across a range of $\lambda$ values while maintaining stable MSE. Although L1 achieves slightly lower reconstruction error, Coh produces better-tuned features even with less sparsity. We select $\lambda_{\text{Coh}}=\lambda_{L^{1}}=10^{-3}$.

For the two-digit experiment, we observe angular tuning modulo $180$ degrees rather than full $360$-degree tuning. As $\lambda_{\text{Coh}}$ increases, $\text{MRL}_{180}$ improves, but purity drops sharply at $\lambda=5\times 10^{-3}$, where the two latent circles merge, collapsing class information while preserving angular structure. We select $\lambda_{\text{Coh}}=\lambda_{L^{1}}=10^{-3}$.

#### Decoding angles\.

Using circular coordinates (de Silva et al., [2011](https://arxiv.org/html/2606.02841#bib.bib2)) (implementation from Perea et al., [2023](https://arxiv.org/html/2606.02841#bib.bib11)), we extract angles from the most persistent $H_{1}$ class in both the latent sample and feature spaces. [Figures 10](https://arxiv.org/html/2606.02841#A1.F10) and [15](https://arxiv.org/html/2606.02841#A1.F15) show these recovered angles plotted against the true generating angles. To transfer angles from features to samples, we assign each sample the angle of its maximally-activating feature. The strong agreement confirms that the circle discovered in feature space corresponds to the circle in sample space, demonstrating how coherent features enable direct readout of latent structure.
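The transfer step from features to samples is a single arg-max lookup. The sketch below assumes the circular coordinates of the features have already been computed (the values in `feature_angles` are made up) and that `M` is the sample-by-feature activation matrix; the function name is hypothetical and the circular-coordinate computation itself is not reproduced here.

```python
import numpy as np

def transfer_angles(M, feature_angles):
    """Assign each sample (row of M) the angle of its maximally-activating feature."""
    M = np.asarray(M, dtype=float)
    feature_angles = np.asarray(feature_angles, dtype=float)
    return feature_angles[np.argmax(M, axis=1)]

M = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.2, 0.7],
              [0.1, 0.8, 0.3]])
feature_angles = np.array([0.0, np.pi / 2, np.pi])   # hypothetical circular coordinates
sample_angles = transfer_angles(M, feature_angles)   # one angle per sample
```

Because each sample inherits the angle of one feature, the resolution of the transferred angles is limited by the number of features.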

Table 6:Hyperparameters used in all experiments\.We choose non\-negative activation function, Softplus, with a highβ\\betavalue to encourage sparsity while mitigating the ’dead neuron’ issue typically associated with ReLU\. We use AdamW optimizer withlr=10−3lr=10^\{\-3\}, weight decay10−510^\{\-5\}, cosine annealing schedule\(etamin=lr/10\)\(\\text\{eta\}\_\{\\text\{min\}\}=lr/10\), and gradient clipping at norm1\.01\.0\.![Refer to caption](https://arxiv.org/html/2606.02841v1/x5.png)

Figure 5:Toy experiment: sphere\.We replicate the two circle toy experiment with a sphere\.![Refer to caption](https://arxiv.org/html/2606.02841v1/x6.png)

Figure 6:Toy experiment: torus\.We replicate the two circle toy experiment with a torus\.![Refer to caption](https://arxiv.org/html/2606.02841v1/x7.png)

Figure 7:Double Digit Experiment\.For two different seeds we show UMAP projections of latent samples and latent features, with persistence diagrams for each\. We highlight the two most persistentH1H\_\{1\}features, representing the two expected circles\. Samples are colored by activation of a single feature and features are colored by activation on a single sample\. The seeds are picked as to show the diversity in the learned latent spaces\.![Refer to caption](https://arxiv.org/html/2606.02841v1/x8.png)

Figure 8: Double digit experiment (Coh + $L^1$). For two different seeds we show UMAP projections of latent samples and latent features, with persistence diagrams for each. We highlight the two most persistent $H_1$ features, representing the two expected circles. Samples are colored by the activation of a single feature, and features are colored by their activation on a single sample.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x9.png)

Figure 9: Single digit experiment. We plot UMAP projections of latent samples and latent features, colored by a latent feature and a latent sample respectively. For a random sample of features we plot the weighted sum of the original images; these randomly chosen features correspond to the coloring of the latent samples.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x10.png)

Figure 10: Single digit experiment. We analyse a representative latent space from the single digit experiment using circular coordinates from persistent (co)homology, and compare it against the true generating angles. In the second row we color the latent samples by circular coordinates. In the last row we plot circular coordinates against the true angle.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x11.png)

Figure 11: Double digit experiment (Separated). We plot UMAP projections of latent samples and latent features, colored by a latent feature and a latent sample respectively. For a random sample of features we plot the weighted sum of the original images; these randomly chosen features correspond to the coloring of the latent samples. We note that Coh cleanly separates the two classes into two circles. Features corresponding to the digit $3$ have angular tuning modulo $180$ degrees, while features corresponding to the digit $7$ have a $360$ degree angular tuning.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x12.png)

Figure 12: Double digit experiment (Merged). We plot UMAP projections of latent samples and latent features, colored by a latent feature and a latent sample respectively. For a random sample of features we plot the weighted sum of the original images; these randomly chosen features correspond to the coloring of the latent samples. We note that, in this case, Coh has merged the two classes into the same circle, but the features still have excellent angular tuning.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x13.png)

Figure 13: Double digit experiment ($L^1$ and Coh). We plot UMAP projections of latent samples and latent features, colored by a latent feature and a latent sample respectively. For a random sample of features we plot the weighted sum of the original images; these randomly chosen features correspond to the coloring of the latent samples. Using Coh and $L^1$ together we get much cleaner results.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x14.png)

Figure 14: Double digit experiment ($L^1$ and Coh). We find a separated latent space in the double digit experiment using circular coordinates from persistent (co)homology, and compare it against the true generating angles. Note that the features show both good angle tuning and good component tuning.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x15.png)

Figure 15: Double digit experiment (Separated). We find a separated latent space in the double digit experiment using circular coordinates from persistent (co)homology, and compare it against the true generating angles.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x16.png)

Figure 16: Double digit experiment (Merged). We analyse a merged latent space from the double digit experiment using circular coordinates from persistent (co)homology, and compare it against the true generating angles.

## Appendix B BERT: Additional Plots and Tables.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x17.png)

Figure 17: Mean $\pm$ std over $5$ seeds for the overlap of the top $20$ tokens per feature against the $20$-nearest-neighbor token neighborhood in the Vanilla embedding.

#### Scoring interpretability using LLMs.

For the first seed of each of the two non-negative token embeddings, we examine the top $10$ tokens for each of the $256$ features. We feed this list of features into Claude Opus 4.7 and ask it to assign a binary score to each feature according to whether it is interpretable. It finds $81$ “interpretable” features in the Coh embedding and zero in the Softplus embedding. We list all of these features, represented by their top $10$ tokens, together with an explanation, in Table 8. We note that while the Softplus embedding has zero pure features, it has some that come close, such as [‘scotland’, ‘moving’, ‘australia’, ‘wales’, ‘cardiff’, ‘ireland’, ‘virginia’, ‘located’, ‘india’, ‘pennsylvania’], which seems to lean toward locations; but if we were to count this loosely, the Coh model would have nearly all of its features classified as “interpretable”. We also note that the number of features Claude judges interpretable is similar to the count given by the overlap measure, but the exact numbers from the LLM judge should, of course, be taken with a grain of salt.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x18.png)

Figure 18: For the first seed, we plot the average best feature overlap with a token neighborhood of the Vanilla model at each epoch. We also track the average coherence score per epoch.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x19.png)

Figure 19: UMAP projections of the token and feature embeddings for the first seed. The plots are colored by the values of a single token vector and a single feature vector.

![Refer to caption](https://arxiv.org/html/2606.02841v1/x20.png)

Figure 20: Again for the first seed: for each feature we plot, on the $x$-axis, its distance to its $20$ nearest feature vectors, and on the $y$-axis, the overlap of its top $20$ tokens with those of the neighboring feature. In words, each point answers: how far am I from that feature ($x$-axis), and how similar am I to it in terms of top-$20$ tokens ($y$-axis)? This shows that the coherent embedding has more meaningful geometry for the features.
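The quantity plotted in Figure 20 can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name `neighbor_overlap` is hypothetical, and we assume Euclidean distance between feature vectors and overlap measured as the fraction of shared top tokens.

```python
import numpy as np

def neighbor_overlap(M, k_neighbors=20, k_tokens=20):
    """Return (distances, overlaps) for each feature paired with its nearest features.

    M: (n_tokens, n_features) non-negative embedding matrix; column j is feature j.
    """
    feats = M.T                                          # one row per feature
    sq = (feats ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    dist = np.sqrt(np.maximum(d2, 0.0))                  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                       # a feature is not its own neighbor
    top = [set(np.argsort(-f)[:k_tokens].tolist()) for f in feats]  # top tokens per feature
    xs, ys = [], []
    for j in range(len(feats)):
        for nb in np.argsort(dist[j])[:k_neighbors]:
            xs.append(dist[j, nb])                        # x-axis: distance to the neighbor
            ys.append(len(top[j] & top[nb]) / k_tokens)   # y-axis: shared top tokens
    return np.array(xs), np.array(ys)

# Toy usage on random data (hypothetical sizes, much smaller than the real embedding).
rng = np.random.default_rng(0)
toy = rng.random((50, 8))
xs, ys = neighbor_overlap(toy, k_neighbors=3, k_tokens=5)
```

Under this reading, a scatter plot of `ys` against `xs` with a clear downward trend would indicate that nearby features share tokens while distant ones do not.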
#### Implementation details\.

We note that Coh can “cheat” by separating the token embedding into two clusters, such as verbs and the rest of the vocabulary. The loss can then push these clusters apart, increasing the row and column scales and thereby decreasing the coherence score. There are several ways to address this; the one we use here is to sample *locally*. We begin by sampling the rows of the matrix globally (at random), and as the embedding becomes more coherent, we sample sub-matrices locally, based on distance anchors chosen at random.

A second issue is that the coherence loss can push tokens to fit the topology of the features, whereas we would rather it push features to fit the topology of the tokens. We encourage this by downscaling the terms governing row variance and covering by a factor of $10$ relative to their column counterparts.

We run one “global” iteration of the coherence objective, as in Algorithm [1](https://arxiv.org/html/2606.02841#alg1), per batch, sampling $1024$ tokens and using all features. Simultaneously, we run $5$ “local” iterations: from the $1024$ tokens we pick $5$ anchors at random, and from each anchor we form a sub-matrix of dimension $128 \times 32$ based on the $128$ closest tokens to the anchor and the $32$ most active features given those tokens. We gradually introduce the local sub-matrices using a scheduler based on the coherence of the previous epoch.
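The local sampling step can be sketched as below. This is a rough illustration under stated assumptions rather than the authors' implementation: `sample_local_submatrices` is a hypothetical name, token distances are taken as Euclidean distances between rows of the matrix itself, and a feature's "activity" on a set of tokens is assumed to be its summed activation.

```python
import numpy as np

def sample_local_submatrices(M, n_anchors=5, n_rows=128, n_cols=32, rng=None):
    """Pick random anchor tokens and cut one local sub-matrix of M around each.

    M: (n_tokens, n_features) non-negative matrix for the sampled batch of tokens.
    """
    rng = np.random.default_rng() if rng is None else rng
    anchors = rng.choice(M.shape[0], size=n_anchors, replace=False)
    blocks = []
    for a in anchors:
        dist = np.linalg.norm(M - M[a], axis=1)   # assumed metric: distance between rows
        rows = np.argsort(dist)[:n_rows]          # closest tokens (the anchor included)
        activity = M[rows].sum(axis=0)            # assumed activity: total activation
        cols = np.argsort(-activity)[:n_cols]     # most active features on those tokens
        blocks.append(M[np.ix_(rows, cols)])
    return blocks

# Toy usage with the sizes quoted in the text (random data, for shape checking only).
rng = np.random.default_rng(0)
batch = rng.random((1024, 256))
blocks = sample_local_submatrices(batch, rng=rng)
```

Each returned block could then be passed to the same coherence objective as the global matrix; how the local terms are weighted by the scheduler is not specified here and is left out of the sketch.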

Table 7: Average results over $5$ seeds after $300$ epochs.

Table 8: A list of all “pure” features according to Claude for the first seed for the coherent embedding.

| Feat. | Top 10 tokens | Concept |
| --- | --- | --- |
| **Numbers** | | |
| 4 | 50, ten, 24, 100, 60, 20, 19, 15, 40, 12 | Cardinal numbers |
| 29 | 11, 13, 17, 23, 18, 14, 24, 22, 38, 26 | Two-digit numbers |
| 42 | 30, 45, 39, 29, 33, 28, 32, 50, 70, 19 | Two-digit numbers |
| 136 | 26, 13, 18, 28, 31, 27, 17, 23, 39, 22 | Two-digit numbers |
| 143 | 14, 13, 12, 22, 11, 24, 28, 19, 16, 17 | Two-digit numbers |
| 144 | 75, 45, 38, 8, 70, 80, 40, 31, 36, 35 | Two-digit numbers |
| 146 | eight, ten, nine, seven, 300, 24, 18, 14, 23, 27 | Numbers |
| 177 | 28, 27, 80, 33, seven, 35, 31, 39, 60, 90 | Numbers |
| 186 | 400, 200, 16, 150, 100, 26, 23, 300, 24, 19 | Numbers |
| 220 | 75, 80, 90, 60, 100, 38, 300, 200, 70, 50 | Numbers |
| 239 | 29, 28, 75, 24, 14, 11, 27, 22, 45, 21 | Two-digit numbers |
| **Years** | | |
| 10 | 2007, 1972, 2005, 1999, 1984, 1970, 1964, 2016, 2008, 1960 | Years |
| 26 | 1999, 2014, 1997, 2002, 1994, 1991, 1964, 2015, 1996, 2004 | Years |
| 36 | 1993, 2002, 1997, 1999, 1986, 2003, 1985, 1984, 1994, 1992 | Years |
| 112 | 2009, 2014, 2008, 2010, 2007, 2013, 2011, 2012, 2015, 2004 | Years |
| 165 | 1998, 1991, 1995, 1999, 1993, 1994, 1972, 1997, 1986, 1990 | Years |
| 190 | 2016, 2015, 2014, 2011, 1940, 2013, 2004, 2009, 1999, 2010 | Years |
| 241 | 1993, 1988, 1998, 1986, 1970, 1984, 1997, 2002, 1989, 2001 | Years |
| 252 | 2005, 2001, 2002, 2004, 2006, 1995, 2000, 1992, 2003, 1988 | Years |
| **Verbs by semantic role** | | |
| 0 | entered, killed, attacked, struck, arrived, moved, captured, formed, returned, ran | Motion / action past verbs |
| 65 | turned, sold, cut, passed, dropped, defeated, reached, entered, divided, struck | Past action verbs |
| 72 | give, see, write, know, bring, follow, keep, want, begin, make | Basic action verbs |
| 81 | travel, meet, hold, find, produce, reach, stop, build, fight, follow | Action verbs |
| 93 | find, write, produce, create, believe, build, know, hold, reach, leave | Create / produce verbs |
| 189 | leave, stop, get, meet, follow, keep, find, allow, continue, try | Continuation verbs |
| 245 | avoid, follow, keep, prevent, give, provide, begin, continue, perform, build | Continuation / prevention |
| 54 | established, rejected, introduced, accepted, developed, founded, sold, played, abandoned, controlled | Institutional action verbs |
| 57 | performed, played, produced, recorded, joined, met, served, done, created, achieved | Career / creative verbs |
| 61 | released, raised, removed, defeated, adopted, performed, achieved, published, gained, left | Achievement past verbs |
| 89 | provided, held, faced, created, shot, ran, introduced, sent, fired, paid | Past-tense action verbs |
| 97 | extended, brought, left, adopted, passed, represented, earned, introduced, broken, turned | Past action verbs |
| 127 | formed, developed, sustained, struck, affected, taken, controlled, damaged, occurred, created | Damage / formation verbs |
| 128 | opened, signed, started, attended, earned, founded, constructed, married, met, issued | Life-event verbs |
| 155 | took, opened, grew, ran, brought, adopted, carried, taken, turned, started | Past action verbs |
| 175 | maintained, accepted, held, remained, created, developed, discovered, passed, visited, formed | Maintain / hold verbs |
| 179 | published, issued, launched, conducted, provided, established, featured, included, founded, visited | Publish / launch verbs |
| 185 | seen, allowed, kept, used, observed, offered, lived, inspired, ordered, determined | Past participle verbs |
| 187 | reduced, killed, affected, damaged, raised, sustained, increased, destroyed, controlled, carried | Damage / change verbs |
| 193 | appointed, promoted, recognized, awarded, listed, elected, assigned, ordered, proposed, named | Appointment / award verbs |
| 210 | continued, intended, used, described, peaked, found, refused, lived, managed, allowed | Past verbs |
| 211 | faced, affected, represented, damaged, covered, controlled, captured, destroyed, defeated, occurred | Affect / damage verbs |
| 212 | adopted, signed, replaced, marked, attacked, joined, achieved, rejected, earned, paid | Action past verbs |
| 223 | dropped, planned, made, gained, produced, shot, maintained, raised, arrived, captured | Past action verbs |
| 229 | produced, developed, issued, caused, struck, shot, completed, recorded, marked, formed | Produce / cause verbs |
| 235 | lost, achieved, filmed, dropped, held, produced, discovered, entered, met, faced | Past action verbs |
| **Speech / cognition verbs** | | |
| 66 | expressed, claimed, concluded, felt, revealed, attempted, decided, agreed, explained, meant | Cognition / statement |
| 121 | commented, wanted, noted, appeared, agreed, read, praised, wrote, criticized, got | Reaction verbs |
| 166 | claimed, wanted, argued, suggested, saw, appear, felt, stated, attempted, read | Cognition / statement |
| 176 | revealed, showed, read, concluded, meant, uses, suggested, got, claimed, felt | Reveal / conclude verbs |
| 232 | noted, suggested, thought, meant, argued, revealed, showed, believed, expressed, concluded | Thought / argument verbs |
| **Verb participles (-ing)** | | |
| 111 | losing, remaining, growing, returning, winning, resulting, doing, selling, coming, finding | -ing participles |
| **Adverbs by function** | | |
| 87 | typically, thus, always, usually, finally, therefore, ultimately, probably, initially, yet | Sentence adverbs |
| 94 | probably, initially, still, sometimes, finally, therefore, once, eventually, now, thus | Temporal adverbs |
| 124 | once, even, immediately, sometimes, finally, possibly, previously, almost, initially, soon | Temporal adverbs |
| 150 | now, actually, already, nearly, still, almost, always, typically, simply, sometimes | Temporal / degree |
| 170 | currently, particularly, especially, usually, eventually, always, thus, often, simply, perhaps | Sentence adverbs |
| 217 | significantly, typically, generally, mainly, slightly, primarily, officially, fully, relatively, initially | Degree / manner |
| 238 | possibly, mainly, particularly, mostly, relatively, slightly, probably, especially, usually, perhaps | Hedging adverbs |
| 244 | relatively, largely, generally, heavily, previously, too, really, already, increasingly, widely | Degree adverbs |
| **Adjectives** | | |
| 14 | major, important, famous, prominent, professional, possible, successful, special, powerful, short | Importance adjectives |
| 45 | separate, direct, subsequent, surrounding, supporting, following, previous, early, earlier, increasing | Sequence adjectives |
| 52 | last, fourth, sixth, fifth, third, earliest, final, next, first, seventh | Ordinal / sequence words |
| 76 | little, good, serious, better, strong, unknown, difficult, significant, certain, close | Evaluative adjectives |
| 138 | soviet, european, roman, royal, british, portuguese, jewish, indian, domestic, italian | Nationality / empire adj. |
| **Nouns** | | |
| 22 | states, places, sets, matches, times, hours, weeks, plays, months, terms | Plural count nouns |
| 41 | staff, police, workers, administration, soldiers, officers, crew, troops, writers, authorities | Groups of personnel |
| 58 | studies, members, elements, images, sources, characters, levels, stars, pieces, systems | Plural abstract nouns |
| 99 | systems, artists, areas, countries, parts, images, lines, regions, elements, stars | Plural category nouns |
| 135 | drama, fiction, opera, comedy, plot, script, novel, movie, book, story | Narrative genres |
| 151 | school, museum, hotel, academy, theatre, club, institute, station, assembly, battery | Institutional buildings |
| 156 | actor, writer, director, producer, author, coach, singer, critic, manager, commander | Professions |
| 162 | richard, robert, michael, david, thomas, peter, james, edward, paul, john | Male first names |
| 200 | husband, brother, sister, marriage, daughter, birth, son, mother, wife, family | Family / kinship |
| 208 | effects, damage, winds, casualties, rainfall, plans, reports, conditions, orders, evidence | Effects / damage nouns |
| 234 | committee, commission, company, foundation, council, post, party, regiment, battalion, program | Organizational bodies |
| **Places / geography** | | |
| 25 | scotland, ireland, japan, china, manchester, minnesota, australia, paris, canada, chicago | Places (countries / cities) |
| 85 | israel, wales, california, washington, pennsylvania, chicago, croatia, canada, york, texas | Places (countries / states) |
| 254 | london, australia, britain, germany, japan, france, philadelphia, carolina, africa, england | Countries / places |
| **Function words & units** | | |
| 68 | %, inches, metres, feet, mi, °, km, percent, cm, hundred | Units of measurement |
| 69 | when, around, throughout, within, upon, before, during, towards, alongside, via | Temporal / spatial preps |
| 169 | upon, before, towards, within, toward, alongside, through, onto, around, via | Directional prepositions |

## Appendix C Preliminaries

We briefly recap the essential definitions and two practical lemmas that we later use to prove the interleaving\.

###### Definition C\.1\.

A *simplicial complex* $K$ is a finite set of non-empty finite sets that is closed under taking non-empty subsets. The *vertex set* of $K$, denoted $V(K)$, is the set of singletons of $K$. We say $f: K \rightarrow L$ is a map between simplicial complexes if it is a map on the vertex sets $f: V(K) \rightarrow V(L)$ that extends to simplices in the following way: if $\sigma \in K$, then $f(\sigma) \in L$.

###### Definition C\.2\.

Let $I$ be a totally ordered set. A *filtered simplicial complex*, $\mathcal{F} \coloneqq \{F_s\}_{s \in I}$, is a sequence of simplicial complexes such that if $s \leq t \in I$, then $F_s \subseteq F_t$, and such that $\bigcup_{s \in I} V(F_s)$ is finite. We call $V(\mathcal{F}) \coloneqq \bigcup_{s \in I} V(F_s)$ the *vertex set* of $\mathcal{F}$.

###### Definition C\.3\.

Let $(X, d_X)$ be a metric space and let $P \subset X$ be a finite set of samples. The *Vietoris-Rips filtration* of $P$ is the filtered simplicial complex over $\mathbb{R}$ defined by

$$VR_t(P) \coloneqq \{\sigma \subset P \mid \sigma \neq \emptyset,\ d_X(x, y) \leq 2t \quad \forall x, y \in \sigma\}$$

at filtration value $t \in \mathbb{R}$.
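As a concrete instance of this definition, the following sketch enumerates the simplices of $VR_t(P)$ by brute force for a small Euclidean point set. The function name `vietoris_rips` and the `max_dim` cut-off are our own illustrative choices; practical computations would use a dedicated persistent homology library instead.

```python
from itertools import combinations
import math

def vietoris_rips(points, t, max_dim=2):
    """Simplices of VR_t(P): non-empty subsets with all pairwise distances <= 2t.

    Simplices are returned as tuples of point indices, up to dimension max_dim.
    """
    n = len(points)
    close = [[math.dist(points[i], points[j]) <= 2 * t for j in range(n)]
             for i in range(n)]
    simplices = []
    for size in range(1, max_dim + 2):            # a k-simplex has k + 1 vertices
        for sigma in combinations(range(n), size):
            if all(close[i][j] for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices

# Four corners of the unit square: side length 1, diagonal sqrt(2).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
small = vietoris_rips(square, t=0.5)    # threshold 2t = 1: vertices and sides only
large = vietoris_rips(square, t=0.75)   # threshold 2t = 1.5 > sqrt(2): diagonals too
```

At $t = 0.5$ the complex is the hollow square, which has a one-dimensional hole; at $t = 0.75$ the diagonals and all triangles appear and the hole is filled. The inclusion of `small` in `large` is the filtration property $F_s \subseteq F_t$ for $s \leq t$.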

###### Definition C\.4\.

Let $f, g: X \rightarrow Y$ be continuous maps. A *homotopy* between $f$ and $g$ is a continuous map $H: X \times I \rightarrow Y$, where $I$ is the unit interval, such that $H(x, 0) = f(x)$ and $H(x, 1) = g(x)$.

###### Definition C\.5\.

Let $P = \{p_0, p_1, \dots, p_n\} \subset \mathbb{R}^n$ be a finite set of samples and let $t > 0$ be a scalar. The *geometric realization* of $VR_t(P)$, denoted $|VR_t(P)|$, is a topological space constructed as follows:

$$|VR_t(P)| = \bigcup_{\sigma \in VR_t(P)} \left\{ \sum_{p_i \in \sigma} \lambda_i p_i \;\middle|\; \lambda_i \geq 0,\ \sum_{p_i \in \sigma} \lambda_i = 1 \right\} \subset \mathbb{R}^n.$$

###### Definition C\.6\.

Let $\mathcal{K} = \{K_t\}_{t \in \mathbb{R}}$ and $\mathcal{L} = \{L_t\}_{t \in \mathbb{R}}$ be filtered simplicial complexes. Let $\delta \geq 0$ be a scalar. We say $\mathcal{K}$ and $\mathcal{L}$ are $\delta$-interleaved if there exist families of maps $\{f_t: K_t \rightarrow L_{t+\delta}\}_{t \in \mathbb{R}}$ and $\{g_t: L_t \rightarrow K_{t+\delta}\}_{t \in \mathbb{R}}$ such that for any $t \leq s \in \mathbb{R}$ the following diagrams commute up to homotopy when passing to the realization. Unlabeled arrows are the inclusions of the filtrations.

$$
\begin{array}{ccccc}
K_t & \hookrightarrow & K_{t+\delta} & \hookrightarrow & K_{t+2\delta} \\
 & \searrow{\scriptstyle f_t} & & \nearrow{\scriptstyle g_{t+\delta}} & \\
 & & L_{t+\delta} & &
\end{array}
\qquad
\begin{array}{ccccc}
 & & K_{t+\delta} & & \\
 & \nearrow{\scriptstyle g_t} & & \searrow{\scriptstyle f_{t+\delta}} & \\
L_t & \hookrightarrow & L_{t+\delta} & \hookrightarrow & L_{t+2\delta}
\end{array}
$$

$$
\begin{array}{ccc}
K_t & \hookrightarrow & K_s \\
\downarrow{\scriptstyle f_t} & & \downarrow{\scriptstyle f_s} \\
L_{t+\delta} & \hookrightarrow & L_{s+\delta}
\end{array}
\qquad
\begin{array}{ccc}
K_{t+\delta} & \hookrightarrow & K_{s+\delta} \\
\uparrow{\scriptstyle g_t} & & \uparrow{\scriptstyle g_s} \\
L_t & \hookrightarrow & L_s
\end{array}
$$

###### Definition C\.7\.

Two simplicial maps $f, g \colon S \rightarrow K$ are called *contiguous* if for every simplex $\sigma \in S$ we have that $f(\sigma) \cup g(\sigma)$ is a simplex in $K$.

###### Lemma C\.8\.

(Spanier, [1994](https://arxiv.org/html/2606.02841#bib.bib16), Lemma 2, p. 130) Two contiguous simplicial maps become homotopic after geometric realization.
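Contiguity is a purely combinatorial condition and is easy to check by machine. The sketch below is only an illustration: the helper names `image` and `are_contiguous` are hypothetical, simplicial complexes are represented as sets of `frozenset`s, and vertex maps as dictionaries.

```python
def image(f, sigma):
    """Image of a simplex under a vertex map given as a dict."""
    return frozenset(f[v] for v in sigma)

def are_contiguous(f, g, S, K):
    """True if f(sigma) | g(sigma) is a simplex of K for every simplex sigma of S."""
    return all(image(f, sigma) | image(g, sigma) in K for sigma in S)

# S: a single edge {0, 1} with its vertices. K: a filled triangle on {a, b, c}.
S = {frozenset({0}), frozenset({1}), frozenset({0, 1})}
K = {frozenset(s) for s in ["a", "b", "c", "ab", "bc", "ac", "abc"]}
f = {0: "a", 1: "b"}              # sends the edge of S to the edge ab
g = {0: "b", 1: "c"}              # sends the edge of S to the edge bc
hollow = K - {frozenset("abc")}   # the same triangle with the 2-simplex removed

filled_ok = are_contiguous(f, g, S, K)
hollow_ok = are_contiguous(f, g, S, hollow)
```

In the filled triangle the two maps are contiguous because the union of the two image edges is the full $2$-simplex; once that simplex is removed, the same pair of maps is no longer contiguous.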

###### Lemma C\.9\.

Let $K \subset (X, d_X)$ and $L \subset (Y, d_Y)$ be finite subsets of metric spaces $X$ and $Y$. If there is a scalar $\delta \geq 0$ and there are maps $f: K \rightarrow L$ and $g: L \rightarrow K$ satisfying the following:

1. if $d_X(k_i, k_j) \leq 2t$, then $d_Y(f(k_i), f(k_j)) \leq 2t + 2\delta$,
2. if $d_Y(l_i, l_j) \leq 2t$, then $d_X(g(l_i), g(l_j)) \leq 2t + 2\delta$,
3. if $d_X(k_i, k_j) \leq 2t$, then $d_X(k_i, g \circ f(k_j)) \leq 2t + 4\delta$,
4. if $d_Y(l_i, l_j) \leq 2t$, then $d_Y(l_i, f \circ g(l_j)) \leq 2t + 4\delta$,

then $VR(K, d_X)$ and $VR(L, d_Y)$ are $\delta$-interleaved.

###### Proof\.

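Conditions 1 and 2 say that $f$ and $g$ induce simplicial maps $f_t: VR_t(K) \rightarrow VR_{t+\delta}(L)$ and $g_t: VR_t(L) \rightarrow VR_{t+\delta}(K)$ for every $t$; since each family comes from a single vertex map, the squares in the definition of an interleaving commute strictly. Conditions 1 and 2 combined with condition 3 show that $g_{t+\delta} \circ f_t$ is contiguous to the inclusion $VR_t(K) \hookrightarrow VR_{t+2\delta}(K)$, and condition 4 gives the dual statement. The triangles therefore commute up to homotopy by [Lemma C.8](https://arxiv.org/html/2606.02841#A3.Thmtheorem8). ∎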
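For concrete finite point sets, the four conditions of Lemma C.9 can be checked numerically. The sketch below computes the smallest $\delta$ for which they hold, reading the conditions as quantified over every $t$ (our interpretation, which means the tightest case is $2t$ equal to the distance on the left-hand side). The name `smallest_delta`, the use of Euclidean distances, and the encoding of maps as index lists are assumptions made for illustration.

```python
import math

def smallest_delta(K, L, f, g):
    """Smallest delta >= 0 such that f: K -> L and g: L -> K (given as index lists)
    satisfy conditions 1-4 of Lemma C.9 for every t, with Euclidean distances."""
    def worst(P, Q, p_to_q, q_to_p):
        delta = 0.0
        for i in range(len(P)):
            for j in range(len(P)):
                base = math.dist(P[i], P[j])   # tightest admissible value of 2t
                # conditions 1 / 2: d(f(p_i), f(p_j)) <= 2t + 2 * delta
                stretched = math.dist(Q[p_to_q[i]], Q[p_to_q[j]])
                # conditions 3 / 4: d(p_i, g(f(p_j))) <= 2t + 4 * delta
                round_trip = math.dist(P[i], P[q_to_p[p_to_q[j]]])
                delta = max(delta, (stretched - base) / 2, (round_trip - base) / 4)
        return delta
    return max(worst(K, L, f, g), worst(L, K, g, f))

# Three collinear points versus the two endpoints (toy example).
K = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
L = [(0.0, 0.0), (2.0, 0.0)]
f = [0, 0, 1]     # K[0] -> L[0], K[1] -> L[0], K[2] -> L[1]
g = [0, 2]        # L[0] -> K[0], L[1] -> K[2]
delta = smallest_delta(K, L, f, g)
```

In this toy example the binding constraint is condition 1 for the pair `K[1]`, `K[2]`: their distance is $1$ but their images are at distance $2$, which forces $\delta \geq 1/2$.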

## Appendix D Proofs

###### Proof of [Proposition 3.9](https://arxiv.org/html/2606.02841#S3.Thmtheorem9).

We only show the first of the statements. By locality we have that

$$Var_{\mathcal{R}}(r_i) = \sum_j w^{(i)}_j \|\phi(r_i) - c_j\|_2^2 \leq \epsilon.$$

Note that

$$\|\phi(r_i) - \Phi(r_i)\|_2^2 = \sum_j w_j^{(i)} \|\phi(r_i) - \Phi(r_i)\|_2^2 \leq \sum_j w^{(i)}_j \|\phi(r_i) - c_j\|_2^2 \leq \epsilon.$$

Here we have used that $\sum_j w_j^{(i)} = 1$ and that $\|\phi(r_i) - \Phi(r_i)\| \leq \|\phi(r_i) - c_j\|$ for all $j$ by definition of $\Phi$. We conclude that $\|\phi(r_i) - \Phi(r_i)\|_2 \leq \epsilon^{1/2}$ by taking the square root on both sides. ∎
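The key inequality in this proof, that the squared distance from $\phi(r_i)$ to its nearest column is at most the weighted variance, can be checked numerically. The sketch below is an illustration under assumptions: the weights $w^{(i)}$ are obtained by $L^1$-normalising each row (the actual normalization kernel may differ), and $\Phi(r_i)$ is taken to be the column closest to $\phi(r_i)$, as used in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((6, 4))                    # rows r_i in R^4, columns c_j in R^6
W = M / M.sum(axis=1, keepdims=True)      # assumed kernel: L1-normalise each row
C = M.T                                   # C[j] is the column c_j, a point in R^6

phi = W @ C                               # phi(r_i) = sum_j w_j^(i) c_j
sq_dist = ((phi[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # ||phi(r_i) - c_j||^2
variance = (W * sq_dist).sum(axis=1)      # Var_R(r_i), the weighted variance
nearest_sq = sq_dist.min(axis=1)          # ||phi(r_i) - Phi(r_i)||^2
```

Because the weights are non-negative and sum to one, `variance` is a weighted average of the entries of each row of `sq_dist` and can never fall below `nearest_sq`; a bound $\epsilon$ on the variance therefore transfers to the nearest-column distance.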

###### Proof of [Proposition 3.10](https://arxiv.org/html/2606.02841#S3.Thmtheorem10).

Let $r_i$ be a row. By the definition of $\psi$, which is extended linearly over the convex hull, we have that

$$\psi \circ \phi(r_i) = \psi\Big(\sum_j w^{(i)}_j c_j\Big) = \sum_j w^{(i)}_j \psi(c_j).$$

Hence,

$$\|r_i - \psi \circ \phi(r_i)\|_2^2 = \Big\|r_i - \sum_j w^{(i)}_j \psi(c_j)\Big\|_2^2 \leq \sum_j w^{(i)}_j \|r_i - \psi(c_j)\|_2^2 \leq \epsilon$$

by Jensen's inequality and the fact that $M$ is $\epsilon$-covered. By taking the square root on both sides we arrive at the statement. ∎

###### Proof of [Theorem 3.12](https://arxiv.org/html/2606.02841#S3.Thmtheorem12).

To prove an interleaving we use [Lemma C.9](https://arxiv.org/html/2606.02841#A3.Thmtheorem9); we only prove statements $1$ and $3$ there, as the other two are dual. For $1$ we assume that $\|r_i - r_j\|_2 \leq 2t$. Then, from the $1$-Lipschitz assumption on the barycentric maps, we have that $\|\phi(r_i) - \phi(r_j)\|_2 \leq 2t$. By [Proposition 3.9](https://arxiv.org/html/2606.02841#S3.Thmtheorem9) and the triangle inequality we have that

$$
\begin{aligned}
\|\Phi(r_i) - \Phi(r_j)\|_2 &\leq \|\Phi(r_i) - \phi(r_i)\|_2 + \|\phi(r_i) - \phi(r_j)\|_2 + \|\Phi(r_j) - \phi(r_j)\|_2 \\
&\leq 2t + 2\epsilon^{1/2}.
\end{aligned}
$$

For part 3 in [Lemma C.9](https://arxiv.org/html/2606.02841#A3.Thmtheorem9) we assume that $\|r_i - r_j\|_2 \leq 2t$. Then by the triangle inequality we have that

$$\|r_i - \Psi \circ \Phi(r_j)\|_2 \leq \|r_i - r_j\|_2 + \|r_j - \Psi \circ \Phi(r_j)\|_2.$$

By the triangle inequality applied to the last summand and [Proposition 3.9](https://arxiv.org/html/2606.02841#S3.Thmtheorem9) we have that

$$\|r_j - \Psi \circ \Phi(r_j)\|_2 \leq \|r_j - \psi \circ \Phi(r_j)\|_2 + \|\psi \circ \Phi(r_j) - \Psi \circ \Phi(r_j)\|_2 \leq \|r_j - \psi \circ \Phi(r_j)\|_2 + \epsilon^{1/2}.$$

By the triangle inequality on the first summand and [Proposition 3.10](https://arxiv.org/html/2606.02841#S3.Thmtheorem10) we have that

$$\|r_j - \psi \circ \Phi(r_j)\|_2 \leq \|r_j - \psi \circ \phi(r_j)\|_2 + \|\psi \circ \phi(r_j) - \psi \circ \Phi(r_j)\|_2 \leq \epsilon^{1/2} + \|\psi \circ \phi(r_j) - \psi \circ \Phi(r_j)\|_2.$$

By applying the $1$-Lipschitz assumption on $\psi$ and [Proposition 3.9](https://arxiv.org/html/2606.02841#S3.Thmtheorem9) to the second summand we have that

$$\|\psi \circ \phi(r_j) - \psi \circ \Phi(r_j)\|_2 \leq \|\phi(r_j) - \Phi(r_j)\|_2 \leq \epsilon^{1/2}.$$

By putting everything together we have that

$$\|r_i - \Psi \circ \Phi(r_j)\|_2 \leq 2t + 3\epsilon^{1/2} \leq 2t + 4\epsilon^{1/2},$$

which is condition $3$ with $\delta = \epsilon^{1/2}$, proving the statement. ∎

## Appendix E Derived Lipschitz Constants

We give a short result that expresses the Lipschitz constants of the barycentric maps in terms of those of the normalization kernels.

###### Proposition E.1 (Derived Lipschitz constants).

Let $M \in \mathbb{R}_+^{m \times n}$ with rows $\mathcal{R} = \{r_1, \ldots, r_m\} \subset \mathbb{R}^n$ and columns $\mathcal{C} = \{c_1, \ldots, c_n\} \subset \mathbb{R}^m$. Suppose the normalization kernels $\sigma_R$ and $\sigma_C$, viewed as maps $(\mathbb{R}^n_+ \setminus \{0\}, \|\cdot\|_2) \to (\Delta^{n-1}, \|\cdot\|_1)$ and $(\mathbb{R}^m_+ \setminus \{0\}, \|\cdot\|_2) \to (\Delta^{m-1}, \|\cdot\|_1)$, have Lipschitz constants $L_{\sigma_R}$ and $L_{\sigma_C}$ respectively. Then the barycentric maps satisfy

$$\|\phi(r_i) - \phi(r_j)\|_2 \leq K_\phi \cdot \|r_i - r_j\|_2, \qquad \|\psi(c_j) - \psi(c_k)\|_2 \leq K_\psi \cdot \|c_j - c_k\|_2,$$

where

$$K_\phi = L_{\sigma_R} \cdot \max_j \|c_j\|_2, \qquad K_\psi = L_{\sigma_C} \cdot \max_i \|r_i\|_2.$$

###### Proof\.

We prove the bound for $\phi$; the bound for $\psi$ is identical with rows and columns exchanged. By the closed form $\phi(r_i) = \sum_j w^{(i)}_j c_j$, where $w^{(i)} = \sigma_R(r_i)$:

$$\phi(r_i) - \phi(r_j) = \sum_k (w^{(i)}_k - w^{(j)}_k)\, c_k.$$

By the triangle inequality:

$$\|\phi(r_i) - \phi(r_j)\|_2 = \Big\|\sum_k (w^{(i)}_k - w^{(j)}_k)\, c_k\Big\|_2 \leq \sum_k |w^{(i)}_k - w^{(j)}_k| \cdot \|c_k\|_2 \leq \|w^{(i)} - w^{(j)}\|_1 \cdot \max_k \|c_k\|_2.$$

By the Lipschitz property of $\sigma_R$:

$$\|w^{(i)} - w^{(j)}\|_1 = \|\sigma_R(r_i) - \sigma_R(r_j)\|_1 \leq L_{\sigma_R}\, \|r_i - r_j\|_2.$$

Combining the last two displays gives the claimed bound with $K_\phi = L_{\sigma_R} \cdot \max_k \|c_k\|_2$. ∎
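The middle inequality of this proof does not depend on the choice of normalization kernel and can be verified numerically. The sketch below assumes, purely for illustration, that $\sigma_R$ is $L^1$ normalisation of each row, and it checks $\|\phi(r_i) - \phi(r_j)\|_2 \leq \|w^{(i)} - w^{(j)}\|_1 \cdot \max_k \|c_k\|_2$ on a random non-negative matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((5, 7))                       # rows r_i in R^7, columns c_k in R^5
W = M / M.sum(axis=1, keepdims=True)         # assumed kernel sigma_R: L1 normalisation
C = M.T                                      # C[k] is the column c_k
phi = W @ C                                  # barycentric images phi(r_i)
max_col_norm = np.linalg.norm(C, axis=1).max()

gaps = []
for i in range(M.shape[0]):
    for j in range(M.shape[0]):
        lhs = np.linalg.norm(phi[i] - phi[j])             # ||phi(r_i) - phi(r_j)||_2
        rhs = np.abs(W[i] - W[j]).sum() * max_col_norm    # ||w^(i) - w^(j)||_1 * max_k ||c_k||_2
        gaps.append(rhs - lhs)                            # non-negative when the bound holds
```

Every entry of `gaps` should be non-negative. The remaining factor $L_{\sigma_R}$ depends on the kernel and on the region of $\mathbb{R}^n_+ \setminus \{0\}$ where the rows live, so it is not estimated in this sketch.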

## Appendix F Relationship to Topological Data Analysis

We discuss the relationship of this paper to ideas and methods developed in the field of topological data analysis. An obvious connection is the way we formalized our main result in terms of Vietoris–Rips filtrations and interleavings. These concepts are reviewed in Appendix [C](https://arxiv.org/html/2606.02841#A3). A second, deeper connection is with the substantial work on relational constructs, in particular Dowker duality and Dowker filtrations. We discuss this connection here.

Given a relation $A \subseteq I \times J$ on the sets $I$ and $J$, the *Dowker row and column complexes* are defined as

$$D_R = \{\sigma \subseteq I \mid \exists\, j \in J \colon \sigma \times \{j\} \subseteq A\}, \qquad D_C = \{\tau \subseteq J \mid \exists\, i \in I \colon \{i\} \times \tau \subseteq A\}.$$

The terminology comes from interpreting the relation as a binary matrix where the rows are the elements of $I$ and the columns the elements of $J$. This is also how we can start to see a bridge to the matrix $M$ studied in this paper. The above complexes were first introduced by Dowker in a seminal paper (Dowker, [1952](https://arxiv.org/html/2606.02841#bib.bib3)). The main result of that paper establishes a *homology equivalence* between the row and column complexes. This result and its stronger formulation as a homotopy equivalence are today known as *Dowker duality*.
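To make the two complexes concrete, the following sketch builds $D_R$ and $D_C$ from a relation given as a set of pairs and compares their Euler characteristics, which must agree because the complexes have isomorphic homology. The function names are hypothetical, and only non-empty simplices are recorded.

```python
from itertools import combinations

def dowker_complex(relation, by_rows=True):
    """Row complex D_R (by_rows=True) or column complex D_C of a relation,
    given as a set of (i, j) pairs. Simplices are returned as frozensets."""
    witnessed = {}
    for i, j in relation:
        witness, vertex = (j, i) if by_rows else (i, j)
        witnessed.setdefault(witness, set()).add(vertex)
    simplices = set()
    for vertices in witnessed.values():   # every non-empty subset shares this witness
        for size in range(1, len(vertices) + 1):
            simplices.update(frozenset(s) for s in combinations(sorted(vertices), size))
    return simplices

def euler_characteristic(simplices):
    """Alternating count of simplices by dimension."""
    return sum((-1) ** (len(s) - 1) for s in simplices)

# Rows 0..2, columns a..c; each column relates two rows, forming a hollow triangle.
A = {(0, "a"), (1, "a"), (1, "b"), (2, "b"), (2, "c"), (0, "c")}
D_R = dowker_complex(A, by_rows=True)
D_C = dowker_complex(A, by_rows=False)
```

Here both complexes are hollow triangles (three vertices and three edges), one on the rows and one on the columns, so both have Euler characteristic $0$.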

There is a remarkable variety of proofs for this duality result. Without claiming to be exhaustive: Björner (Björner, [1995](https://arxiv.org/html/2606.02841#bib.bib31)) gives a proof that relies on a good covering of the row complex in terms of columns and then invokes the nerve lemma; Brun and Salbu (Brun and Salbu, [2023](https://arxiv.org/html/2606.02841#bib.bib32)) construct a *rectangle complex* that admits projections to the row and column complexes and then apply Quillen’s Theorem A to exhibit these as homotopy equivalences; in another work, Brun and Grinberg (Brun and Grinberg, [2024](https://arxiv.org/html/2606.02841#bib.bib33)) use discrete Morse theory to prove the same result; finally, Yoon (Yoon, [2024](https://arxiv.org/html/2606.02841#bib.bib34)) gives three different proofs using a Galois connection derived from the relation, a relational join, and a relational product.

What ties all these proofs together is that they rely on some form of fiber lemma and, in order to apply it, some contractibility property. To give an example, in Björner’s proof it is crucial that the subcomplexes

$$U_i = \{\tau \subseteq J \mid \{i\} \times \tau \subseteq A\} \subseteq D_C$$

for $i \in I$ form a good cover of the column complex $D_C$.

Another extensive area of research in applied and computational topology concerns possible extensions of Dowker complexes and in particular filtrations of Dowker complexes. One example starts from a non-binary matrix $M$ and then studies the row and column complexes constructed after binarizing $M$ at different threshold values. This leads to filtrations of simplicial complexes, and it is possible to lift Dowker duality to an appropriate functorial result in this setting (Chowdhury and Mémoli, [2018](https://arxiv.org/html/2606.02841#bib.bib35); Virk, [2021](https://arxiv.org/html/2606.02841#bib.bib36)). Another extension of the classical setting of Dowker duality is introduced by Robinson (Robinson, [2022](https://arxiv.org/html/2606.02841#bib.bib37)). There, row and column complexes are combined into cosheaves where one complex is the base space and the other one provides the fibers. The same paper also discusses the notion of *Dowker total weight filtrations*, which annotate the simplices in a Dowker complex with the cardinality of their witnesses and derive filtrations from these.
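The thresholding construction can be sketched as follows. The convention used here, relating row $i$ and column $j$ when $M_{ij}$ is at least the threshold and sweeping thresholds downward, is an assumption made for illustration; the cited works may use a different convention. The function name `row_complex` is hypothetical.

```python
from itertools import combinations

def row_complex(M, threshold):
    """Dowker row complex of the relation {(i, j) : M[i][j] >= threshold}.

    Simplices are returned as sorted tuples of row indices.
    """
    simplices = set()
    for j in range(len(M[0])):
        rows = [i for i in range(len(M)) if M[i][j] >= threshold]  # rows witnessed by column j
        for size in range(1, len(rows) + 1):
            simplices.update(combinations(rows, size))
    return simplices

M = [[0.9, 0.2, 0.0],
     [0.6, 0.7, 0.1],
     [0.0, 0.5, 0.8]]
# Decreasing thresholds: the relation, and hence the complex, can only grow.
filtration = [row_complex(M, th) for th in (0.8, 0.5, 0.1)]
```

Lowering the threshold only adds pairs to the relation, so each complex in `filtration` contains the previous one. In this toy matrix the complex grows from two isolated vertices to a path and finally to a filled triangle.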

In the classical Dowker duality setup the involved complexes are agnostic to the exact cardinality of witnesses\. This is actually crucial for all proofs of Dowker duality to go through, as it guarantees the aforementioned contractibility conditions that are needed in order to apply the various versions of fiber lemmas\. Once we condition the existence of simplices in the complexes on the number of witnesses—like, for example, in the Dowker total weight filtrations—duality breaks down in general\.
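A toy computation makes this failure concrete. The sketch below assumes one simple way of conditioning on witness counts: a simplex is kept only if it has at least `min_witnesses` common witnesses. This rule, the relation `A`, and the function names are hypothetical and are not claimed to match the exact definition of the Dowker total weight filtrations. With one witness required the row and column complexes have the same Euler characteristic; with two required they do not, so they cannot be homotopy equivalent.

```python
from itertools import combinations


def weighted_complex(related, vertices, witnesses, min_witnesses):
    """Simplices on `vertices` with at least `min_witnesses` common witnesses."""
    cx = set()
    for r in range(1, len(vertices) + 1):
        for simplex in combinations(vertices, r):
            count = sum(all(related(v, w) for v in simplex) for w in witnesses)
            if count >= min_witnesses:
                cx.add(frozenset(simplex))
    return cx


def euler_characteristic(cx):
    """Alternating count of simplices by dimension."""
    return sum((-1) ** (len(s) - 1) for s in cx)


# Hypothetical relation: row 0 ~ columns {0, 1}, row 1 ~ columns {1, 2}.
rows = [0, 1]
cols = [0, 1, 2]
A = {(0, 0), (0, 1), (1, 1), (1, 2)}


def row_sees_col(i, j):
    return (i, j) in A


def col_sees_row(j, i):
    return (i, j) in A


chi = {}
for k in (1, 2):
    row_cx = weighted_complex(row_sees_col, rows, cols, k)
    col_cx = weighted_complex(col_sees_row, cols, rows, k)
    chi[k] = (euler_characteristic(row_cx), euler_characteristic(col_cx))
```

At `k = 2` the row complex consists of two isolated points (each row has two related columns, but the rows share only one), while the column complex is a single point (only column 1 is related to both rows).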

We believe that here lies an important link to our current work\. The relationship of the Vietoris–Rips filtrations of the rows and columns of a matrix $M$ can be understood as a geometric relaxation of the classical Dowker duality setting\. Similar to the total weight filtrations discussed above, a duality between the row and column pictures of the data may not exist in general in this relaxation\. It is then interesting to study conditions on the matrix that recover it, and our coherence regularizer is an attempt at precisely this\.

In general, we think that it can be very fruitful to examine all possible proofs of Dowker duality and their detailed intricacies in order to learn about the different failure modes that can arise in geometric relaxations\. These failure modes can in turn inspire conditions that prevent such failure and thus guarantee some form of duality—like, for example, an interleaving\.

In particular, the different perspectives offered by Yoon \(Yoon, [2024](https://arxiv.org/html/2606.02841#bib.bib34)\) seem very interesting\. The relational join is perhaps close in spirit to a blown\-up metric space that contains both the rows of a matrix and their barycenters of columns \(or vice versa\)\. These could sometimes be interleaved, for example as suggested by the Dowker interleaving result of Chazal et al\. \(Chazal et al\., [2014](https://arxiv.org/html/2606.02841#bib.bib38)\)\. Another interesting connection might be Application 4\.2 in \(Yoon, [2024](https://arxiv.org/html/2606.02841#bib.bib34)\)\. In general, spectral sequence computations with \(not necessarily good\) covers seem very fruitful in studying settings where Dowker duality holds only under appropriate conditions\.

A deep connection, in our opinion, is through the general result that $\mathrm{hocolim}\, X_{\mathcal{U}} \simeq X$ for every cover of a topological space $X$, not just good covers\. Here $X_{\mathcal{U}}$ is an appropriate simplicial diagram of topological spaces; this is due to Segal \(Segal, [1968](https://arxiv.org/html/2606.02841#bib.bib39)\) and later generalized by Dugger and Isaksen \(Dugger and Isaksen, [2004](https://arxiv.org/html/2606.02841#bib.bib40)\)\. This result can, for example, replace fiber lemmas in proofs of Dowker duality in order to obtain extensions of Dowker duality to diagrams of appropriate complexes, as pursued by Vaupel and Dunn \(Vaupel and Dunn, [2023](https://arxiv.org/html/2606.02841#bib.bib41)\)\.