@vintcessun: Turns out LLM text embeddings are hijacked by high-frequency tokens (periods, articles)! The unembedding matrix implicitly defines a low-rank subspace dominated by these uninformative expressions. This is the root cause of LLMs' poor performance as universal embeddings, and the contamination is subtle. EmbedFilter…


Summary

This study reveals that LLM text embeddings are hijacked by high-frequency tokens (e.g., periods, articles) and proposes EmbedFilter, which performs SVD on the unembedding matrix and subtracts the projection component to release true semantics, achieving zero-training-cost dimensionality reduction and retrieval efficiency gains.

Turns out LLM text embeddings are hijacked by high-frequency tokens (periods, articles)! The unembedding matrix implicitly defines a low-rank subspace dominated by these uninformative expressions. This is the root cause of LLMs' poor performance as universal embeddings, and the contamination is subtle. EmbedFilter: perform SVD on the unembedding matrix, take the top k singular vectors to form a subspace, and subtract the projected component from the embedding — a single linear transformation releases true semantics, enabling natural dimensionality reduction with zero training overhead and doubling retrieval indexing efficiency.

Cached at: 06/12/26, 08:59 AM


Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Source: https://arxiv.org/html/2606.07502

Abstract.

Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. However, they struggle to function as off-the-shelf embedding models, leading to suboptimal performance on massive text embedding benchmarks. In this paper, we identify a potential cause underlying this deficiency. Our motivation stems from an unexpected observation: text embeddings tend to align with frequent but uninformative tokens when projected onto the vocabulary space. We argue that this excessive expression of high-frequency tokens suppresses the model’s ability to capture nuanced semantics. To address this, we introduce EmbedFilter, a simple linear transformation designed to directly refine text embeddings derived from LLMs. Specifically, we uncover that the unembedding matrix within LLMs encodes a latent subspace that actively writes these frequent tokens into the embedding space. By filtering out this subspace, EmbedFilter suppresses the influence of high-frequency tokens, thereby enhancing semantic representations. As a compelling byproduct, this enables an inherent dimensionality reduction, lowering index storage and speeding up retrieval while fully preserving the refined embedding quality. Our experiments across multiple LLM backbones demonstrate that LLMs equipped with EmbedFilter achieve superior zero-shot downstream performance even with significantly reduced embedding dimensions. We hope our findings provide deeper insights into the mechanisms of LLM-based representations and inspire more principled designs to improve text embedding training. Our code is available at https://github.com/CentreChen/EmbFilter.

Zero-shot Text Embedding, Large Language Model, Mechanistic Interpretation

1.Introduction

Large language models (LLMs) have made significant strides in recent years, demonstrating impressive performance across a wide range of tasks (DeepSeek-AI, 2026 (https://arxiv.org/html/2606.07502#bib.bib1); Grattafiori et al., 2024 (https://arxiv.org/html/2606.07502#bib.bib18); Team, 2024 (https://arxiv.org/html/2606.07502#bib.bib17)). The emergence of zero-shot learning ability helps LLMs address unseen tasks effectively without any additional fine-tuning (Kaplan et al., 2020 (https://arxiv.org/html/2606.07502#bib.bib3)). However, recent studies highlight a persistent performance gap of LLMs when deployed as zero-shot text embedding models (Jiang et al., 2024 (https://arxiv.org/html/2606.07502#bib.bib4); Li and Zhou, 2025 (https://arxiv.org/html/2606.07502#bib.bib6); BehnamGhader et al., 2024 (https://arxiv.org/html/2606.07502#bib.bib7)). This deficiency hinders their adoption for text embedding tasks and raises concerns regarding their full efficacy as generalist models in real-world applications.

To bridge this gap, researchers have explored various ways to better elicit semantic information from LLMs. Prompt-engineering methods have been proposed to help extract text embeddings directly from LLMs (Jiang et al., 2024 (https://arxiv.org/html/2606.07502#bib.bib4); Springer et al., 2025 (https://arxiv.org/html/2606.07502#bib.bib5); Lei et al., 2024 (https://arxiv.org/html/2606.07502#bib.bib8); Thirukovalluru and Dhingra, 2025 (https://arxiv.org/html/2606.07502#bib.bib9)). These approaches are well motivated; however, their improvements are modest and highly sensitive to the choice of the prompt, leading to inconsistent performance across different setups. Existing approaches are primarily heuristic and fail to resolve the bottleneck that limits LLMs’ ability to capture semantics. In this paper, we move beyond previous heuristic efforts and seek to provide a mechanistic interpretation for LLMs’ suboptimal performance in text embedding tasks. Specifically, we identify an unexpected representation collapse: when projected onto the vocabulary space, raw text embeddings from LLMs tend to align with high-frequency tokens that are semantically irrelevant. Equipped with the Logit Lens tool (Belrose et al., 2023 (https://arxiv.org/html/2606.07502#bib.bib10)), we find that frequent but uninformative tokens disproportionately dominate the highest decoding probabilities of these text embeddings. This suggests that these hidden representations are biased toward common vocabulary tokens, regardless of the input semantics (readers unfamiliar with Logit Lens can refer to Section 2 (https://arxiv.org/html/2606.07502#S2) for further details). As shown in Figure 1 (https://arxiv.org/html/2606.07502#S1.F1), this phenomenon is observed across different language model families, indicating a universal pattern inherent to LLMs.

Figure 1. Logit Lens applied to text embeddings from three LLM backbones. Word clouds show the top-aligned tokens with the highest decoding probabilities, which are primarily high-frequency yet semantically uninformative. The input text, encoded by the text embeddings, is given as: “We call this a ‘lens’ because it is one way of extracting information from GPT’s internal activations. I imagine there is other information present in the activations that cannot be understood by looking at logits over tokens. The logit lens show us some of what is going on, not all of it.” This corresponds to the official notation of the logit lens.

We extend our analysis to uncover the underlying drivers of this representation collapse. Prior studies (Li et al., 2020 (https://arxiv.org/html/2606.07502#bib.bib11); Ethayarajh, 2019 (https://arxiv.org/html/2606.07502#bib.bib12)) have established that text embeddings are anisotropic: they are confined to a narrow cone rather than being uniformly distributed in the embedding space. We hypothesize that the centroid of this narrow region corresponds to an “average” token, which Lv et al. (2024 (https://arxiv.org/html/2606.07502#bib.bib13)) describe as the frequency-weighted average embedding over the training corpus. This perspective provides a mechanistic rationale for the atypical patterns observed in Logit Lens analyses. Raw embeddings from LLMs are pulled toward this commonality region, overshadowing their unique semantic features. By suppressing the contribution of these “average” components, we can mitigate the anisotropy problem and unmask the true semantic representations within LLMs.
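To make the anisotropy claim concrete, the following toy sketch measures the mean pairwise cosine similarity of synthetic embeddings that all share one large common component. The data, the function name `mean_pairwise_cosine`, and the use of mean-centering are illustrative assumptions; centering is only a simple stand-in for removing a shared direction and is not the filter proposed in the paper.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    # Exclude the diagonal, since self-similarity is always 1.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

rng = np.random.default_rng(0)
semantic = rng.normal(size=(200, 64))   # toy "semantic" signal, roughly isotropic
common = 8.0 * rng.normal(size=64)      # hypothetical shared "average" direction
raw = semantic + common                 # every embedding is pulled toward it

aniso_raw = mean_pairwise_cosine(raw)
aniso_centered = mean_pairwise_cosine(raw - raw.mean(axis=0))
print(f"raw: {aniso_raw:.3f}  centered: {aniso_centered:.3f}")
```

In this synthetic setting the raw vectors are nearly parallel, while the centered ones are close to uncorrelated, which is the qualitative effect the paragraph above attributes to the "average" component.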

We seek to pinpoint the hidden contributor that steers text embeddings towards the “average” token representation. To this end, we apply Logit Spectroscopy (Cancedda, 2024 (https://arxiv.org/html/2606.07502#bib.bib14)) to a reverse-engineered “average” token and uncover a latent subspace that actively writes these frequent tokens into the embedding space. We refer to this subspace as the “edge spectrum” space, as it is spanned by the right singular vectors with the smallest and largest singular values, that is, those positioned at the ends of the spectrum. We find that when the projection of the “average” token onto this subspace is truncated, the logits of these frequent tokens are significantly disrupted. Section 3 (https://arxiv.org/html/2606.07502#S3) delves into the discovery of the edge spectrum, providing a detailed account of its identification.

Leveraging this insight, we show that this subspace can be effectively filtered out via a simple linear transformation, which we term EmbedFilter. This transformation is encoded within the parameters of the unembedding matrix and is readily accessible without further training. Our evaluations across a diverse suite of downstream tasks demonstrate that EmbedFilter acts as a potent post-processing enhancement, delivering steady incremental gains atop existing zero-shot text embedding baselines. EmbedFilter exhibits strong robustness across various backbone models and experimental configurations while incurring minimal computational overhead. Beyond performance gains, EmbedFilter naturally lends itself to dimensionality reduction as a distance-preserving transformation. This reduction lowers indexing overhead and speeds up retrieval, facilitating the practical deployment of LLMs.
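As a rough illustration of the idea described above, the sketch below builds a filter from the SVD of a toy unembedding matrix, removes the directions at both ends of the spectrum, and then expresses the filtered embeddings in a lower-dimensional basis. The function name `edge_filter`, the cut-offs `n_top` and `n_bottom`, and the random matrices are assumptions made for illustration; this excerpt does not specify how many singular directions the paper removes, and the sketch assumes the vocabulary is at least as large as the hidden dimension.

```python
import numpy as np

def edge_filter(W_U: np.ndarray, n_top: int, n_bottom: int):
    """Return (P, B): a d x d projector that removes the chosen singular
    directions, and a d x r orthonormal basis of the remaining subspace."""
    # Rows of Vt are right singular vectors, sorted by descending singular value.
    _, _, Vt = np.linalg.svd(W_U, full_matrices=False)
    d = Vt.shape[0]
    keep = np.arange(n_top, d - n_bottom)   # drop both ends of the spectrum
    B = Vt[keep].T                          # d x r, orthonormal columns
    P = B @ B.T                             # identity minus the removed directions
    return P, B

rng = np.random.default_rng(0)
W_U = rng.normal(size=(500, 32))    # toy unembedding matrix, |V| = 500, d = 32
H = rng.normal(size=(10, 32))       # ten toy text embeddings, one per row

P, B = edge_filter(W_U, n_top=4, n_bottom=4)
H_filtered = H @ P                  # same dimension, edge directions removed
H_reduced = H @ B                   # 24-dimensional coordinates of the same vectors

# Inner products agree, so cosine and Euclidean rankings are unchanged.
assert np.allclose(H_filtered @ H_filtered.T, H_reduced @ H_reduced.T)
```

Because the columns of `B` are orthonormal, storing `H_reduced` instead of `H_filtered` loses nothing about distances between filtered embeddings, which is one way to read the distance-preserving claim; whether the paper performs the reduction exactly this way is not confirmed by the excerpt.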

To sum up, the contributions of this paper are threefold.

(1) We identify the LLM unembedding matrix as a previously overlooked feature lens to analyze the embedding space. We reveal that this matrix encodes a latent subspace that corresponds to an “average” token and limits the embedding capabilities of LLMs. We provide a mechanistic interpretation that clarifies both the origins and impact of this phenomenon.

(2) We introduce EmbedFilter, a simple linear transformation that improves the zero-shot text embedding performance of LLMs. As an efficient post-processing technique, EmbedFilter achieves up to a 14.1% improvement on MTEB without any training overhead. Extensive evaluations across diverse experimental setups further demonstrate its broad applicability.

(3) We demonstrate that EmbedFilter acts as a distance-preserving transformation and enables embedding dimensionality reduction. This leads to faster retrieval and lower storage requirements, thereby facilitating the practical deployment of LLMs in large-scale text embedding applications.

2.Background

To establish the background for EmbedFilter, we first review the fundamentals of embedding extraction and introduce the mechanistic interpretability tools used throughout our analysis.

2.1.Text Embedding Paradigm

We first formulate the standard process of LLM-based text embedding extraction. Our objective is to transform a sentence $\bm{X}$ into a dense vector $\bm{h}\in\mathbb{R}^{d}$, such that the similarity between these vectors reflects their semantic similarity. Given an input sentence $\bm{X}=\left[x_{1},x_{2},\dots,x_{L}\right]$, its embedding $\bm{h}$ is obtained by passing $\bm{X}$ through an LLM backbone, followed by a pooling strategy $\operatorname{P}$:

$$\bm{h}\;=\;\operatorname{P}\left(\operatorname{LLM}\left(\left[x_{1},x_{2},\dots,x_{L}\right]\right)\right),$$

where $\operatorname{P}$ aggregates the final-layer outputs from $\operatorname{LLM}$ into a $d$-dimensional representation $\bm{h}$. The unembedding matrix is designed to map these hidden states back to the vocabulary space for token prediction. We contend that this module has been overlooked in traditional text embedding extraction and can be exploited to enhance embedding quality.
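The pooling step can be sketched as follows. The section does not say which pooling strategy is used, so the two options shown here (mean pooling and last-token pooling), the function name `pool`, and the padding mask are assumptions chosen because they are common choices, not details taken from the paper.

```python
import numpy as np

def pool(hidden: np.ndarray, mask: np.ndarray, strategy: str = "mean") -> np.ndarray:
    """Collapse final-layer states of shape (L, d) into one d-dimensional vector.

    mask is a length-L array of 1s (real tokens) and 0s (padding).
    """
    if strategy == "mean":
        weights = mask / mask.sum()          # uniform weights over real tokens
        return weights @ hidden
    if strategy == "last":
        last = int(np.nonzero(mask)[0][-1])  # index of the last real token
        return hidden[last]
    raise ValueError(f"unknown pooling strategy: {strategy}")

hidden = np.array([[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]])  # L = 3, d = 2
mask = np.array([1, 1, 0])                                # third position is padding
h_mean = pool(hidden, mask, "mean")   # average of the first two rows
h_last = pool(hidden, mask, "last")   # the second row
```

Last-token pooling is a natural fit for decoder-only models with causal attention, since only the final position has seen the whole input; mean pooling spreads the weight over every real token.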

2.2.Text Embeddings with Prompt Engineering

Many studies have explored improving the performance of LLMs on text embedding tasks through prompt engineering. Here, we provide a brief overview of two well-established baselines:

PromptEOL (Jiang et al., 2024 (https://arxiv.org/html/2606.07502#bib.bib4)) finds that a “one word limitation” template can help better condense semantics into the hidden state, thereby enhancing the representation of LLM-derived embeddings.

ECHO (Springer et al., 2025 (https://arxiv.org/html/2606.07502#bib.bib5)) suggests that causal attention in LLMs is a bottleneck, as earlier tokens cannot access future context. To mitigate this, they duplicate the input and extract embeddings from the second occurrence, incurring overhead from the increased input size.

More sophisticated prompt-engineering methods have been proposed (Lei et al., 2024 (https://arxiv.org/html/2606.07502#bib.bib8); Thirukovalluru and Dhingra, 2025 (https://arxiv.org/html/2606.07502#bib.bib9)); however, these often necessitate intricate pipeline designs and incur substantial computational overhead. While our primary experiments focus on the aforementioned baselines, we provide a broader discussion and evaluation of these more complex strategies in our supplementary analysis.

2.3.Mechanistic Interpretability Tools

We provide an overview of two interpretability tools, Logit Lens (Belrose et al., 2023 (https://arxiv.org/html/2606.07502#bib.bib10)) and Logit Spectroscopy (Cancedda, 2024 (https://arxiv.org/html/2606.07502#bib.bib14)), which facilitate the identification of the edge spectrum subspace and inspire the design of EmbedFilter.

Logit Lens (Belrose et al., 2023 (https://arxiv.org/html/2606.07502#bib.bib10)) represents a cornerstone of mechanistic interpretability research. Its central premise is to project a model’s intermediate representations directly into the vocabulary space. By analyzing the resulting changes in these logits, researchers can discern how specific intermediate activations shape the final predictions, thereby gaining insights into the model’s internal processing logic. Building on this framework, Nie et al. (2025 (https://arxiv.org/html/2606.07502#bib.bib15)) apply the Logit Lens tool to text embeddings and find that these embeddings can align with certain keywords from the input texts.
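A minimal sketch of the projection step behind Logit Lens is shown below, using a hand-made four-token vocabulary and a 2-dimensional hidden state. The function name `logit_lens`, the vocabulary, and the matrix values are all hypothetical, and the sketch omits the final normalization layer that real models typically apply before the unembedding matrix.

```python
import numpy as np

def logit_lens(h: np.ndarray, W_U: np.ndarray, vocab: list[str], top_k: int = 3):
    """Project a hidden vector onto the vocabulary and return the top tokens."""
    logits = W_U @ h                        # one score per vocabulary entry
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)[:top_k]      # indices of the most probable tokens
    return [(vocab[i], float(probs[i])) for i in order]

vocab = [".", "the", "cat", "quantum"]      # toy vocabulary
W_U = np.array([[2.0, 0.0],                 # toy unembedding matrix, shape |V| x d
                [1.5, 0.0],
                [0.0, 1.0],
                [0.0, 2.0]])
h = np.array([1.0, 0.4])                    # toy text embedding
top = logit_lens(h, W_U, vocab, top_k=2)
print(top)
```

In this toy setup the embedding leans toward the first hidden dimension, which the first two rows of `W_U` reward, so the punctuation mark and the article come out on top. That mirrors, in miniature, the pattern the paper reports for real embeddings.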

To further dissect the semantic properties of different embedding subspaces, Logit Spectroscopy (Cancedda, 2024 (https://arxiv.org/html/2606.07502#bib.bib14)) extends Logit Lens by projecting intermediate representations onto spectral components of the model’s weight matrices. Let $\bm{W}_{\mathcal{U}}$ be the unembedding matrix of the LLM. Its singular value decomposition can be formulated as:

$$\bm{W}_{\mathcal{U}}\;=\;\bm{U}\,\Sigma\,\bm{V}^{\top},$$

where $\bm{W}_{\mathcal{U}}\in\mathbb{R}^{\left|\mathcal{V}\right|\times d}$, with $d$ representing the hidden-state dimension and $|\mathcal{V}|$ the vocabulary size. For an arbitrary dimension $i\in\{0,\dots,d-1\}$, Logit Spectroscopy introduces a filter $\bm{\Psi}_{i}$ that removes the projection of $\bm{h}$ onto the $i$-th right singular vector, i.e., the $i$-th column $\bm{V}_{[i]}$ of $\bm{V}$. Formally, this transformation is defined as:

$$\bm{\Psi}_{i}\;=\;\bm{I}-\bm{V}_{[i]}\,\bm{V}_{[i]}^{\top}.$$

This operation facilitates the spectral analysis of an LLM’s intermediate representations, enabling researchers to measure the contribution of hidden states within different spectral subspaces to the final output. Section 3 (https://arxiv.org/html/2606.07502#S3) details how we leverage these tools to identify the “edge spectrum” subspace.
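The effect of a single spectral filter can be checked numerically. Since the unembedding matrix maps the $i$-th right singular vector to $\sigma_i$ times the $i$-th left singular vector, removing that direction from a hidden state changes the logits by exactly $\sigma_i\,(\bm{V}_{[i]}^{\top}\bm{h})\,\bm{U}_{[i]}$. The sketch below verifies this on a random toy matrix; the sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_U = rng.normal(size=(50, 8))            # toy unembedding, |V| = 50, d = 8
U, S, Vt = np.linalg.svd(W_U, full_matrices=False)

i = 0                                     # direction with the largest singular value
v = Vt[i]                                 # i-th right singular vector
Psi = np.eye(8) - np.outer(v, v)          # the filter I - v v^T

h = rng.normal(size=8)                    # toy hidden state
h_filtered = Psi @ h                      # component along v removed

# Change in logits caused by the filter, compared with the closed form.
delta = W_U @ h - W_U @ h_filtered
closed_form = S[i] * (v @ h) * U[:, i]
print(np.allclose(delta, closed_form))
```

The closed form also shows why the ends of the spectrum behave differently: a direction with a large singular value moves the logits strongly when filtered, while one with a very small singular value barely changes them even if the hidden state has a large component along it.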

3.Discovery of Edge Spectrum Subspace

3.1.Motivation

In this section, we present the preliminary analyses that motivate the development of EmbedFilter. Our investigation is driven by an observed correlation between two key insights:

(1) Raw text embeddings from LLMs are typically anisotropic (Li et al., 2020 (https://arxiv.org/html/2606.07502#bib.bib11); Su et al., 2021 (https://arxiv.org/html/2606.07502#bib.bib16)). These embeddings are concentrated in a narrow subspace, making them excessively similar to one another;

(2) LLM-derived embeddings often align with high-frequency tokens that carry little semantics.

These insights lead us to reasonably infer that the narrow subspace is responsible for encoding frequent tokens. Consequently, we seek to isolate this subspace and mitigate its impact, thereby alleviating the anisotropy problem in text embedding tasks. To accomplish this, we first reverse-engineer a ”centroid” hidd
