A Geometric Profile of Semantic Information in Text: Frame-Conditional Uniqueness and a Trade-Off Triangle for Scalar Summaries
Summary
This paper develops a geometric framework to measure semantic content of texts using sentence embeddings, proposing a three-coordinate semantic profile (novelty, breadth, integration) and a scalar trade-off triangle, validated across synthetic categories and novels.
# A Geometric Profile of Semantic Information in Text: Frame-Conditional Uniqueness and a Trade-Off Triangle for Scalar Summaries
Source: [https://arxiv.org/html/2606.11222](https://arxiv.org/html/2606.11222)
###### Abstract
How much meaning does a text carry? Shannon’s theory measures uncertainty over symbols and is intentionally indifferent to meaning, while pairwise metrics such as BERTScore compare two texts rather than characterizing one\. We develop a geometric framework that measures semantic content from the structure of a text’s sentence embeddings\.
The framework has three parts. First, within a fixed embedding and baseline, six natural axioms uniquely determine a scalar measure up to scale: a frame-conditional uniqueness theorem. The resulting scalar is empirically too coarse, motivating a richer representation. Second, we propose a three-coordinate *semantic profile* capturing *novelty* (displacement from generic discourse), *breadth* (diversity of distinct ideas), and *integration* (connectedness among them), together with a discrete minimal unit, the *semantic quantum*, whose resolution is fixed by a clustering threshold $\tau$. Third, we prove a no-go theorem: no scalar summary of the profile can simultaneously satisfy analytic stability under paraphrase and concatenation, ordinal robustness across text scales, and cross-representation comparability. We exhibit two practical scalars, $S_{\mathrm{minmax}}$ and $S_{\mathrm{rank}}$, each occupying a distinct corner of this trade-off triangle.
Validation across 23 synthetic categories, 5 Project Gutenberg novels, and 3 embedding models confirms the trade-off. The recommended rank-normalized configuration passes 25 of 28 ordinal checks as point estimates (21 of 28 after Benjamini–Hochberg correction), outperforming seven baselines including unigram entropy and a BERTScore-based novelty signal. A separate variational result connects the breadth coordinate to the log-determinant of a determinantal point process (Spearman $\rho = 0.985$ over 507 Gutenberg chapters), giving an optimization-theoretic foundation for breadth.
## 1 Introduction
Shannon’s theory of information quantifies uncertainty over symbol sequences, not semantic content\[[9](https://arxiv.org/html/2606.11222#bib.bib9)\]\. Texts with similar token statistics can differ substantially in meaning, and paraphrases can preserve meaning despite lexical variation\. Any semantic measure of text must therefore be grounded in a representation of meaning rather than in syntactic unpredictability alone\.
Contemporary embedding models provide such a representation\[[6](https://arxiv.org/html/2606.11222#bib.bib6),[7](https://arxiv.org/html/2606.11222#bib.bib7)\]\. Sentences and paragraphs can be mapped into high\-dimensional vector spaces in which geometric relations encode semantic similarity\. This suggests a geometric approach: the semantic content of a text should be reflected by the structure of its embedding cloud\. The cloud’s displacement from a neutral baseline, its spread, and its internal connectedness are all directly measurable, and together form a richer descriptor than any scalar\.
#### Position relative to existing work\.
The framework is complementary to, not a replacement for, Shannon information theory\[[9](https://arxiv.org/html/2606.11222#bib.bib9)\]: Shannon describes unpredictability of form, while the present profile describes geometric properties of meaning\. Shannon’s universality follows from the existence of a canonical primitive \(the symbol distribution\) and a clean composition rule \(joint entropy\); as we show in §[6](https://arxiv.org/html/2606.11222#S6), semantics has neither\. The framework is also distinct from pairwise text\-similarity metrics such as BERTScore, BLEURT, and ROUGE, which measure similarity between two texts rather than the internal richness of one; we use a BERTScore\-based novelty signal as a baseline in §[8\.5](https://arxiv.org/html/2606.11222#S8.SS5)\. Philosophical accounts of semantic information\[[1](https://arxiv.org/html/2606.11222#bib.bib1),[3](https://arxiv.org/html/2606.11222#bib.bib3),[4](https://arxiv.org/html/2606.11222#bib.bib4),[2](https://arxiv.org/html/2606.11222#bib.bib2)\]provide rigorous logical frameworks but are not computable on real text\.
#### Contributions\.
The paper makes four contributions:
1. Frame-conditional uniqueness theorem (§[3](https://arxiv.org/html/2606.11222#S3)). Within a fixed embedding and baseline, six natural axioms uniquely determine the Semantic Information Law $I_E(T) = \|\mu_T - \mu_0\| \cdot \mathrm{rank}(C_T)$ up to scale. This is a representation-conditional result, not a Shannon-style universal law; empirically it is too coarse, motivating the profile.
2. Three-coordinate profile and semantic quantum (§§[4](https://arxiv.org/html/2606.11222#S4)–[5](https://arxiv.org/html/2606.11222#S5)). The profile $(N, B, I)$ captures novelty, breadth, and integration. The quantum is a discrete minimal unit whose resolution is set by a clustering threshold $\tau$, making measurement resolution explicit.
3. No-go theorem: trade-off triangle (§[6](https://arxiv.org/html/2606.11222#S6)). No scalar summary of the profile can simultaneously satisfy analytic stability under paraphrase and concatenation, ordinal robustness, and cross-representation comparability. Two practical scalars $S_{\mathrm{minmax}}$ and $S_{\mathrm{rank}}$ (§[7](https://arxiv.org/html/2606.11222#S7)) each occupy a distinct corner.
4. Empirical validation and variational characterization (§§[8](https://arxiv.org/html/2606.11222#S8)–[9](https://arxiv.org/html/2606.11222#S9)). Across 23 synthetic categories, 5 Gutenberg novels, and 3 embedding models, $S_{\mathrm{rank}}$ with weights $(0.5, 3.0, 1.0)$ passes 25 of 28 ordinal checks as point estimates and beats seven baselines. Separately, the breadth coordinate is closely rank-correlated with the log-determinant of a determinantal point process (Spearman $\rho = 0.985$ on 507 chapters), supplying an optimization-theoretic foundation for $B$.
The framework’s central position is that semantic information in text is a*representation\-indexed structured profile*, not a universal scalar\. The profile is the theoretical object; scalar summaries are practical conveniences whose forms reflect the impossibility result\.
## 2 Notation and Representation
Let $T$ be a text partitioned into segments $T = (T_1, \dots, T_k)$, and let $E : \text{Text} \to \mathbb{R}^n$ be a sentence-embedding model with $e_i = E(T_i)$. The raw embedding cloud is $X_T = \{e_1, \dots, e_k\}$ with mean $\mu_T$ and covariance $C_T$. Let $\mu_0$ and $\Sigma_0$ denote the mean and covariance of a neutral baseline corpus, embedded via $E$.
The basic difficulty in measuring internal semantic structure is that repetition and near-duplication distort geometry. To remove this artifact, the framework first clusters highly similar segments using agglomerative clustering under cosine distance, with resolution set by a threshold $\tau \in (0,1)$ (Definition 1 merges at cosine-distance threshold $1-\tau$), then replaces each cluster by its (renormalized) centroid. Let $\tilde{T} = \{c_1, \dots, c_m\}$ denote these deduplicated centroids. We develop the formal interpretation of these centroids as *semantic quanta* and the role of $\tau$ as a measurement-resolution parameter in §[5](https://arxiv.org/html/2606.11222#S5).
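The deduplication step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes centroid linkage (the text does not name a linkage rule), merges while the closest pair of cluster centroids lies within cosine distance $1-\tau$ as in Definition 1, and uses hypothetical function names throughout.

```python
import math

def normalize(v):
    """Scale a vector (list of floats) to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def centroid(cluster):
    """Renormalized mean of the vectors in one cluster."""
    dim = len(cluster[0])
    mean = [sum(v[i] for v in cluster) / len(cluster) for i in range(dim)]
    return normalize(mean)

def deduplicate(embeddings, tau=0.70):
    """Agglomerative merge; stops once the closest pair exceeds distance 1 - tau.

    Centroid linkage is an assumption made for this sketch.
    """
    clusters = [[normalize(e)] for e in embeddings]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = 1.0 - cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > 1.0 - tau:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

# Toy input: two near-duplicate segments and one distinct segment.
quanta = deduplicate([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]], tau=0.70)
```

On this toy input the two near-duplicates collapse into one centroid, so `quanta` has two entries; an exactly triplicated segment collapses to a single centroid.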
### 2.1 Scope of the framework
The framework is explicitly representation-indexed: every quantity defined below depends on a fixed choice of measurement apparatus. The dependencies are:
- the embedding model $E$;
- the segmentation rule (sentence, clause, fixed-size chunk);
- the baseline corpus through $\mu_0$ and $\Sigma_0$;
- the deduplication threshold $\tau$;
- the normalization reference set used for any scalar summary.
This dependence is a feature, not a defect. It makes the framework's commitments inspectable, calibratable per domain, and consistent with the impossibility result of §[6](https://arxiv.org/html/2606.11222#S6) that no representation-free semantic scalar exists.
## 3 The Semantic Information Law: Frame-Conditional Uniqueness
### 3.1 Axioms
We postulate six axioms constraining any scalar measure $I : \text{Text} \to \mathbb{R}$ within the fixed frame $(E, \mu_0)$:
1. *Paraphrase invariance*: $I(T) = I(T')$ whenever $E(T) \approx E(T')$.
2. *Redundancy non-increase*: exact replication of a segment does not increase $I$.
3. *Novelty monotonicity*: for fixed covariance, $I$ is monotone in $\|\mu_T - \mu_0\|$.
4. *Idea additivity*: for texts in orthogonal embedding subspaces, $I(T_1 \oplus T_2) = I(T_1) + I(T_2)$.
5. *Orthogonal invariance*: $I$ is invariant under rotations of embedding space.
6. *Continuity*: $I$ is continuous in the embeddings.
### 3.2 Derivation
###### Theorem 1 (SIL within a representational frame).
Under axioms 1–6, any such measure has the form, up to a positive scale constant,
$$I_E(T) = \|\mu_T - \mu_0\| \cdot \mathrm{rank}(C_T).$$
*Proof sketch.* Axioms 3 and 5 reduce dependence on $\mu_T$ to the scalar $S(T) = \|\mu_T - \mu_0\|$. Axiom 5 reduces dependence on $C_T$ to its spectrum. Axiom 2 eliminates eigenvalue magnitudes, leaving $\mathrm{rank}(C_T)$. Axiom 4 imposes a Cauchy functional equation on the joint dependence whose continuous solutions (axiom 6) are linear; a second Cauchy equation in the shift variable yields the product form. Full derivation in the appendix.
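As a concrete reading of the formula in Theorem 1, the sketch below evaluates the shift norm times the numerical rank of the sample covariance with NumPy. The function name and the rank tolerance `tol` are assumptions of this sketch; the theorem itself does not say how rank should be computed numerically.

```python
import numpy as np

def semantic_information_law(segment_embeddings, mu_0, tol=1e-8):
    """Return ||mu_T - mu_0|| * rank(C_T) for one text.

    `tol` is a hypothetical cutoff below which singular values count as zero.
    """
    X = np.asarray(segment_embeddings, dtype=float)
    shift = np.linalg.norm(X.mean(axis=0) - np.asarray(mu_0, dtype=float))
    cov = np.atleast_2d(np.cov(X, rowvar=False))  # sample covariance C_T
    rank = np.linalg.matrix_rank(cov, tol=tol)
    return float(shift * rank)

origin = np.zeros(3)
collinear = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

sil_collinear = semantic_information_law(collinear, origin)  # rank 1
sil_spread = semantic_information_law(spread, origin)        # rank 3
sil_tripled = semantic_information_law(spread * 3, origin)   # whole text repeated
```

Repeating the whole toy text three times leaves both the mean and the rank unchanged, so `sil_tripled` matches `sil_spread`.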
### 3.3 Status
Theorem [1](https://arxiv.org/html/2606.11222#Thmtheorem1) is a uniqueness theorem within the representational frame $(E, \mu_0)$, not a Shannon-style universal law. Empirically, raw $\mathrm{rank}(C_T)$ saturates in neural embedding spaces (even short texts occupy many nonzero eigendirections), and $\|\mu_T - \mu_0\|$ dominates the product. The scalar is therefore unique under the axioms but empirically inadequate, motivating the three-coordinate profile.
## 4 The Semantic Profile
The profile $P_E(T) = (N, B, I)$ replaces the SIL scalar with three geometrically distinct coordinates: novelty $N$ (displacement from generic discourse), breadth $B$ (diversity of distinct ideas), and integration $I$ (connectedness among those ideas). Before giving formal definitions, Table [1](https://arxiv.org/html/2606.11222#S4.T1) previews the three coordinates on a handful of genres, illustrating how they separate text types that a single scalar would conflate.
Table 1: Profile coordinates on five genres (from §[C.4](https://arxiv.org/html/2606.11222#A3.SS4), `all-mpnet-base-v2`). Poetry is narrow and coherent; dialogue is broad and topic-hopping; legal text is broad and coherent; code comments are broadest in integration's opposite direction. No single number separates these profiles the way the three coordinates do.
### 4.1 Novelty
$$S_M(T) = \sqrt{(\mu_T - \mu_0)^{\top} \Sigma_0^{-1} (\mu_T - \mu_0)}, \qquad N(T) = \log\bigl(1 + S_M(T)\bigr).$$
$\Sigma_0^{-1}$ is estimated via Ledoit–Wolf shrinkage \[[5](https://arxiv.org/html/2606.11222#bib.bib5)\] to ensure conditioning. The Mahalanobis form weights displacement more heavily in low-variance directions of the baseline; logarithmic compression damps extreme shifts.
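The novelty coordinate can be sketched in a few lines of NumPy. One loud assumption: instead of the Ledoit–Wolf estimator, this sketch shrinks the sample covariance toward a scaled identity with a fixed coefficient (0.66, the estimated shrinkage reported in §8.2), which is only a stand-in for the data-driven estimate. The function name and synthetic data are hypothetical.

```python
import numpy as np

def novelty(segment_embeddings, baseline_embeddings, shrinkage=0.66):
    """N(T) = log(1 + Mahalanobis distance between text mean and baseline mean).

    Fixed-coefficient shrinkage toward a scaled identity stands in for
    Ledoit-Wolf here; it is an assumption of this sketch.
    """
    X = np.asarray(segment_embeddings, dtype=float)
    B = np.asarray(baseline_embeddings, dtype=float)
    mu_t, mu_0 = X.mean(axis=0), B.mean(axis=0)
    sample_cov = np.atleast_2d(np.cov(B, rowvar=False))
    dim = sample_cov.shape[0]
    target = (np.trace(sample_cov) / dim) * np.eye(dim)
    sigma_0 = (1.0 - shrinkage) * sample_cov + shrinkage * target
    diff = mu_t - mu_0
    s_m = float(np.sqrt(diff @ np.linalg.solve(sigma_0, diff)))
    return float(np.log1p(s_m))

rng = np.random.default_rng(0)
baseline = rng.normal(size=(40, 3))             # synthetic stand-in for the baseline corpus
n_same = novelty(baseline, baseline)            # zero displacement
n_shifted = novelty(baseline + 1.0, baseline)   # moderate displacement
n_far = novelty(baseline + 2.0, baseline)       # larger displacement
```

A text whose mean coincides with the baseline mean gets novelty zero, and novelty grows with displacement, at a logarithmic rate.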
### 4.2 Breadth
$$D_{\mathrm{eff}}(\tilde{T}) = \exp\bigl(H(p)\bigr), \quad p_i = \lambda_i / \textstyle\sum_j \lambda_j, \qquad R(\tilde{T}) = \frac{1}{m} \sum_j \bigl[1 - \cos(c_j, \mu_{\tilde{T}})\bigr],$$
$$B(T) = D_{\mathrm{eff}}(\tilde{T}) \cdot R(\tilde{T}).$$
$D_{\mathrm{eff}}$ is the effective rank \[[8](https://arxiv.org/html/2606.11222#bib.bib8)\], where $H(p)$ is the Shannon entropy of the normalized eigenvalues $\lambda_i$ of the deduplicated centroid covariance; $R$ is the mean radial cosine distance from the centroid mean. $B$ is defined on $\tilde{T}$ rather than $X_T$ so that exact and near-duplicates do not inflate it.
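A minimal NumPy sketch of the breadth coordinate follows. It assumes the eigenvalues are those of the sample covariance of the deduplicated centroids and returns zero for a single centroid, in line with the sub-quantum regime of §5; the function name and toy inputs are hypothetical.

```python
import numpy as np

def breadth(centroids):
    """B = D_eff * R on deduplicated centroids (one centroid per row)."""
    C = np.asarray(centroids, dtype=float)
    m = C.shape[0]
    if m < 2:
        return 0.0  # a single centroid has no spread
    cov = np.atleast_2d(np.cov(C, rowvar=False))
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    total = eigvals.sum()
    if total <= 0.0:
        return 0.0
    p = eigvals / total
    p = p[p > 0.0]                                # 0 * log 0 is treated as 0
    d_eff = float(np.exp(-(p * np.log(p)).sum()))  # effective rank exp(H(p))
    mu = C.mean(axis=0)
    cosines = (C @ mu) / (np.linalg.norm(C, axis=1) * np.linalg.norm(mu))
    radial = float(np.mean(1.0 - cosines))        # mean radial cosine distance
    return d_eff * radial

b_three = breadth(np.eye(3))        # three mutually orthogonal centroids
b_two = breadth(np.eye(2))          # two orthogonal centroids
b_one = breadth([[1.0, 0.0]])       # single centroid
```

For three mutually orthogonal unit centroids the effective rank is 2 and the radial term is $1 - 1/\sqrt{3}$, so the toy breadth exceeds that of a two-centroid text.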
### 4.3 Integration
$$I_{1\text{-NN}}(T) = \frac{1}{m} \sum_j \max_{l \neq j} \cos(c_j, c_l), \qquad I_{2\text{-NN}}(T) = \frac{1}{m} \sum_j \mathrm{secondmax}_{l \neq j} \cos(c_j, c_l).$$
Validation in §[8](https://arxiv.org/html/2606.11222#S8) shows that 2-NN combined with the recommended weights below is the only configuration passing the coherent-vs-bag-of-facts check.
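Both integration variants reduce to taking the $k$-th largest off-diagonal cosine per centroid and averaging. The sketch below is one way to write that in NumPy; raising an error when there are too few centroids is a choice of this sketch, since the conventions for small quantum counts are discussed separately in §5.

```python
import numpy as np

def integration(centroids, k=2):
    """Mean over centroids of the k-th largest cosine to any other centroid.

    k=1 gives I_1-NN, k=2 gives I_2-NN. Requires more than k centroids.
    """
    C = np.asarray(centroids, dtype=float)
    m = C.shape[0]
    if m <= k:
        raise ValueError("need more than k centroids for k-NN integration")
    U = C / np.linalg.norm(C, axis=1, keepdims=True)
    sims = U @ U.T
    np.fill_diagonal(sims, -np.inf)        # exclude self-similarity
    kth = np.sort(sims, axis=1)[:, -k]     # k-th largest per row
    return float(kth.mean())

toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
i_1nn = integration(toy, k=1)
i_2nn = integration(toy, k=2)
```

On the toy input every centroid has a neighbour at cosine $1/\sqrt{2}$, but only the diagonal one has a second neighbour that close, so the 2-NN value is a third of the 1-NN value.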
## 5 The Semantic Quantum
The deduplication step of §[2](https://arxiv.org/html/2606.11222#S2) suggests a discrete unit of semantic structure. We formalize this unit here because it underlies several derived measures and makes the role of $\tau$ as a measurement-resolution parameter explicit.
### 5.1 Definition and regimes
###### Definition 1 (Semantic quantum).
Given a text $T$ with segment embeddings $X_T = \{e_1, \dots, e_k\}$ and threshold $\tau \in (0,1)$, the *quantum set* $Q_\tau(T) = \tilde{T} = \{c_1, \dots, c_m\}$ is the collection of centroids produced by agglomerative clustering of $X_T$ at cosine-distance threshold $1 - \tau$. Each $c_j \in Q_\tau(T)$ is a *semantic quantum* of $T$ at resolution $\tau$. The *quantum count* is $m_\tau(T) = |Q_\tau(T)|$.
The framework distinguishes three regimes by quantum count:
- Sub-quantum ($m_\tau \leq 1$): a single centroid. Novelty is defined (the centroid's displacement from $\mu_0$), but breadth is zero and integration is conventionally maximal. The text occupies a single point in semantic space, not a structure.
- Single-quantum ($m_\tau = 2$): two centroids. The minimal configuration for which all three coordinates are non-trivially defined: novelty (mean displacement), breadth (the angular separation of the pair), and integration (their mutual cosine similarity).
- Multi-quantum ($m_\tau \geq 3$): the generic regime where breadth and integration capture independently varying structural properties.
The threshold $\tau$ plays the role of a measurement-resolution parameter: below scale $1 - \tau$ the apparatus cannot resolve separate semantic units. This parallels the role of resolution limits in any physical measurement, and makes the framework's representational commitments explicit rather than hidden.
## 6 The No-Go Theorem: Trade-Off Triangle
Fix a parametric family of scalars of the form
$$S(T; \Phi) = \Phi\bigl(\varphi_N(N(T); \mathcal{R}),\, \varphi_B(B(T); \mathcal{R}),\, \varphi_I(I(T); \mathcal{R})\bigr),$$
where $\mathcal{R}$ is a reference set of profiles and each $\varphi_X(\cdot; \mathcal{R})$ is a per-coordinate normalization against $\mathcal{R}$. Two choices matter:
- *Min-max* normalization $\varphi^{\mathrm{mm}}(x; \mathcal{R}) = (x - \min \mathcal{R}_X) / (\max \mathcal{R}_X - \min \mathcal{R}_X)$ is Lipschitz in the raw value but compresses interior values whenever $\mathcal{R}$ contains outliers.
- *Rank* normalization $\varphi^{\mathrm{r}}(x; \mathcal{R}) = \mathrm{rank}_{\mathcal{R}}(x) / |\mathcal{R}|$ is bounded but piecewise constant in $x$.
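The contrast between the two normalizations shows up on a toy reference set. The values below are made up for illustration, and the tie convention in `rank_normalize` (fraction of reference values less than or equal to $x$) is an assumption, since the text does not fix one.

```python
def minmax_normalize(x, reference):
    """Affine rescaling of x onto [0, 1] using the reference extremes."""
    lo, hi = min(reference), max(reference)
    return (x - lo) / (hi - lo)

def rank_normalize(x, reference):
    """Fraction of reference values that are <= x (assumed tie convention)."""
    return sum(1 for r in reference if r <= x) / len(reference)

# Hypothetical coordinate values; the last entry plays the role of an outlier.
reference = [1.0, 1.1, 1.2, 1.3, 100.0]

mm = [minmax_normalize(v, reference) for v in reference]
rk = [rank_normalize(v, reference) for v in reference]

# Two nearby raw values that fall between the same pair of reference points.
rk_a = rank_normalize(1.11, reference)
rk_b = rank_normalize(1.19, reference)
```

With the outlier present, the four interior min-max values are squeezed below 0.01, while the rank values stay evenly spread; conversely, `rk_a` and `rk_b` are identical because rank normalization is constant between adjacent reference points.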
Given $\varepsilon_A, \delta_A, \delta_O, \delta_R > 0$ and a paraphrase relation $\sim$ over texts, define three properties:
- (A) *Analytic stability.* (A.1) $\forall T, T' : T \sim T' \Rightarrow |S(T) - S(T')| \leq \varepsilon_A \cdot S(T)$. (A.2) There exists a continuous $f : \mathbb{R}^2 \to \mathbb{R}$ such that $|S(T_1 \oplus T_2) - f(S(T_1), S(T_2))| \leq \delta_A$ for all $T_1, T_2$, where $\oplus$ denotes concatenation.
- (O) *Ordinal robustness.* For the benchmark $\mathcal{B}$ of Section [8](https://arxiv.org/html/2606.11222#S8), the bootstrap mean pass rate exceeds $0.5 + \delta_O$ and at least $\lceil (1 - \delta_O) \cdot |\mathcal{B}| \rceil$ checks survive Benjamini–Hochberg correction at $\alpha = 0.05$.
- (R) *Cross-representation comparability.* There exists $g : [0,1] \to [0,1]$ with $g(c) > \delta_R$ for $c > \delta_R$, such that for any two embeddings $E_1, E_2$ with $\mathrm{CKA}(E_1, E_2) \geq c$, the scalars satisfy $\rho_{\mathrm{Spearman}}(S_{E_1}, S_{E_2}) \geq g(c)$, *and* $S_{E_1}$ and $S_{E_2}$ are defined on the same reference set $\mathcal{R}$ without embedding-specific re-fit.
###### Theorem 2 (Trade-off triangle).
For any $\varepsilon_A < 1/2$ and $\delta_A, \delta_O, \delta_R \in (0, 1/2)$, no scalar in the family $\{S(\,\cdot\,; \Phi)\}$ satisfies more than one of (A), (O), (R) on all non-degenerate benchmarks.
###### Proof\.
We argue the three pairwise exclusions\.
*(A) $\wedge$ (O) fails.* (A.1) requires each $\varphi_X(\cdot; \mathcal{R})$ to be Lipschitz in its argument (so that bounded raw-value drift implies bounded normalized drift). Among normalizations parameterized by a reference set, the only Lipschitz choice is min-max (or an affine transformation thereof); rank normalization violates Lipschitz continuity at every boundary between adjacent ranks. Min-max normalization, however, compresses interior values whenever $\mathcal{R}$ contains outliers, which it does whenever the benchmark includes both stress tests and natural text. Empirically this compression prevents $\delta_O$-level discrimination on coherent-vs-bag and multi-vs-single comparisons (Section [8](https://arxiv.org/html/2606.11222#S8): $S_{\mathrm{minmax}}$ achieves only 21/28, versus $S_{\mathrm{rank}}$ at 25/28 point-estimate and 21/28 BH-corrected). Therefore no scalar in the family with Lipschitz $\varphi_X$ achieves $\delta_O$ bootstrap-significant discrimination on a benchmark containing outliers, yielding (A) $\Rightarrow \neg$(O).
*(O) $\wedge$ (A) fails (same direction, different pivot).* Ordinal robustness against bootstrap perturbation requires the scalar to depend on the *rank* of each coordinate within $\mathcal{R}$, since rank is invariant to monotonic distortions of the coordinate. Rank is piecewise constant, hence not Lipschitz, so (A.1) fails for any non-trivial paraphrase pair that crosses a rank boundary. Further, (A.2) requires closed-form composition; $\mathrm{rank}_{\mathcal{R}}(T_1 \oplus T_2)$ depends on the relative position of the composition within $\mathcal{R}$ and is not determined by the ranks of $T_1, T_2$ individually. Hence (O) $\Rightarrow \neg$(A).
*(R) $\wedge$ (A) fails; (R) $\wedge$ (O) fails.* (R) requires $S_{E_1}$ and $S_{E_2}$ to share a reference set $\mathcal{R}$. The coordinates $(N, B, I)$ depend on $E$, so for the same text $T$ the raw values under $E_1$ and $E_2$ generally differ. Any normalization against a shared $\mathcal{R}$ therefore produces different ranks or different min-max positions under the two embeddings. A CKA bound $g(c)$ on inter-embedding agreement controls the Spearman $\rho$ on ranks, but it does not imply Lipschitz continuity of individual values, so (A.1) fails through (R). It also does not prevent reference-set-induced re-ordering of non-extreme items, so (O) fails through (R). Hence (R) is incompatible with either of the other two properties under non-identity embedding changes.
Combining the three exclusions: no single scalar in the family satisfies any two properties simultaneously. ∎
The quantifier structure of the theorem is parametric in the tolerances $\varepsilon_A, \delta_A, \delta_O, \delta_R$ rather than in specific numbers; the empirical table below instantiates the tolerances at the values used in Section [8](https://arxiv.org/html/2606.11222#S8).
### 6.1 Empirical support
Across four candidate single-coordinate scalars (SIL, breadth, integration, novelty) and two composite scalars ($S_{\mathrm{minmax}}$, $S_{\mathrm{rank}}$):
No scalar in any tested family achieves more than one corner\.
## 7 Two Recommended Scalars
The trade-off triangle implies that the framework should expose at least two scalars, each tuned for a different use case.
### 7.1 $S_{\mathrm{minmax}}$: for analytic work
$$\tilde{X}_{\mathrm{mm}}(x) = \frac{x - \min}{\max - \min} \in [0,1], \qquad S_{\mathrm{minmax}}(T) = \bigl(\tilde{N}^{\alpha} \cdot \tilde{B}^{\beta} \cdot \tilde{I}^{\gamma}\bigr)^{1/(\alpha+\beta+\gamma)}.$$
Use when bounded paraphrase drift and closed-form composition matter (theoretical analysis, summarization-loss decomposition).
### 7.2 $S_{\mathrm{rank}}$: for ranking work
$$\tilde{X}_{\mathrm{rk}}(x; \varepsilon) = \varepsilon + (1 - \varepsilon) \cdot \mathrm{pct\_rank}(x; \mathrm{ref}) \in (\varepsilon, 1], \qquad S_{\mathrm{rank}}(T) = \bigl(\tilde{N}^{\alpha} \cdot \tilde{B}^{\beta} \cdot \tilde{I}^{\gamma}\bigr)^{1/(\alpha+\beta+\gamma)}.$$
Use when ordinal robustness and freedom from small-$N$ collapse matter (document ranking, leaderboards, comparing texts of different lengths).
### 7.3 Common settings
$$(\alpha, \beta, \gamma) = (0.5,\ 3.0,\ 1.0), \quad \varepsilon = 0.05, \quad \text{integration} = I_{2\text{-NN}}, \quad \tau = 0.70 \text{ (natural prose)}.$$
The choice of $(\alpha, \beta, \gamma)$ and 2-NN integration was determined by grid search over 720 configurations on the benchmark; the rank versus min-max choice is structural, not tunable.
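The two scalars share one weighted geometric mean and differ only in the per-coordinate normalizer. The sketch below wires this up with the weights $(0.5, 3.0, 1.0)$ and $\varepsilon = 0.05$ from the settings above. The reference profiles are invented for illustration, and `pct_rank` is assumed to be the fraction of reference values less than or equal to $x$.

```python
def pct_rank(x, reference):
    """Assumed percentile rank: fraction of reference values <= x."""
    return sum(1 for r in reference if r <= x) / len(reference)

def rank_coord(x, reference, eps=0.05):
    """Rank normalization with an epsilon floor."""
    return eps + (1.0 - eps) * pct_rank(x, reference)

def minmax_coord(x, reference):
    """Min-max normalization onto [0, 1]."""
    lo, hi = min(reference), max(reference)
    return (x - lo) / (hi - lo)

def weighted_geometric_mean(coords, weights=(0.5, 3.0, 1.0)):
    """(N^alpha * B^beta * I^gamma)^(1 / (alpha + beta + gamma))."""
    product = 1.0
    for value, weight in zip(coords, weights):
        product *= value ** weight
    return product ** (1.0 / sum(weights))

def scalar(profile, reference_profiles, normalizer):
    """Normalize each of (N, B, I) against the reference set, then combine."""
    coords = [
        normalizer(profile[k], [p[k] for p in reference_profiles])
        for k in range(3)
    ]
    return weighted_geometric_mean(coords)

# Hypothetical (N, B, I) profiles serving as their own reference set.
reference_profiles = [(0.1, 1.0, 0.2), (0.5, 2.0, 0.4), (0.9, 3.0, 0.6)]
s_minmax = [scalar(p, reference_profiles, minmax_coord) for p in reference_profiles]
s_rank = [scalar(p, reference_profiles, rank_coord) for p in reference_profiles]
```

In this toy run the profile sitting at the reference minimum gets a min-max scalar of exactly zero, because any zero factor zeroes the product, while the $\varepsilon$ floor keeps its rank scalar positive.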
## 8 Empirical Validation
### 8.1 Benchmark questions
The empirical evaluation is organized around four questions, each targeting a separate desideratum identified by the framework's design:
1. Redundancy control. Does exact or near-exact repetition leave the profile largely unchanged after deduplication?
2. Paraphrase stability. Do semantically equivalent paraphrases remain near each other in profile space?
3. Idea multiplicity. Do coherent multi-idea passages exhibit greater breadth than single-idea passages?
4. Coherence discrimination. Do coherent multi-part passages exhibit greater integration than unordered bags of facts with similar topical spread?
A fifth set of robustness checks probes sensitivity to apparatus choices: segmentation granularity, deduplication threshold $\tau$, baseline corpus choice, normalization reference set, and embedding model. The framework is calibrated rather than absolute, so robustness across these axes is essential to interpreting any reported number.
### 8.2 Setup
Experiments use `sentence-transformers/all-mpnet-base-v2` (768-dim) as the primary model, with cross-model validation on `all-MiniLM-L6-v2` and `paraphrase-MiniLM-L6-v2` (both 384-dim). The baseline corpus comprises 40 semantically neutral sentences. Covariance inversion uses Ledoit–Wolf shrinkage (estimated shrinkage $\approx 0.66$).
The synthetic benchmark comprises 23 categories: generic filler, three single-idea technical passages (merge sort, photosynthesis, backpropagation), paraphrase and triplication variants, five multi-idea domain passages (computer science, natural science, humanities, medicine, mathematics), protein multi-concept, two bags of unrelated facts (5 and 7 sentences), three Wikipedia-style passages, a coherent ML pipeline passage, and five stress tests.
For real-text validation we use five Project Gutenberg novels: *Pride and Prejudice*, *A Tale of Two Cities*, *Moby Dick*, *Frankenstein*, and *The Adventures of Sherlock Holmes* (combined $\sim 3.7$M characters).
### 8.3 Stability
Triplication drift is $\approx 10^{-7}$ on all coordinates and on $S_{\mathrm{minmax}}$, confirming that deduplication structurally eliminates exact-repetition artifacts under min-max normalization. $S_{\mathrm{rank}}$ shows a small positive triplication drift ($\sim 7\%$) because exact triplication can cross rank-normalization boundaries even when the coordinates are invariant, a structural consequence of rank normalization rather than a framework failure. Paraphrase drift on a hand-constructed pair (merge sort and its paraphrase) is $\sim 6\%$ on $S_{\mathrm{minmax}}$ and $\sim 21\%$ on $S_{\mathrm{rank}}$ as point estimates. Under a sentence-level bootstrap (300 iterations, resampling sentences within each passage), scalar drift on $S_{\mathrm{rank}}$ has mean 0.38 with 95% CI $[0.085, 0.721]$. The point estimate substantially understates drift under within-passage perturbation; paraphrase stability is conditional on the specific paraphrase chosen rather than a uniform property of the framework.
### 8.4 Ordinal benchmark
On the 28-check synthetic benchmark:
- $S_{\mathrm{rank}}$: 25/28 checks pass as point estimates. Under a sentence-level bootstrap (300 iterations), the mean pass rate is 0.64 with 95% CI $[0.30, 0.93]$. Applying Benjamini–Hochberg correction across the 28 per-check one-sided binomial tests against $H_0 : p = 0.5$, 21 of 28 checks are significant at $\alpha = 0.05$. The 7 non-significant checks cluster on comparisons involving `single_backprop` (which embeds near the multi-idea cluster) and a subset of `single_*` > `generic_filler` comparisons whose bootstrap pass rates fall below 0.5.
- $S_{\mathrm{minmax}}$: 21/28.
- Single coordinates: 7–21/28 depending on coordinate (breadth alone ties $S_{\mathrm{minmax}}$ at 21/28; novelty alone 13/28; integration alone 7/28).
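The Benjamini–Hochberg step applied to per-check one-sided binomial tests can be sketched in plain Python. The pass counts below are invented, and treating each check's bootstrap iterations as the binomial trials is an assumption of this sketch rather than a documented detail of the protocol.

```python
from math import comb

def binom_pvalue_greater(successes, trials, p0=0.5):
    """One-sided p-value P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, j) * p0 ** j * (1.0 - p0) ** (trials - j)
        for j in range(successes, trials + 1)
    )

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the null is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank                    # largest rank meeting the bound
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected

# Hypothetical pass counts for five checks over 300 bootstrap iterations.
pass_counts = [290, 270, 200, 160, 140]
pvalues = [binom_pvalue_greater(k, 300) for k in pass_counts]
significant = benjamini_hochberg(pvalues, alpha=0.05)
```

With these made-up counts, the three checks that pass in at least 200 of 300 iterations survive correction, while the two near the 50% line do not.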
### 8.5 Baseline comparison
To contextualize the 25/28 figure for $S_{\mathrm{rank}}$, we run seven baselines through the same 28-check protocol.
Four observations are load-bearing. First, $S_{\mathrm{rank}}$ beats every baseline by at least 4 checks on point estimates (25/28 raw, 21/28 BH-corrected; the closest baseline is breadth alone at 21/28). Second, breadth alone is already competitive; the scalar's lift over breadth is modest (4 checks) and should be understood as combining coordinates rather than as single-coordinate dominance. Third, unigram entropy at 20/28 is a stronger baseline than the earlier version of the paper implied; on lexical-diversity-like comparisons (single-idea vs. generic-filler) it is nearly tied with breadth, and only loses on structural checks (coherent-vs-bag, multi-vs-single) where the geometric decomposition matters. Fourth, BERTScore-F, a widely adopted embedding-based evaluation metric, lands at 19/28 when repurposed as a single-passage novelty signal (similarity to the neutral baseline corpus, inverted). This is expected: BERTScore was designed for pairwise similarity between candidate and reference, not for single-passage informativeness. Our framework outperforms it cleanly (+6 checks) on the ordinal discrimination task the benchmark tests.
## 9 Variational Characterization of Breadth
The breadth coordinate $B(T) = D_{\mathrm{eff}}(T) \cdot R(T)$ was introduced as a heuristic geometric measure: effective rank of the deduplicated centroid covariance, weighted by mean radial cosine distance. This section shows that $B$ admits an *optimization-theoretic* characterization as the log-volume of a determinantal-point-process (DPP) \[[13](https://arxiv.org/html/2606.11222#bib.bib13)\] maximum-a-posteriori selection, with a provable structural decomposition and a tight empirical correspondence ($\rho = 0.985$ on 507 natural-text chapters).
#### Setup\.
Fix unit-normalized embeddings $X = \{x_1, \ldots, x_n\} \subset S^{d-1}$ (sentence embeddings are $\ell_2$-normalized, either by the model itself or by explicit normalization). For each segment $i$, define the *quality*
$$q_i = \exp\!\left(\frac{\|x_i - \mu_0\|_M}{\sigma}\right), \qquad \sigma = \mathrm{std}_j\!\left(\|x_j - \mu_0\|_M\right),$$
where $\|\cdot\|_M$ is the Ledoit–Wolf Mahalanobis norm of Section [2](https://arxiv.org/html/2606.11222#S2). Define the $n \times n$ quality-weighted cosine kernel
$$L_{ij} = q_i\, q_j\, \max(0, \langle x_i, x_j \rangle).$$
$L$ is symmetric and positive semi-definite whenever the cosine-similarity matrix restricted to the non-negative cone is PSD, which holds when the embeddings lie in an orthant; for arbitrary embeddings the truncation $\max(0, \cdot)$ may introduce negative eigenvalues, handled by an $\varepsilon I$ regularizer.
The DPP-MAP objective is
$$S^{*}(T) = \arg\max_{\varnothing \neq S \subseteq [n]} \log\det L_S.$$
Greedy selection terminates when the marginal log-gain $\log\det L_{S \cup \{i^{*}\}} - \log\det L_S$ becomes non-positive, and achieves the $(1 - 1/e)$ approximation guarantee of log-submodular maximization.
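The greedy procedure with its stopping rule can be sketched directly from the kernel definition. This is a naive illustration that recomputes each log-determinant from scratch; the function names, the regularizer size `reg`, and the toy qualities are all assumptions of the sketch.

```python
import numpy as np

def build_kernel(X, quality, reg=1e-6):
    """L_ij = q_i q_j max(0, <x_i, x_j>) plus a small ridge (reg is hypothetical)."""
    X = np.asarray(X, dtype=float)
    q = np.asarray(quality, dtype=float)
    G = np.maximum(0.0, X @ X.T)
    return q[:, None] * G * q[None, :] + reg * np.eye(len(q))

def greedy_dpp_map(L):
    """Greedily add the item with the largest log-det gain; stop when gain <= 0."""
    n = L.shape[0]
    selected, current = [], 0.0
    while len(selected) < n:
        best_gain, best_i = -np.inf, None
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign <= 0:
                continue
            gain = logdet - current
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            break
        if selected and best_gain <= 0.0:
            break  # non-positive marginal gain; S stays non-empty
        selected.append(best_i)
        current += best_gain
    return selected, current

# Toy unit vectors: items 0 and 1 are exact duplicates.
X = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
L = build_kernel(X, quality=[2.0, 2.0, 2.0, 2.0])
selected, logdet_value = greedy_dpp_map(L)
```

On the toy input the selection keeps one of the two duplicates plus both orthogonal items, and the resulting log-determinant is close to $2 \sum_i \log q_i = 6 \log 2$, since the selected Gram matrix is the identity.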
###### Theorem 3 (DPP–breadth structural decomposition).
Let $G_S = [\langle x_i, x_j \rangle]_{i,j \in S}$ be the Gram matrix of cosines restricted to $S$. Then
$$\log\det L_S = 2 \sum_{i \in S} \log q_i + \log\det G_S.$$
Moreover, let $\{\lambda_k\}_{k=1}^{|S|}$ denote the eigenvalues of $G_S$, let $D_{\mathrm{eff}}(G_S) = \exp(H(\hat{\lambda}))$ with $\hat{\lambda}_k = \lambda_k / \sum_j \lambda_j$, and let $R(S) = 1 - |S|^{-1} \sum_{i \in S} \langle x_i, \mu_S \rangle / \|\mu_S\|$ where $\mu_S = |S|^{-1} \sum_{i \in S} x_i$. Under the *near-isotropic* regime in which the pairwise cosines satisfy $|\langle x_i, x_j \rangle - c| \leq \varepsilon$ for some $c \in [0, 1)$ and all $i \neq j \in S^{*}$, the following hold:
1. (i) $\log\det G_S = \sum_k \log \lambda_k$, and the spectrum of $G_S$ concentrates around $(1 - c)$ with one eigenvalue at $1 + (|S| - 1)c$.
2. (ii) $D_{\mathrm{eff}}(G_S) = |S| \cdot \bigl(1 + O(\varepsilon) + O(|S| c^2)\bigr)$.
3. (iii) $R(S)=1-\sqrt{(1+(|S|-1)c)/|S|}+O(\varepsilon)$, monotone increasing in $|S|$ and decreasing in $c$; here $R(S)=1-\|\mu_S\|$ by the definition of $R$, and $\|\mu_S\|^{2}=(1+(|S|-1)c)/|S|$ for unit-norm embeddings with uniform cosine $c$.
4. (iv) Setting $B_S:=D_{\mathrm{eff}}(G_S)\cdot R(S)$, there exist universal functions $h,h':\mathbb{R}_{\geq 0}^{2}\to\mathbb{R}$ of $(|S|,c)$ such that $\log\det G_S=h(|S|,c)\cdot(1+O(\varepsilon))$ and $B_S=h'(|S|,c)\cdot(1+O(\varepsilon))$, with both $h,h'$ strictly increasing in $|S|$ and strictly decreasing in $c$.
###### Proof sketch\.
The first identity follows from the multiplicative property of determinants:
$$\log\det L_S \;=\; \log\det\!\big(\mathrm{diag}(q_S)\,G_S\,\mathrm{diag}(q_S)\big) \;=\; 2\sum_{i\in S}\log q_i+\log\det G_S.$$
Under the uniform-correlation model $G_S=(1-c)\,I+c\,J$ (where $J$ is all-ones), the spectrum is one eigenvalue equal to $1+(|S|-1)c$ and $|S|-1$ eigenvalues equal to $1-c$. The $\varepsilon$ perturbation is controlled by Weyl's inequality. Parts (i)–(iii) follow by direct substitution.
For (iv), both $\log\det G_S$ and $B_S$ are smooth functions of $(|S|,c)$ through the spectrum. In particular,
$$\log\det G_S=\log\!\big(1+(|S|-1)c\big)+(|S|-1)\log(1-c),$$
which is strictly decreasing in $c$ and strictly increasing in $|S|$ for $c\in[0,1)$. $D_{\mathrm{eff}}\cdot R$ inherits the same monotonicities from (ii) and (iii). Both are therefore co-monotone in $(|S|,c)$; the empirical $\rho=0.985$ ($n=507$ chapters) realizes this co-monotonicity under the approximate near-isotropy that natural-text embeddings satisfy. ∎
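The algebraic identities used in the proof sketch can be checked numerically. The snippet below is a sanity check under the exact uniform-correlation model ($\varepsilon=0$), with hypothetical values for $|S|$, $c$, and the quality weights; it verifies the determinant factorization, the two-level spectrum, and the closed form for $\log\det G_S$, and also evaluates $D_{\mathrm{eff}}$ from its definition.

```python
import numpy as np

n, c = 6, 0.2                                   # hypothetical |S| and common cosine
G = (1 - c) * np.eye(n) + c * np.ones((n, n))   # uniform-correlation Gram matrix
q = np.linspace(1.5, 3.0, n)                    # hypothetical quality weights
L = np.diag(q) @ G @ np.diag(q)

# Spectrum: |S|-1 eigenvalues at 1-c and one at 1+(|S|-1)c.
eig = np.sort(np.linalg.eigvalsh(G))
assert np.allclose(eig[:-1], 1 - c)
assert np.isclose(eig[-1], 1 + (n - 1) * c)

# Closed form for log det G_S and the factorization of log det L_S.
logdet_G = np.linalg.slogdet(G)[1]
logdet_L = np.linalg.slogdet(L)[1]
closed_form = np.log(1 + (n - 1) * c) + (n - 1) * np.log(1 - c)
assert np.isclose(logdet_G, closed_form)
assert np.isclose(logdet_L, 2 * np.log(q).sum() + logdet_G)

# Effective dimension: exponential of the entropy of the normalized spectrum.
p = eig / eig.sum()
D_eff = float(np.exp(-(p * np.log(p)).sum()))
```

Because the normalized spectrum is a probability vector of length $|S|$, `D_eff` can never exceed $|S|$, and it approaches $|S|$ as $c\to 0$.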
###### Corollary 4 (Parameter-free recovery of breadth).
Under the conditions of Theorem [3](https://arxiv.org/html/2606.11222#Thmtheorem3), the DPP-MAP log-volume
$$V(T):=\log\det L_{S^{*}(T)}-2\sum_{i\in S^{*}(T)}\log q_i$$
is a parameter-free variational quantity (no $\lambda$, no $\alpha,\beta,\gamma$) that recovers the breadth coordinate up to a monotonic transformation and $O(\varepsilon)$ error, where $\varepsilon$ controls the near-isotropy deviation of the segment set.
#### Empirical verification\.
On 507 chapters across 5 Project Gutenberg novels, Spearman $\rho(V,B)=0.985$. Over the 16-item synthetic benchmark, greedy DPP is ILP-optimal on all items (exhaustive enumeration up to $|S|\leq 16$). Permutation drift of $V$ is $0.00\%$ on a representative chapter (P&P, Chapter 1); triplication drift is $0.00\%$ (the repeated segments contribute linearly-dependent rows, collapsing the determinant of the expanded set to the original). The selection size $|S^{*}|$ scales with chapter length at Spearman $\rho(|S^{*}|,n_{\mathrm{sent}})=0.977$, showing that the DPP recovery does not saturate.
#### Interpretation\.
The theorem establishes that $B$ and $V$ are not two independent semantic measures but two characterizations of the same geometric object: the log-volume of a diversity-maximizing, quality-weighted subset of the segment set. $B$ is the heuristic decomposition (effective rank times radial spread); $V$ is the optimization form (log-det of a DPP kernel). Their empirical equivalence promotes breadth from a heuristic recipe to a derived quantity with a variational characterization.
## 10 Discussion
The framework’s central claim is that semantic information in text is a representation\-indexed structured profile, not a universal scalar\. Three observations support this position\.
First, the SIL theorem \(§[3](https://arxiv.org/html/2606.11222#S3)\) shows that even when uniqueness is achievable, it is only within a representational frame\. There is no representation\-free analog of Shannon’s entropy theorem because there is no representation\-free notion of meaning equivalence\.
Second, the no\-go theorem \(§[6](https://arxiv.org/html/2606.11222#S6)\) shows that within a frame, no scalar summary serves all use cases simultaneously\. The three corners of the trade\-off triangle correspond to fundamentally different operational requirements \(analysis vs\. ranking vs\. cross\-frame comparison\), each forcing a structural choice that violates the others\.
Third, empirical validation shows that the *coordinates* of the profile are substantially more stable across embeddings ($\rho\in[0.92,0.98]$) than any scalar derived from them ($\rho\in[0.79,0.84]$). The scalar's instability is not a measurement error but a direct consequence of the no-go theorem.
The profile\-first stance carries three concrete advantages over a scalar\-first one\.
#### Assumptions are explicit\.
The five components of the measurement apparatus (§[2.1](https://arxiv.org/html/2606.11222#S2.SS1)), namely embedding $E$, segmentation rule, baseline $(\mu_0,\Sigma_0)$, threshold $\tau$, and normalization reference, are visible parts of the pipeline. Any claim made by the framework can be re-examined under variations in these components, and disagreements between practitioners are localized to specific apparatus choices rather than to the framework as a whole.
#### Diagnostics are interpretable\.
Novelty, breadth, and integration move independently and correspond to distinct semantic intuitions: departure from generic discourse, diversity of distinct ideas, and connectedness among them\. A scalar summary collapses these into one number; the profile preserves them\. In summarization analysis \(§[D\.1](https://arxiv.org/html/2606.11222#A4.SS1)\) the per\-coordinate signature distinguishes faithful, partial, lossy, and off\-topic summaries that any scalar conflates\.
#### Aggregation matches the use case\.
Different applications privilege different criteria. Retrieval and ranking want ordinal robustness; theoretical analysis wants closed-form composition; cross-corpus comparison wants reference-free comparability. Theorem [2](https://arxiv.org/html/2606.11222#Thmtheorem2) establishes that no scalar can serve all three. The framework therefore exposes the profile as the primary object and offers two scalars (§[7](https://arxiv.org/html/2606.11222#S7)) tuned to distinct corners of the trade-off triangle, leaving the choice to the application.
The progression from the axiomatic SIL to the empirical profile and the trade\-off triangle clarifies the scope of the theory\. The profile is the theoretical object; the two scalars are practical conveniences for distinct use cases; the trade\-off triangle explains why a universal scalar cannot exist\.
## 11 Limitations and Future Work
1. Benchmark scale. The synthetic benchmark, while comprising 23 categories and supplemented by 5 Project Gutenberg novels, remains hand-constructed. Definitive validation requires evaluation against human judgments of semantic richness, idea multiplicity, coherence, and informativeness.
2. Segmentation sensitivity. Breadth can swing substantially between sentence-level and fixed-size chunking, though ordinal rankings are more stable than absolute values. Robustness to segmentation should be evaluated rather than assumed.
3. Baseline dependence. The neutral baseline corpus influences novelty through both $\mu_0$ and $\Sigma_0$. Poor baseline choice can distort what counts as semantically displaced; domain-specific baselines may be needed for specialized corpora.
4. Reference-set dependence. Both recommended scalars depend on a reference set for normalization, making them calibrated rather than absolute. Values from different reference sets are not directly comparable.
5. Adaptive $\tau$. The current formulation treats $\tau$ as a fixed apparatus parameter. A principled approach might derive $\tau$ from properties of the embedding space (e.g., typical within-topic variance or baseline pairwise-similarity statistics), replacing the empirically-set default ($0.70$) with values calibrated to the embedding model and domain.
6. Variational characterization. Theorem [3](https://arxiv.org/html/2606.11222#Thmtheorem3)(iv) is proved under a near-isotropic regime. Tightening it beyond this regime, and extending the DPP–breadth correspondence to text types (poetry, heavily redundant prose) where the isotropy assumption fails, is open.
7. Embedding dependence. All results depend on the underlying embedding model. Cross-model rank correlations of $0.92$–$0.98$ on the coordinates suggest that the framework captures genuine geometric properties rather than model-specific artifacts, but this claim should be tested with future embedding architectures.
8. Scope of measurement. The framework measures geometric properties associated with semantic structure; it does not directly measure truth, usefulness, rhetorical quality, or task success. Downstream-task validation (summarization faithfulness, retrieval quality) is the natural next step.
## 12 Conclusion
A single raw geometric score is not an adequate representation of semantic information in text. An axiomatic derivation yields the unique frame-conditional measure $I_E(T)=\|\mu_T-\mu_0\|\cdot\mathrm{rank}(C_T)$, but this scalar is empirically inadequate. The revised framework models semantic information as a structured geometric profile with three coordinates (compressed Mahalanobis novelty, effective-rank-weighted breadth on deduplicated centroids, and second-nearest-neighbor integration) and a natural minimal unit, the semantic quantum, with explicit measurement resolution $\tau$.
A no-go theorem shows that no scalar summary built from this profile can simultaneously achieve analytic stability, ordinal robustness, and cross-representation comparability. Two practical scalars, $S_{\mathrm{minmax}}$ and $S_{\mathrm{rank}}$, occupy distinct corners of this trade-off triangle. Validation across 23 synthetic categories, 5 Project Gutenberg novels, and 3 embedding models confirms the trade-off and identifies $S_{\mathrm{rank}}$ with weights $(0.5,3.0,1.0)$, 2-NN integration, and $\tau=0.70$ as passing 25 of 28 ordinal checks as point estimates (21 of 28 after multiplicity correction; bootstrap pass rate $0.64$ with 95% CI $[0.30,0.93]$), the best configuration in our sweep and ahead of seven baselines. A separate finding connects the profile to an optimization-theoretic object: on 507 Gutenberg chapters, the $B$ coordinate is empirically the log-volume of the DPP-optimal segment subset (Spearman $\rho=0.985$), providing a parameter-free variational characterization of breadth.
The framework’s central conclusion is that semantic information is best understood not as a single number but as a representation\-indexed family of profiles, with scalar summaries chosen by use case\. The trade\-off triangle is a structural property of the profile\-to\-scalar map, not a defect of any particular construction\.
## Appendix A Proof of Theorem [1](https://arxiv.org/html/2606.11222#Thmtheorem1)
Assume $I(T)=F(\mu_T,C_T)$.
*Step 1.* By axiom 3 (novelty monotonicity) and axiom 5 (orthogonal invariance), dependence on $\mu_T$ reduces to dependence on $S(T)=\|\mu_T-\mu_0\|$.
*Step 2.* By axiom 5, dependence on $C_T$ reduces to dependence on the spectrum of $C_T$.
*Step 3.* By axiom 2 (redundancy non-increase), exact replication leaves the rank invariant but rescales eigenvalues; for $I$ to be non-increasing under such replication while remaining a function of the spectrum alone, dependence on eigenvalue magnitudes must be eliminated, leaving $D(T)=\mathrm{rank}(C_T)$.
*Step 4.* By axiom 4 (idea additivity) applied to texts in orthogonal subspaces with constant shift, $G(D_1+D_2)=G(D_1)+G(D_2)$. With axiom 6 (continuity), the continuous solutions to this Cauchy equation are linear: $G(D)=aD$ for some $a$ that may depend on $S$.
*Step 5.* Applying axiom 4 again to the shift variable (texts with disjoint orthogonal complements but additive displacement contributions), $a(S)$ satisfies a similar Cauchy equation: $a(S_1+S_2)=a(S_1)+a(S_2)$. By continuity, $a(S)=kS$.
*Step 6.* Assembly: $I(T)=k\cdot S(T)\cdot D(T)=k\cdot\|\mu_T-\mu_0\|\cdot\mathrm{rank}(C_T)$. The constant $k>0$ is the unit of measurement. ∎
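For concreteness, the assembled measure can be evaluated directly from segment embeddings. The sketch below assumes that $C_T$ is the (biased) sample covariance of the segment embeddings and that rank is computed numerically with a tolerance; the function name `frame_conditional_measure` and the parameters `k` and `tol` are illustrative choices, not part of the theorem.

```python
import numpy as np

def frame_conditional_measure(X, mu0, k=1.0, tol=1e-10):
    """I(T) = k * ||mu_T - mu_0|| * rank(C_T) for segment embeddings X (one row each)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mu_T = X.mean(axis=0)
    # Assumption: C_T is the biased sample covariance of the segments.
    C_T = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    rank = np.linalg.matrix_rank(C_T, tol=tol)
    return k * np.linalg.norm(mu_T - mu0) * rank

X = np.eye(3)               # three mutually orthogonal toy segments
mu0 = np.zeros(3)           # hypothetical baseline mean
value = frame_conditional_measure(X, mu0)
replicated = frame_conditional_measure(np.tile(X, (2, 1)), mu0)
```

Exact replication of every segment leaves both the mean and the rank unchanged, so `value` and `replicated` agree, which is the invariance that Step 3 relies on.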
## Appendix B Semantic Quantum: Additional Properties
### B.1 Quantum density
The quantum count $m_\tau$ measures absolute structural richness; quantum density measures informativeness per segment.
###### Definition 2 (Quantum density and saturation).
For a text with $k$ segments and $m_\tau$ quanta, the *quantum density* is
$$\rho_\tau(T)=\frac{m_\tau(T)}{k},\qquad \rho_\tau(T)\in(0,1].$$
A text is *$\tau$-saturated* when $\rho_\tau=1$, i.e. no two segments are within cosine similarity $\tau$ of each other. A text is *redundant* when $\rho_\tau<1$.
Density distinguishes informativeness per segment from absolute informativeness: a long redundant text and a short dense text can have similar $m_\tau$ but very different $\rho_\tau$. Empirically, $\rho_{0.70}$ on natural prose ranges from $\approx 0.62$ (dialogue-heavy chapters with stylistic repetition) to $1.00$ (terse expository writing).
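A small sketch of how the density could be computed from segment embeddings follows. The deduplication rule here is a greedy "leader" pass (a segment opens a new quantum only if its cosine similarity to every kept representative is below $\tau$); this is an assumption made for illustration and may differ from the clustering used in the main text. The function names are hypothetical.

```python
import numpy as np

def quantum_count(X, tau=0.70):
    """Greedy leader deduplication: count representatives that are pairwise below tau."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    reps = []
    for x in X:
        if all(float(x @ r) < tau for r in reps):
            reps.append(x)
    return len(reps)

def quantum_density(X, tau=0.70):
    """rho_tau = m_tau / k, where k is the number of segments."""
    return quantum_count(X, tau) / len(X)

# Toy text: the second segment nearly repeats the first, the third is distinct.
X = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
rho = quantum_density(X, tau=0.70)
```

Here two quanta survive out of three segments, so the toy text is redundant in the sense of Definition 2.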
### B.2 Quantum spectrum
The quanta of a text carry geometric structure beyond their count: the pairwise distances among quanta form a *spectrum* that captures finer organization.
###### Definition 3 (Quantum spectrum).
The *quantum spectrum* of $T$ at resolution $\tau$ is the multiset
$$\Sigma_\tau(T)=\bigl\{d_{ij}: d_{ij}=1-\cos(c_i,c_j),\ 1\leq i<j\leq m_\tau\bigr\}.$$
Four statistics of $\Sigma_\tau$ map directly to the profile coordinates and to natural extensions:
- $\min\Sigma_\tau\geq 1-\tau$ by construction (the deduplication invariant).
- $\mathrm{mean}\,\Sigma_\tau$ is closely related to breadth's radial-spread component: a text whose quanta are uniformly distant has high breadth.
- $\max\Sigma_\tau$ measures the *semantic diameter* of the text, that is, the largest distance any two quanta achieve, marking the extent of the text's coverage.
- The *spectral gap* $\max\Sigma_\tau-\min\Sigma_\tau$ distinguishes texts whose quanta cluster tightly with one outlier (large gap) from texts whose quanta are uniformly spread (small gap).
The full spectrum is a richer descriptor than any single coordinate; we treat the profile $(N,B,I)$ as a low-dimensional summary of $\Sigma_\tau$ together with the baseline displacement, suitable for ordinal comparison.
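The spectrum and its summary statistics are straightforward to compute once quantum centroids are available. The following sketch assumes the centroids $c_i$ are given as rows of an array; the function names and the dictionary keys are illustrative.

```python
import numpy as np

def quantum_spectrum(C):
    """Pairwise cosine distances d_ij = 1 - cos(c_i, c_j) for i < j."""
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    iu = np.triu_indices(len(C), k=1)
    return 1.0 - (C @ C.T)[iu]

def spectrum_stats(C):
    """Minimum, mean, maximum (semantic diameter) and spectral gap of the spectrum."""
    s = quantum_spectrum(C)
    return {"min": s.min(), "mean": s.mean(), "max": s.max(), "gap": s.max() - s.min()}

# Toy centroids: two orthogonal directions and one opposite to the first.
C = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
stats = spectrum_stats(C)
```

For these toy centroids the distances are 1, 2, and 1, so the diameter is 2 and the gap is 1.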
### B.3 Composition behavior
Quanta do not compose additively under text concatenation. For any two texts $T_1,T_2$:
$$\max\bigl(m_\tau(T_1),\ m_\tau(T_2)\bigr)\;\leq\;m_\tau(T_1\oplus T_2)\;\leq\;m_\tau(T_1)+m_\tau(T_2).$$
The lower bound is attained when $T_2$ is semantically contained in $T_1$ (every quantum of $T_2$ falls within $1-\tau$ of some quantum of $T_1$); the upper bound is attained when $T_1,T_2$ are mutually disjoint at resolution $\tau$. The gap $m_\tau(T_1)+m_\tau(T_2)-m_\tau(T_1\oplus T_2)$ counts shared semantic units between the two texts.
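As a worked illustration with hypothetical counts, suppose $m_\tau(T_1)=5$, $m_\tau(T_2)=3$, and the concatenation yields $m_\tau(T_1\oplus T_2)=6$. The bounds and the shared-unit count then read:

```latex
\max(5,3) = 5 \;\leq\; 6 \;\leq\; 5 + 3 = 8,
\qquad
m_\tau(T_1) + m_\tau(T_2) - m_\tau(T_1 \oplus T_2) = 5 + 3 - 6 = 2 .
```

In this made-up case two of the three quanta of $T_2$ are already present in $T_1$, and one is new.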
### B.4 Resolution sweep behavior
The quantum count $m_\tau$ is a piecewise-constant, monotone non-decreasing function of $\tau\in(0,1]$:
$$\tau\to 0\implies m_\tau\to 1,\qquad \tau\to 1\implies m_\tau\to k.$$
Specifically, $m_\tau$ changes only at values of $\tau$ that cross a merge event in the agglomerative clustering, dropping as $\tau$ is lowered past each merge. The complete *$\tau$-curve* $\tau\mapsto m_\tau(T)$ is therefore a discrete summary of the text's hierarchical semantic structure: the heights of the steps record cluster cardinalities, and the locations of the jumps record inter-cluster distances. Two texts with identical $m_{0.70}$ but different $\tau$-curves have measurably different organization.
This curve also addresses a methodological criticism of any single $\tau$ choice: rather than committing to one resolution, a text can be characterized by the entire curve, with the recommended $\tau=0.70$ understood as a single useful sample.
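One way to trace such a curve is sketched below. It assumes single-linkage agglomeration, so that the quanta at resolution $\tau$ are the connected components of the graph linking segments with cosine similarity at least $\tau$; the linkage rule is an assumption made for illustration, and the function names are hypothetical.

```python
import numpy as np

def quantum_count_single_linkage(X, tau):
    """Number of connected components when segments with cosine >= tau are linked."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = X @ X.T
    n = len(X)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if S[i, j] >= tau:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def tau_curve(X, taus):
    """Quantum counts along a grid of thresholds."""
    return [quantum_count_single_linkage(X, t) for t in taus]

X = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
curve = tau_curve(X, [0.05, 0.5, 0.999])
```

On this toy input the count moves from 1 toward $k=3$ as $\tau$ rises, matching the two limits stated above; under single linkage the step locations are exactly the similarity levels at which components merge.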
### B.5 Summary
The semantic quantum is the discrete primitive of the framework. The continuous coordinates $(N,B,I)$ are convenient real-valued summaries; the discrete count $m_\tau$, density $\rho_\tau$, spectrum $\Sigma_\tau$, and $\tau$-curve together form a richer picture of a text's semantic structure than any scalar can capture. The threshold $\tau$ is the apparatus parameter that fixes the resolution scale; choices of $\tau$ correspond to choices of measurement instrument, and the $\tau$-curve makes this dependence transparent.
## Appendix C Detailed Empirical Validation
### C.1 Cross-model robustness
Coordinate-level Spearman $\rho$ across the three embedding models lies in $[0.92,0.98]$. Scalar-level $\rho$ lies in $[0.79,0.84]$, substantially less stable than the coordinates due to normalization sensitivity. This empirical asymmetry directly supports the central claim that the profile is more fundamental than any scalar derived from it.
### C.2 Project Gutenberg validation
Under joint normalization with the synthetic benchmark, four of the five novels rank above all 23 synthetic categories: *Moby Dick* ($0.66$) $>$ *Frankenstein* ($0.60$) $>$ *Pride and Prejudice* ($0.58$) $>$ *Tale of Two Cities* ($0.54$) $>$ top synthetic item bag\_7 ($0.05$). *The Adventures of Sherlock Holmes* ($0.08$) clusters with synthetic items rather than with the other novels, a framework-detected consequence of its structure (a collection of twelve loosely-linked cases rather than a continuous narrative): it exhibits the highest breadth and lowest integration of any book, and breadth-dominant geometric composition drives the scalar down when integration is the minimum of the book set. This is a feature, not a failure: the framework correctly identifies Sherlock Holmes as a different kind of object than a novel. The remaining four novels demonstrate that the framework scales to long natural text without saturation.
Chapter-level analysis on *Pride and Prejudice* produces a discriminative scalar trajectory across chapters; the corresponding analysis on *Sherlock Holmes* required a hard-coded story list because the title format (e.g., A SCANDAL IN BOHEMIA) does not match standard chapter regexes.
### C.3 $\tau$-sweep on natural prose
Mean deduplication ratio across $\tau$ on book chapters:
Distinct sentences in natural prose almost never reach cosine similarity $\geq 0.90$, so a strict $\tau=0.90$ sits above the natural-text similarity ceiling and deduplication essentially never triggers. We therefore recommend $\tau=0.70$ for natural prose, with $\tau=0.90$ reserved for adversarial near-duplicate stress tests where the cosine similarity of constructed near-paraphrases is expected to be high.
A per-item kneedle-criterion analysis on 16 benchmark items finds the quanta-count vs. $\tau$ curve's point of maximum curvature at $\tau\in[0.55,0.75]$ (median $0.75$, mean $0.70$). Single-idea passages prefer $\tau\approx 0.60$; multi-idea, wiki, and bag-style passages prefer $\tau\approx 0.75$. The primary model's mean baseline-pair cosine (an anisotropy estimate) is $0.14$; a naive heuristic $\tau=a+\tfrac{1}{2}(1-a)$ gives $\tau\approx 0.57$, which under-shoots the empirical knee. The fixed $\tau=0.70$ recommendation is a compromise rather than a universal optimum; per-item knee detection is available where item-level tuning is acceptable.
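The arithmetic behind the naive heuristic, using the reported anisotropy estimate $a=0.14$, is simply the midpoint between $a$ and $1$:

```latex
\tau \;=\; a + \tfrac{1}{2}(1-a) \;=\; 0.14 + 0.5 \times 0.86 \;=\; 0.57 .
```

This value lies at the low end of the reported knee range $[0.55,0.75]$ and below both the mean ($0.70$) and the median ($0.75$), which is the sense in which the heuristic under-shoots.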
### C.4 External-domain coverage
Beyond the synthetic benchmark and the Gutenberg novels, we profile four external corpora to test whether the coordinates generalize beyond English prose.
The coordinates separate genres in interpretable directions. Legal text (EUR-Lex) has the highest breadth and the most aggressive dedup ratio (boilerplate enumeration), with arXiv showing moderate deduplication ($\sim 15\%$) driven by recurring scientific phrasing. Poetry has a narrow-and-coherent signature (low breadth, high integration). Dialogue shows topic-hopping (high breadth, low integration). arXiv abstracts are tightly coherent (integration $\approx 0.5$) with moderate breadth, as expected for technical exposition on a single topic. Both arXiv and EUR-Lex required a permissive sentence splitter accepting any non-whitespace character after punctuation, because both corpora are preprocessed to lowercase and the default splitter's capital-letter look-ahead fails on them; this is a segmentation issue, not a framework issue.
External similarity benchmarks give Spearman $\rho=0.89$ on STS-B (GLUE validation split, $n=1500$) and $\rho=0.78$ on SICK (mteb/sickr-sts test split, $n=9927$). These are correlations between human similarity scores and raw cosine distance on a single-sentence-pair basis, and reflect properties of the underlying embedding rather than of the profile; single-sentence items degenerate to cosine under our segmentation.
## Appendix D Summarization Information Loss: Additional Analysis
### D.1 Summarization Information Loss
For source $T$ and summary $T'$ within a fixed reference, define
$$\Delta N=N(T)-N(T'),\quad \Delta B=B(T)-B(T'),\quad \Delta I=I(T)-I(T'),$$
$$\Delta S=S(T)-S(T'),\quad \Delta S_{\mathrm{rel}}=\Delta S/S(T),$$
$$\cos_{\mathrm{dir}}=\cos(\mu_T,\mu_{T'}),\quad M_{\mathrm{drift}}=\|\mu_T-\mu_{T'}\|_{\Sigma_0^{-1}},\quad \Delta Q=m(T)-m(T').$$
The per-coordinate signature distinguishes faithful ($\Delta S\approx 0$, $\cos_{\mathrm{dir}}>0.9$), partial ($\Delta S\approx 0.5$, $\cos_{\mathrm{dir}}>0.8$), lossy ($\Delta S\approx 1$, $\cos_{\mathrm{dir}}>0.4$), and off-topic ($\Delta S\approx 1$, $\cos_{\mathrm{dir}}<0$) summaries on synthetic labels. Crucially, $\Delta S$ requires fixed-reference normalization; pair-local min-max degenerates.
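The two centroid-based quantities in this definition can be sketched as follows. The function names are illustrative, the centroids and baseline covariance in the usage lines are toy values, and the Mahalanobis drift is computed as $\sqrt{d^{\top}\Sigma_0^{-1}d}$ with $d=\mu_T-\mu_{T'}$.

```python
import numpy as np

def directional_fidelity(mu_src, mu_sum):
    """cos_dir: cosine between the source and summary centroids."""
    return float(mu_src @ mu_sum / (np.linalg.norm(mu_src) * np.linalg.norm(mu_sum)))

def mahalanobis_drift(mu_src, mu_sum, Sigma0):
    """M_drift: norm of the centroid difference under the inverse baseline covariance."""
    d = mu_src - mu_sum
    return float(np.sqrt(d @ np.linalg.solve(Sigma0, d)))

def relative_scalar_loss(S_src, S_sum):
    """Delta S_rel = (S(T) - S(T')) / S(T); assumes S(T) is non-zero."""
    return (S_src - S_sum) / S_src

mu_src = np.array([1.0, 0.0])      # toy source centroid
mu_sum = np.array([0.0, 1.0])      # toy summary centroid
Sigma0 = np.diag([4.0, 1.0])       # hypothetical baseline covariance
cos_dir = directional_fidelity(mu_src, mu_sum)
drift = mahalanobis_drift(mu_src, mu_sum, Sigma0)
```

Solving the linear system rather than inverting $\Sigma_0$ explicitly is a numerical convenience; a shrinkage estimate of $\Sigma_0$ would be passed in the same way.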
#### Human\-rated validation\.
On SummEval \(100 CNN/DailyMail source–summary pairs with 4 human\-rated axes\), Spearman correlations between the measures above and annotator\-averaged scores are
Directional fidelity $\cos_{\mathrm{dir}}$ is the dominant predictor of all four human axes; the per-coordinate deltas correlate weakly or not at all, with a modest $\Delta I$–relevance signal as the only exception. The per-coordinate attrition signature validated on synthetic faithful/lossy/off-topic labels does not transfer cleanly to human expert ratings on machine-generated summaries. For summarization diagnostics against human judgment, $\cos_{\mathrm{dir}}$ should be taken as the primary signal; per-coordinate deltas contribute marginally at best.
## Appendix E Downstream: Extractive Summarization Details
### E.1 Downstream: extractive summarization
To test whether the variational form has practical utility beyond its theoretical link to breadth, we evaluate DPP-MAP as an extractive summarizer on XSum [[10](https://arxiv.org/html/2606.11222#bib.bib10)]. For each article, select sentences by greedy log-det maximization and score the concatenation against the human reference using ROUGE-F [[11](https://arxiv.org/html/2606.11222#bib.bib11)]. Baselines: Lead-3 (first three sentences); MMR [[12](https://arxiv.org/html/2606.11222#bib.bib12)] with $\lambda=0.5$; centroid selection (closest-to-mean); uniform random selection of the same size. Sample: 186 test articles.
#### Variational-size selection.
When DPP selects its own size by the greedy stopping rule (mean $k=7.4$), with each content-aware baseline matched to the same $k$ per document:
Paired Wilcoxon of DPP vs. MMR: ROUGE-1 $+0.0054$ ($p=0.05$), ROUGE-2 $-0.0003$ ($p=0.95$), ROUGE-L $+0.0063$ ($p=0.005$). DPP significantly outperforms MMR on ROUGE-L, the sequence-level overlap metric, without any diversity hyperparameter.
#### Fixed $k=3$.
For apples-to-apples comparison with Lead-3:
Paired Wilcoxon of DPP vs. MMR at $k=3$: all $p>0.25$ (statistically tied). DPP significantly beats random on ROUGE-L ($+0.008$, $p=0.024$). All content-aware methods lose to Lead-3 by $\sim 0.8$ ROUGE-L points, consistent with the well-documented lead bias in news datasets [[10](https://arxiv.org/html/2606.11222#bib.bib10)].
#### Interpretation\.
DPP's empirical advantage over MMR *appears at the variationally-selected size*, not at an arbitrary fixed $k$. When forced to $k=3$ both methods collapse to statistical parity; at the self-determined size they diverge in DPP's favor. This indicates that the parameter-free size selection is itself part of the DPP form's empirical contribution, not an incidental convenience, and is consistent with Theorem [3](https://arxiv.org/html/2606.11222#Thmtheorem3)(iv): the log-det objective encodes a balance between $|S|$ and pairwise similarity $c$ whose optimum cannot be reconstructed by fixing $|S|$ externally.
## References
- \[1\]Y\. Bar\-Hillel and R\. Carnap\. An outline of a theory of semantic information\. MIT RLE Technical Report 247, 1952\.
- \[2\]J\. Barwise and J\. Seligman\.*Information Flow*\. Cambridge University Press, 1997\.
- \[3\]F\. I\. Dretske\.*Knowledge and the Flow of Information*\. MIT Press, 1981\.
- \[4\]L\. Floridi\.*The Philosophy of Information*\. Oxford University Press, 2011\.
- \[5\]O\. Ledoit and M\. Wolf\. A well\-conditioned estimator for large\-dimensional covariance matrices\.*Journal of Multivariate Analysis*, 88\(2\):365–411, 2004\.
- \[6\]T\. Mikolov et al\. Distributed representations of words and phrases and their compositionality\.*NeurIPS*, 2013\.
- \[7\]N\. Reimers and I\. Gurevych\. Sentence\-BERT: Sentence embeddings using Siamese BERT\-networks\.*EMNLP\-IJCNLP*, 2019\.
- \[8\]O\. Roy and M\. Vetterli\. The effective rank: A measure of effective dimensionality\.*EUSIPCO*, 2007\.
- \[9\]C\. E\. Shannon\. A mathematical theory of communication\.*Bell System Technical Journal*, 27\(3\):379–423, 1948\.
- \[10\]S\. Narayan, S\. B\. Cohen, and M\. Lapata\. Don’t give me the details, just the summary\! Topic\-aware convolutional neural networks for extreme summarization\.*EMNLP*, 2018\.
- \[11\]C\.\-Y\. Lin\. ROUGE: A package for automatic evaluation of summaries\.*ACL Workshop on Text Summarization Branches Out*, 2004\.
- \[12\]J\. Carbonell and J\. Goldstein\. The use of MMR, diversity\-based reranking for reordering documents and producing summaries\.*SIGIR*, 1998\.
- \[13\]A\. Kulesza and B\. Taskar\. Determinantal point processes for machine learning\.*Foundations and Trends in Machine Learning*, 5\(2–3\):123–286, 2012\.
- \[14\]T\. Zhang, V\. Kishore, F\. Wu, K\. Q\. Weinberger, and Y\. Artzi\. BERTScore: Evaluating text generation with BERT\.*ICLR*, 2020\.