Finite Certificates for In-Context Determinacy and a Threshold Theory of Emergence in Language Models

arXiv cs.LG Papers

Summary

This paper introduces finite certificates for verifying determinacy and emergence in language model in-context behavior, providing theoretical criteria and experimental validation on contemporary models.

arXiv:2606.07623v1 Announce Type: new Abstract: This paper develops a model-theoretic framework for verifying context-conditioned language-model behavior by replacing benchmark labels with finite semantic certificates. The first problem is finite determinacy: when do examples in a context force the answer to a query without changing model parameters? In finite-field linear task families, we prove an exact row-space criterion, compute the residual hypothesis count, derive full and query-local identification curves, and show that extracting a smallest forcing subcontext is NP-complete even for binary outputs. The second problem is threshold emergence: when does an apparent benchmark jump reflect a semantic transition rather than a discontinuity of the scoring map? We prove an anti-mirage theorem separating thresholded metrics from semantic confidence and give a rate-sensitive crossing bound for latent commitments becoming visible above threshold. The common semantic object is a confidence functional on definable events. We show that it is a Boolean probability measure, equivalently a Keisler measure on the relevant type space, whose measure-one formulas form a proper filter and whose Stone-space representation is invariant under definitional expansion. The resulting calculus provides finite context certificates, pair-separator hitting sets, query teaching dimension, prompt-preservation criteria, and scale-limit witnesses. Exact-arithmetic ancillary scripts reproduce the finite-field and threshold calculations and generate the data used by the figures.

Cached at: 06/09/26, 08:51 AM

# Finite Certificates for In-Context Determinacy and a Threshold Theory of Emergence in Language Models
Source: [https://arxiv.org/html/2606.07623](https://arxiv.org/html/2606.07623)
Faruk Alpay, Department of Computer Engineering, Bahcesehir University, Istanbul, Turkey, faruk\.alpay@bahcesehir\.edu\.tr

Hamdi Alakkad, Department of Artificial Intelligence Engineering, Bahcesehir University, Istanbul, Turkey, hamdi\.alakkad@bahcesehir\.edu\.tr

###### Abstract

We study two verification problems for context\-conditioned language\-model behavior by replacing benchmark labels with finite semantic certificates\. The first problem is finite determinacy: when do examples in a context force the answer to a query without changing model parameters? For finite\-field linear task families we prove an exact row\-space criterion, compute the residual hypothesis count, give the full\-identification curve $I_d(n)=\prod_{i=0}^{d-1}(1-Q^{i-n})$, and derive a query\-local determination curve for a fixed nonzero query\. We also show that extracting a smallest forcing subcontext is NP\-complete, even for binary outputs\. The second problem is threshold emergence: when does an apparent benchmark jump indicate a semantic transition rather than a discontinuity of the scoring map? We prove an anti\-mirage theorem separating thresholded metrics from semantic confidence, and a rate\-sensitive crossing bound $\lambda_{\tau}=(a/(1-\tau))^{1/\alpha}$ for latent commitments becoming visible above threshold\. The common semantic object is the confidence functional $s_{\lambda,c}(\varphi)=\mu_{\lambda,c}([\![\varphi]\!])$ on definable events\. We show that it is a Boolean probability measure, equivalently a Keisler measure on the relevant type space, whose measure\-one formulas form a proper filter and whose Stone\-space representation is invariant under definitional expansion\. The resulting calculus gives reusable objects: finite context certificates, pair\-separator hitting sets, query teaching dimension, prompt\-preservation criteria, and scale\-limit witnesses\. An ancillary artifact reproduces the finite\-field and threshold calculations by exact\-arithmetic scripts and records the emitted data files used by the figures\.
To probe whether the certificates these theorems isolate are within reach of trained systems, we also run a deterministic certificate\-emission benchmark over a weak\-to\-mid panel of contemporary language models, scored by exact match and by a graded proxy\. The exact score reproduces the predicted threshold jump: it stays near zero on a multi\-field certificate until a single system crosses, while the graded confidence rises smoothly\. This is an instance of the anti\-mirage theorem on trained systems, sharpened by a conjunction count that inflates the crossing scale by $k^{1/\alpha}$\. Because each family criterion is a sound, oracle\-free checker, the same machinery defines an aversive closed loop whose accepted set is sound and non\-decreasing; its reach is governed by whether a generator can act on the checker’s directional feedback, so it provably erases a metric\-artifact threshold while leaving a genuine arithmetic gap intact, with no reference answer entering the loop\.

## 1 Introduction

This paper treats context\-conditioned language\-model behavior as a finite certification problem\. The first target is *determinacy*: given examples in the context and a query, when is the answer forced by the semantic constraints rather than merely favored by the decoder? In finite\-field linear task families the answer is exact\. A query is forced precisely when it lies in the row space of the example design matrix; the number of hypotheses still consistent with the examples \(the residual hypothesis count\) is $Q^{d-r}$ when the observed matrix has rank $r$; full identification after $n$ random examples has probability

$$I_d(n)=\prod_{i=0}^{d-1}\bigl(1-Q^{i-n}\bigr),$$

and a fixed nonzero query has its own determination curve obtained by averaging over the rank distribution\. Extracting the smallest forcing subcontext is NP\-complete\. The second target is *threshold emergence*: a benchmark can jump because the metric thresholds a smooth semantic confidence curve\. We prove an anti\-mirage theorem for this separation \(Theorem [6\.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8)\) and a rate\-sensitive crossing\-scale bound $\lambda_{\tau}=(a/(1-\tau))^{1/\alpha}$ for latent commitments becoming visible above threshold \(Theorem [6\.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5)\)\. These statements do not assert that every trained model is exactly linear or that every benchmark jump is artificial\. They provide a certification language in which those claims can be tested\. Section [8](https://arxiv.org/html/2606.07623#S8) reproduces the quantitative theorems by exact\-arithmetic scripts, and Section [10](https://arxiv.org/html/2606.07623#S10) states controlled falsification protocols for trained models\.
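The identification curve can be evaluated and sanity-checked in a few lines. The following is a minimal sketch, assuming the examples are drawn independently and uniformly from $\mathbb{F}_Q^d$ with $Q$ prime, so that field arithmetic is integer arithmetic mod $Q$. The function names `identification_curve` and `full_rank_fraction` are illustrative and are not taken from the paper's ancillary scripts.

```python
from fractions import Fraction
from itertools import product


def identification_curve(d: int, n: int, Q: int) -> Fraction:
    """Exact value of I_d(n) = prod_{i=0}^{d-1} (1 - Q^(i-n))."""
    result = Fraction(1)
    for i in range(d):
        # For n < d the factor with i = n is zero, so the product vanishes.
        result *= 1 - Fraction(Q) ** (i - n)
    return result


def full_rank_fraction(d: int, n: int, Q: int) -> Fraction:
    """Brute force over all n x d matrices mod Q (Q prime, assumed).

    A matrix identifies the latent vector exactly when its kernel is
    trivial, i.e. no nonzero w satisfies A w = 0.
    """
    vectors = list(product(range(Q), repeat=d))
    nonzero = [w for w in vectors if any(w)]
    hits = 0
    for rows in product(vectors, repeat=n):
        trivial_kernel = all(
            any(sum(a * b for a, b in zip(row, w)) % Q for row in rows)
            for w in nonzero
        )
        hits += trivial_kernel
    return Fraction(hits, Q ** (n * d))
```

For $d=2$, $n=2$, $Q=2$ both functions return $3/8$, matching the 6 invertible matrices among the 16 binary $2\times 2$ matrices. The brute-force routine is exponential in $nd$ and is only meant as a check on very small cases.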

Large language models are evaluated through an increasingly stable empirical pattern: a fixed trained system receives a finite context, the context changes the distribution over outputs, and larger systems make some context\-conditioned behaviors more reliable\. GPT\-3 made this pattern operational by reporting zero\-shot, one\-shot, and few\-shot performance across translation, question answering, cloze completion, arithmetic, and SuperGLUE\-style evaluations, all without parameter updates at test time\[[5](https://arxiv.org/html/2606.07623#bib.bib5)\]\. Scaling\-law work then showed that cross\-entropy loss can vary smoothly with scale over broad regimes, which separates the growth of capability from any single benchmark threshold\[[14](https://arxiv.org/html/2606.07623#bib.bib14)\]\. Compute\-optimal scaling refined the picture by showing that data allocation and model size jointly determine the observed loss curve\[[13](https://arxiv.org/html/2606.07623#bib.bib13)\]\. PaLM continued the same trend across large\-scale language, reasoning, and multilingual benchmarks\[[6](https://arxiv.org/html/2606.07623#bib.bib6)\]\.

The later GPT line made the semantic ambiguity harder to ignore rather than easier\. GPT\-4 reported human\-level performance on several professional and academic benchmarks, including a simulated bar exam result around the top decile of test takers\[[20](https://arxiv.org/html/2606.07623#bib.bib20)\]\. GPT\-4\.1 reported large improvements on coding and instruction\-following evaluations, including 54\.6% on SWE\-bench Verified and 38\.3% on MultiChallenge\[[21](https://arxiv.org/html/2606.07623#bib.bib21)\]\. GPT\-5 was presented as a system combining a fast model, a deeper reasoning model, and a router, with reported developer\-facing scores such as 74\.9% on SWE\-bench Verified and 88% on Aider Polyglot\[[22](https://arxiv.org/html/2606.07623#bib.bib22)\]\. GPT\-5\.5 further emphasized complex professional work, reporting 84\.9% on GDPval, 78\.7% on OSWorld\-Verified, and 98\.0% on Tau2\-bench Telecom without prompt tuning\[[23](https://arxiv.org/html/2606.07623#bib.bib23)\]\. These numbers do not by themselves say what kind of semantic object a prompt denotes, what it means for examples in the context to determine an answer, or when a visible benchmark jump corresponds to a genuine semantic transition\.

This is the problem addressed here\. Empirical evaluation supplies distributions, accuracies, and thresholded scores\. Verification needs a different object: a logic in which a behavior can be stated as a formula, a context can be interpreted as an update, and a scale trend can be tested against a limiting semantic commitment\. Without that separation, three distinct claims are easily conflated\. First, a model may assign high probability to a correct output without semantically entailing it\. Second, a prompt may override a previous default without adding an ordinary monotone axiom\. Third, a benchmark score may jump because the observation map is discontinuous even though the underlying confidence changes gradually\.

Existing theories of in\-context learning explain important parts of this picture but do not define a common semantic target\. One line treats in\-context behavior as latent Bayesian inference over tasks\[[29](https://arxiv.org/html/2606.07623#bib.bib29)\]\. Another asks whether transformer computations implement standard learning algorithms during the forward pass\[[4](https://arxiv.org/html/2606.07623#bib.bib4)\]\. Gradient\-descent interpretations give an operational account of a related phenomenon\[[27](https://arxiv.org/html/2606.07623#bib.bib27)\]\. Work on simple function classes identifies sample sequences that can be learned in context\[[8](https://arxiv.org/html/2606.07623#bib.bib8)\]\. Mechanistic studies explain circuit\-level regularities such as induction heads\[[19](https://arxiv.org/html/2606.07623#bib.bib19)\]\. These accounts are valuable, but they do not by themselves produce a first\-order object whose consequences can be studied by compactness, diagrams, filters, or ultraproducts\.

The corresponding semantic ingredients already exist in logic and formal pragmatics\. Montague semantics treats meaning with model\-theoretic precision\[[18](https://arxiv.org/html/2606.07623#bib.bib18)\]\. Stalnaker’s context sets make assertion a restriction of possibilities\[[26](https://arxiv.org/html/2606.07623#bib.bib26)\]\. Lewisian scorekeeping makes discourse state dynamic\[[16](https://arxiv.org/html/2606.07623#bib.bib16)\]\. Heim’s file\-change semantics gives a formal discipline for context growth\[[11](https://arxiv.org/html/2606.07623#bib.bib11)\]\. Gricean pragmatics explains why literal content and intended force diverge\[[10](https://arxiv.org/html/2606.07623#bib.bib10)\]\. Rational speech\-act models add a probabilistic account of pragmatic inference\[[7](https://arxiv.org/html/2606.07623#bib.bib7)\]\. Nonmonotonic logics and preferential models show how adding information can change what is selected as normal or preferred\[[15](https://arxiv.org/html/2606.07623#bib.bib15)\]\. Belief revision supplies another mathematical language for update under priority and inconsistency management\[[2](https://arxiv.org/html/2606.07623#bib.bib2)\]\.

The contribution of this paper is to turn the LLM\-specific problem into a model\-theoretic one\. For each scale $\lambda$ we use a tuple

$$\mathbb{G}_{\lambda}=(\mathcal{L},\mathcal{K}_{\lambda},\mu_{\lambda},\mathsf{C}_{\lambda},\mathsf{U}_{\lambda},\mathsf{D}_{\lambda}),$$

where $\mathcal{L}$ is the signature, $\mathcal{K}_{\lambda}$ is a class of structures, $\mu_{\lambda}$ is a probability measure on definable events, $\mathsf{C}_{\lambda}$ is a set of context codes, $\mathsf{U}_{\lambda}$ is a context update rule, and $\mathsf{D}_{\lambda}$ is a decoding kernel \(Definition 2\.10\)\. This is not a claim that transformer activations literally contain first\-order models\. It is a semantic representation of behavioral commitments, just as automata, Kripke structures, and probabilistic transition systems represent behavior without copying physical implementation\.

#### Contributions\.

1. Identification curves for in\-context determinacy\. In a finite\-field linear task family, a query is forced exactly when it lies in the row space of the example design matrix; the residual hypothesis count is $|\mathbb{F}|^{\,d-\operatorname{rank}(A_n)}$; full identification has curve $I_d(n)=\prod_{i=0}^{d-1}(1-Q^{i-n})$, where $Q=|\mathbb{F}|$; and fixed\-query determination is obtained by averaging $(Q^{r}-1)/(Q^{d}-1)$ over the rank distribution \(Theorems [5\.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1), [5\.3](https://arxiv.org/html/2606.07623#S5.Thmtheorem3), [5\.4](https://arxiv.org/html/2606.07623#S5.Thmtheorem4), [5\.6](https://arxiv.org/html/2606.07623#S5.Thmtheorem6)\)\.
2. Certificate complexity\. Finite\-context determination reduces to a pair\-separator hitting problem \(Theorem [4\.11](https://arxiv.org/html/2606.07623#S4.Thmtheorem11)\), and extracting a smallest forcing subcontext is NP\-complete already for binary outputs \(Theorem [4\.13](https://arxiv.org/html/2606.07623#S4.Thmtheorem13)\); this connects context length to teaching dimension\[[30](https://arxiv.org/html/2606.07623#bib.bib30)\] and mistake\-bound dimension\[[32](https://arxiv.org/html/2606.07623#bib.bib32)\]\.
3. A threshold theory of emergence\. Thresholded benchmark jumps do not imply semantic discontinuity \(Theorem [6\.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8)\); latent commitments become $\tau$\-manifest with a crossing\-scale bound \(Theorem [6\.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5)\); a continuous metric preserves graduality \(Proposition [6\.9](https://arxiv.org/html/2606.07623#S6.Thmtheorem9)\); and a $k$\-field exact\-match metric sharpens the apparent jump and inflates the crossing scale by $k^{1/\alpha}$ \(Proposition [6\.10](https://arxiv.org/html/2606.07623#S6.Thmtheorem10)\), an effect we then observe on trained systems \(Section [9](https://arxiv.org/html/2606.07623#S9)\)\. This reconciles the emergence\[[28](https://arxiv.org/html/2606.07623#bib.bib28)\] and mirage\[[25](https://arxiv.org/html/2606.07623#bib.bib25)\] accounts as two regimes of one inequality\.
4. Semantic confidence as a Keisler measure\. The confidence functional $s_{\lambda,c}$ is a finitely additive probability on definable events, that is, a Keisler measure on the type space\[[31](https://arxiv.org/html/2606.07623#bib.bib31),[34](https://arxiv.org/html/2606.07623#bib.bib34)\]; its measure\-one formulas form a proper filter \(Theorem [2\.22](https://arxiv.org/html/2606.07623#S2.Thmtheorem22)\), it admits a Stone\-space representation \(Theorem [2\.25](https://arxiv.org/html/2606.07623#S2.Thmtheorem25)\), and it is invariant under definitional expansion \(Theorem [2\.30](https://arxiv.org/html/2606.07623#S2.Thmtheorem30)\)\.
5. Prompts as preferential updates\. Prompt consequence is nonmonotonic under extension \(Theorem [3\.4](https://arxiv.org/html/2606.07623#S3.Thmtheorem4)\); fixed\-preference fragments satisfy cautious monotony \(Theorem [3\.7](https://arxiv.org/html/2606.07623#S3.Thmtheorem7)\); prompt extension admits an exact preservation criterion \(Theorem [3\.9](https://arxiv.org/html/2606.07623#S3.Thmtheorem9)\), which explains order\- and extension\-sensitivity of prompting\[[33](https://arxiv.org/html/2606.07623#bib.bib33)\]\.
6. Verification, falsification, and a certificate benchmark\. Every quantitative theorem is confirmed by controlled exact\-arithmetic simulation \(Section [8](https://arxiv.org/html/2606.07623#S8)\); the three certificate families are then posed directly to a fixed panel of contemporary language models and graded against exact ground truth \(Section [9](https://arxiv.org/html/2606.07623#S9)\)\. Because each family criterion is a sound, oracle\-free checker, verification doubles as a closed loop: a provably\-sound, non\-decreasing aversive refinement schedule \(Proposition [7\.8](https://arxiv.org/html/2606.07623#S7.Thmtheorem8)\) whose reach is governed by a generator’s ability to act on the checker’s directional feedback, so it erases a metric\-artifact threshold but not a genuine arithmetic gap \(Proposition [7\.9](https://arxiv.org/html/2606.07623#S7.Thmtheorem9)\)\. Each prediction is also given a checkable protocol for trained models \(Section [10](https://arxiv.org/html/2606.07623#S10)\)\.
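The fixed-query averaging in the first contribution can be checked by exhaustive enumeration on tiny cases. The sketch below assumes a prime field $\mathbb{Z}/p$ and uniformly random example matrices; `rank_mod_p`, `fixed_query_by_rank`, and `fixed_query_direct` are hypothetical helper names, not the paper's code. One function averages $(Q^{r}-1)/(Q^{d}-1)$ over the empirical rank distribution, the other counts directly how often a fixed nonzero query lies in the row space.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def rank_mod_p(rows, p):
    """Rank over Z/p (p prime) by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def all_matrices(d, n, p):
    """Every n x d matrix over Z/p, as a list of row lists."""
    vectors = list(product(range(p), repeat=d))
    for rows in product(vectors, repeat=n):
        yield [list(r) for r in rows]


def fixed_query_by_rank(d, n, p):
    """Average (p^r - 1)/(p^d - 1) over the rank distribution."""
    ranks = Counter(rank_mod_p(A, p) for A in all_matrices(d, n, p))
    total = p ** (n * d)
    return sum(
        (Fraction(c, total) * Fraction(p**r - 1, p**d - 1) for r, c in ranks.items()),
        Fraction(0),
    )


def fixed_query_direct(query, n, p):
    """Fraction of matrices whose row space contains the query."""
    d = len(query)
    hits = sum(
        rank_mod_p(A + [list(query)], p) == rank_mod_p(A, p)
        for A in all_matrices(d, n, p)
    )
    return Fraction(hits, p ** (n * d))
```

For $d=2$, $n=2$, $p=2$ both routes give $9/16$: the six rank-2 matrices always contain the query, and three of the nine rank-1 matrices have row space spanned by it.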

These objects are built to be reused, not only invoked\. Semantic confidence, the identification curve $I_d(n)$, the pair\-separator certificate, and the crossing\-scale bound $\lambda_{\tau}$ are each stated so that they can be measured, bounded, and tested against trained models, rather than serving as a purely conceptual distinction\.

The technical target is therefore not a taxonomy of LLM behavior\. It is a collection of decision and invariance problems\. When does a finite context determine a unique answer? How many examples are necessary before a query becomes determined? Which prompt extensions preserve all old consequences? Which conclusions survive a harmless change of semantic presentation? When does a benchmark threshold reveal a limit\-theoretic fact, and when is it only an artifact of observation? The body of the paper answers these questions in the general compactness setting where possible, and in finite linear task families where exact numerical criteria can be proved\.

## 2 Semantic presentations of pre\-trained LLMs

We work in many\-sorted first\-order logic\. Standard references for model theory include Hodges and Marker\[[12](https://arxiv.org/html/2606.07623#bib.bib12),[17](https://arxiv.org/html/2606.07623#bib.bib17)\]\. The signature $\mathcal{L}$ contains at least a sort $\mathbf{C}$ for context codes and a sort $\mathbf{Y}$ for outputs\. Additional sorts may represent latent tasks, discourse referents, world states, proof objects, tool states, or internal semantic hypotheses\.

The semantic layer is not intended to be a literal description of transformer activations\. It is an external representation of behavior\. In the same way that a probabilistic automaton may summarize a physical device without reproducing its circuitry, a semantic presentation summarizes the input\-output commitments of a trained model by a probability\-bearing class of structures\.

### 2\.1 The separation problem

The central verification failure mode is a category error between three levels of description\. Let $q$ be a task instance, let $A(q,y)$ say that $y$ is an admissible answer to $q$, and let $R(y)$ be the benchmark scoring predicate\. A model can satisfy

$$P_{\lambda,c}(R)>\tau$$

while failing to satisfy any logical uniqueness statement about $A(q,y)$\. Conversely, the semantic state may entail a unique admissible answer while the decoder assigns nonzero probability to malformed strings, refusals, or tool calls\. A third possibility is that $s_{\lambda,c}(\psi)$ changes gradually while the reported score $\Omega(s_{\lambda,c}(\psi))$ jumps because $\Omega$ is discontinuous\.

This paper solves the separation problem by assigning a different mathematical object to each level:

$$\begin{aligned}
\text{context update}\quad & \mathsf{U}_{\lambda}, & \text{semantic truth}\quad & \mathfrak{M}\models\varphi,\\
\text{confidence}\quad & s_{\lambda,c}(\varphi), & \text{generation}\quad & \mathsf{D}_{\lambda}.
\end{aligned}$$

The four objects interact, but none reduces to the others\. The subsequent definitions are designed to make that non\-reduction provable\.

###### Definition 2\.1\(Semantic answerability\)\.

Let $A(x,y)$ be a formula with $x$ of query sort and $y$ of output sort\. A query term $q$ is *semantically answerable* at $(\lambda,c)$ if

$$\exists y\,A(q,y)\in\operatorname{Th}_{\lambda}(c).$$

It is *uniquely answerable* if

$$\exists y\,A(q,y)\in\operatorname{Th}_{\lambda}(c)\qquad\text{and}\qquad\forall y\,\forall z\,\bigl(A(q,y)\wedge A(q,z)\rightarrow y=z\bigr)\in\operatorname{Th}_{\lambda}(c).$$

###### Proposition 2\.2\(Answerability and observed success are independent\)\.

Semantic answerability, unique semantic answerability, and high observed benchmark probability are pairwise non\-equivalent in general\.

###### Proof\.

For answerability without unique answerability, take a model class in which two distinct outputs satisfy $A(q,y)$ on a measure\-one set\. Then $\exists y\,A(q,y)$ is almost sure, but uniqueness is false almost surely\. For unique answerability without high observed success, take a model class where a unique $A$\-answer is entailed and choose a decoding kernel assigning most probability mass to outputs outside the benchmark scoring predicate $R$\. For high observed success without semantic answerability, take a model class where $A(q,y)$ varies across structures with no measure\-one witness, but choose a decoder that assigns high probability to some output satisfying $R$\. These constructions satisfy the definitions and show that no implication holds without additional faithfulness assumptions\. ∎

### 2\.2 Theoretical targets

The paper is organized around five mathematical problems\. They are stated here explicitly because they are the points at which empirical language\-model evaluation usually stops being logically well\-formed\.

###### Problem 2\.3\(Finite determinacy\)\.

Given a background theory $T_0$, an example stream $E$, a query $q$, and an output property $\chi(y)$, characterize when

$$T_{\omega}\models\chi(f(q))$$

can be witnessed by some finite prefix $T_N$\.

The compactness results in Section [4](https://arxiv.org/html/2606.07623#S4) solve the qualitative version of this problem for first\-order properties\. The linear task results in Section [5](https://arxiv.org/html/2606.07623#S5) give exact rank and row\-space certificates for a concrete family\.
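The rank and row-space certificates can be illustrated on a small instance. This is a minimal sketch, assuming the task family is $y=\langle x,w\rangle$ over a prime field $\mathbb{Z}/p$ with a latent vector $w\in(\mathbb{Z}/p)^d$; that concrete form, and the names `is_forced`, `residual_count`, and `forced_by_enumeration`, are assumptions made for illustration. The query is declared forced when appending it to the example matrix leaves the rank unchanged, and the result is compared with brute-force enumeration of all consistent latent vectors.

```python
from itertools import product


def rank_mod_p(rows, p):
    """Rank over Z/p (p prime) by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_forced(examples, query, p):
    """Row-space criterion: the query adds no new rank."""
    return rank_mod_p(examples + [query], p) == rank_mod_p(examples, p)


def residual_count(examples, d, p):
    """Number of latent vectors consistent with the examples: p^(d - rank)."""
    return p ** (d - rank_mod_p(examples, p))


def dot(u, v, p):
    return sum(a * b for a, b in zip(u, v)) % p


def forced_by_enumeration(examples, query, w_true, p):
    """Brute-force check: do all consistent latent vectors agree on the query?"""
    labels = [dot(x, w_true, p) for x in examples]
    consistent = [
        w for w in product(range(p), repeat=len(w_true))
        if [dot(x, w, p) for x in examples] == labels
    ]
    answers = {dot(query, w, p) for w in consistent}
    return len(answers) == 1, len(consistent)
```

With `examples = [[1, 0, 2], [0, 1, 3]]` over $\mathbb{Z}/5$, the query `[1, 1, 0]` is the sum of the two rows and is forced, while `[0, 0, 1]` is not; the rank is 2, so $5^{3-2}=5$ latent vectors remain.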

###### Problem 2\.4\(Certificate complexity\)\.

For a class of tasks and a fragment $\mathcal{F}$, bound the least $N$ for which $T_N$ entails the target formula, or prove that no shorter prefix can do so\.

This problem is not addressed by ordinary benchmark accuracy\. It asks for the size of a logical certificate\. Proposition [4\.9](https://arxiv.org/html/2606.07623#S4.Thmtheorem9) gives the general lower\-bound principle, while Theorem [5\.3](https://arxiv.org/html/2606.07623#S5.Thmtheorem3) computes the obstruction in a finite linear family\.

###### Problem 2\.5\(Prompt\-stability\)\.

Given prompts $p$ and $q$, decide whether every consequence of $p$ remains a consequence after appending $q$\.

Theorem [3\.9](https://arxiv.org/html/2606.07623#S3.Thmtheorem9) gives an exact semantic criterion\. It explains why prompt extension is not merely adding axioms: the hard content may increase while the selected preferred models change\.

###### Problem 2\.6\(Minimal context extraction\)\.

Given a finite deterministic task family, a current example set, and a query $q$, compute the smallest subcontext whose labels already force the answer to $q$\.

Theorem [4\.11](https://arxiv.org/html/2606.07623#S4.Thmtheorem11) reduces this problem to a pair\-separator hitting problem\. Theorem [4\.13](https://arxiv.org/html/2606.07623#S4.Thmtheorem13) shows that the resulting minimization problem is computationally hard even for binary outputs\. This is the point at which finite\-context semantics becomes an algorithmic object rather than a descriptive metaphor\.
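The hitting-set view of minimal context extraction can be sketched on a toy family. The code below is one reading of the reduction, not the paper's construction: hypotheses are dictionaries from inputs to binary outputs, and the separator of a pair of hypotheses that disagree on the query is taken to be the set of examples that refute at least one of them. Under that assumption, a subcontext forces the answer exactly when it hits every separator (given that some hypothesis fits all labels). The search is brute force, in line with the hardness result; all names are hypothetical.

```python
from itertools import combinations


def consistent(hyps, examples, subset):
    """Hypotheses that reproduce the labels on the chosen example indices."""
    return [h for h in hyps if all(h[examples[i][0]] == examples[i][1] for i in subset)]


def forces(hyps, examples, subset, query):
    """True when every surviving hypothesis gives the same answer to the query."""
    answers = {h[query] for h in consistent(hyps, examples, subset)}
    return len(answers) == 1


def smallest_forcing_subcontext(hyps, examples, query):
    """Exhaustive search by increasing size; exponential in the worst case."""
    for k in range(len(examples) + 1):
        for subset in combinations(range(len(examples)), k):
            if forces(hyps, examples, subset, query):
                return subset
    return None


def pair_separators(hyps, examples, query):
    """For each pair disagreeing on the query, the examples refuting one of them."""
    seps = []
    for h, g in combinations(hyps, 2):
        if h[query] != g[query]:
            seps.append(frozenset(
                i for i, (x, y) in enumerate(examples) if h[x] != y or g[x] != y
            ))
    return seps


def hits_all(subset, separators):
    chosen = set(subset)
    return all(chosen & sep for sep in separators)


# Toy family: four hypotheses over inputs a, b, c and a query input q.
HYPS = [
    {"a": 0, "b": 0, "c": 0, "q": 0},  # generates the labels below
    {"a": 1, "b": 0, "c": 0, "q": 1},
    {"a": 0, "b": 1, "c": 0, "q": 1},
    {"a": 0, "b": 0, "c": 1, "q": 0},
]
EXAMPLES = [("a", 0), ("b", 0), ("c", 0)]
```

Here `smallest_forcing_subcontext(HYPS, EXAMPLES, "q")` returns `(0, 1)`: the third example is unnecessary because the hypothesis it refutes already agrees with the labeling hypothesis on the query.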

###### Problem 2\.7\(Semantic versus observed emergence\)\.

Given confidence values $s_{\lambda}(\varphi)$ and a benchmark observation map, determine whether an apparent jump is forced by limit\-theoretic convergence or by the discontinuity of the observation map\.

Section [6](https://arxiv.org/html/2606.07623#S6) separates these cases\. It also gives a rate\-sensitive crossing bound, so the theory is not limited to qualitative convergence\.
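The difference between a smooth confidence curve and a thresholded observation can be made concrete. The sketch below assumes a power-law deficit $1-s_{\lambda}=a\lambda^{-\alpha}$, which is our reading of the hypotheses behind the crossing scale $\lambda_{\tau}=(a/(1-\tau))^{1/\alpha}$ and is not stated in this section. The $k$-field case uses a union bound $1-ka\lambda^{-\alpha}$ as a stand-in for exact match. All function names are hypothetical.

```python
def power_law_confidence(lam: float, a: float, alpha: float) -> float:
    """Assumed graded confidence s = 1 - a * lam^(-alpha), clipped at 0."""
    return max(0.0, 1.0 - a * lam ** (-alpha))


def exact_match_lower_bound(lam: float, a: float, alpha: float, k: int) -> float:
    """Union bound for k fields that must all be correct (assumption)."""
    return max(0.0, 1.0 - k * a * lam ** (-alpha))


def crossing_scale(a: float, alpha: float, tau: float, k: int = 1) -> float:
    """Scale at which the (k-field) bound reaches tau: (k a / (1 - tau))^(1/alpha)."""
    return (k * a / (1.0 - tau)) ** (1.0 / alpha)


def thresholded(score: float, tau: float) -> float:
    """Discontinuous observation map: reports 1 only at or above the threshold."""
    return 1.0 if score >= tau else 0.0


if __name__ == "__main__":
    a, alpha, tau = 1.0, 0.5, 0.9
    for lam in (50, 99, 101, 200, 2000):
        s = power_law_confidence(lam, a, alpha)
        print(lam, round(s, 4), thresholded(s, tau),
              thresholded(exact_match_lower_bound(lam, a, alpha, 4), tau))
```

With $a=1$, $\alpha=0.5$, $\tau=0.9$, the graded confidence moves by about $0.001$ between $\lambda=99$ and $\lambda=101$ while the thresholded score flips from 0 to 1. Requiring $k=4$ fields moves the crossing from about 100 to about 1600, a factor of $k^{1/\alpha}=16$.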

### 2\.3 Measurable definable events

Let $\mathcal{K}\subseteq\operatorname{Mod}(\mathcal{L})$ be a class of structures\. For a sentence $\varphi\in\operatorname{Sent}(\mathcal{L})$ write

$$[\![\varphi]\!]_{\mathcal{K}}:=\{\mathfrak{M}\in\mathcal{K}:\mathfrak{M}\models\varphi\}.$$

These are the elementary events visible to the semantic language\.

###### Definition 2\.8\(Definable event algebra\)\.

The Boolean algebra of sentence\-definable events over $\mathcal{K}$ is

$$\mathsf{Def}_{\mathcal{L}}(\mathcal{K}):=\{[\![\varphi]\!]_{\mathcal{K}}:\varphi\in\operatorname{Sent}(\mathcal{L})\}.$$

When a probability measure $\mu$ is defined on a $\sigma$\-algebra $\mathcal{A}$ containing $\mathsf{Def}_{\mathcal{L}}(\mathcal{K})$, the pair $(\mathcal{K},\mu)$ is called a *measured model class*\.

The Boolean operations are inherited from logic:

$$[\![\neg\varphi]\!]_{\mathcal{K}}=\mathcal{K}\setminus[\![\varphi]\!]_{\mathcal{K}},\qquad[\![\varphi\wedge\psi]\!]_{\mathcal{K}}=[\![\varphi]\!]_{\mathcal{K}}\cap[\![\psi]\!]_{\mathcal{K}}.$$

Thus probability on definable events is exactly probability on truth values\.

###### Lemma 2\.9\(Elementary event algebra\)\.

For every $\mathcal{K}\subseteq\operatorname{Mod}(\mathcal{L})$, $\mathsf{Def}_{\mathcal{L}}(\mathcal{K})$ is a Boolean subalgebra of $\mathcal{P}(\mathcal{K})$\. The map

$$\varphi\longmapsto[\![\varphi]\!]_{\mathcal{K}}$$

factors through logical equivalence over $\mathcal{K}$\.

###### Proof\.

Closure under complement and finite intersection follows from negation and conjunction\. Closure under finite union follows from disjunction\. If $\varphi$ and $\psi$ are equivalent over $\mathcal{K}$, then for every $\mathfrak{M}\in\mathcal{K}$ one has $\mathfrak{M}\models\varphi$ exactly when $\mathfrak{M}\models\psi$, so the corresponding events are equal\. ∎

###### Definition 2\.10\(Semantic presentation\)\.

Fix a scale parameter $\lambda\in\Lambda$, where $\Lambda$ is a directed set\. A *semantic presentation* at scale $\lambda$ is a tuple

$$\mathbb{G}_{\lambda}=(\mathcal{L},\mathcal{K}_{\lambda},\mu_{\lambda},\mathsf{C}_{\lambda},\mathsf{U}_{\lambda},\mathsf{D}_{\lambda})$$

such that:

1. \(a\) $\mathcal{L}$ is a countable many\-sorted first\-order signature;
2. \(b\) $\mathcal{K}_{\lambda}\subseteq\operatorname{Mod}(\mathcal{L})$ is a nonempty class of structures;
3. \(c\) $\mu_{\lambda}$ is a probability measure on a $\sigma$\-algebra containing $\mathsf{Def}_{\mathcal{L}}(\mathcal{K}_{\lambda})$;
4. \(d\) $\mathsf{C}_{\lambda}$ is a nonempty set of context codes;
5. \(e\) $\mathsf{U}_{\lambda}$ maps $c\in\mathsf{C}_{\lambda}$ to a posterior measure $\mu_{\lambda,c}:=\mathsf{U}_{\lambda}(c)(\mu_{\lambda})$ on the same measurable model class;
6. \(f\) $\mathsf{D}_{\lambda}$ is a decoding kernel such that $y\mapsto\mathsf{D}_{\lambda}(y\mid\mathfrak{M},c)$ is a probability distribution on outputs for every $\mathfrak{M}\in\mathcal{K}_{\lambda}$ and $c\in\mathsf{C}_{\lambda}$\.

The induced observable distribution is

$$\mathbb{P}_{\lambda}(y\mid c)=\int_{\mathcal{K}_{\lambda}}\mathsf{D}_{\lambda}(y\mid\mathfrak{M},c)\,d\mu_{\lambda,c}(\mathfrak{M}).$$

The definition separates three levels: model uncertainty, context update, and output decoding\. A prompt changes the posterior semantic measure\. Decoding then converts that posterior into tokens, strings, or structured outputs\.
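When the class of structures is finite, the integral defining the observable distribution is a weighted sum, which makes the three levels easy to see in code. The following is a toy sketch under that finiteness assumption, with a fixed context left implicit; the structure labels `M1`, `M2`, the outputs, and the function name `observable_distribution` are all made up for illustration.

```python
from fractions import Fraction


def observable_distribution(posterior, decoder):
    """P(y | c) = sum over structures M of D(y | M, c) * mu_c(M).

    posterior: dict mapping structure label -> posterior weight mu_c(M)
    decoder:   dict mapping structure label -> dict of output -> D(y | M, c)
    """
    outputs = {y for dist in decoder.values() for y in dist}
    return {
        y: sum(
            (posterior[M] * decoder[M].get(y, Fraction(0)) for M in posterior),
            Fraction(0),
        )
        for y in outputs
    }


# Model uncertainty after the context update: two candidate structures.
posterior = {"M1": Fraction(3, 4), "M2": Fraction(1, 4)}

# Output decoding: each structure induces its own distribution over strings.
decoder = {
    "M1": {"4": Fraction(9, 10), "refuse": Fraction(1, 10)},
    "M2": {"5": Fraction(1)},
}

P = observable_distribution(posterior, decoder)
```

In this toy case `P["4"]` is $27/40$, `P["5"]` is $1/4$, and `P["refuse"]` is $3/40$. Changing the prompt would change `posterior`, while changing the sampling procedure would change `decoder`; the observable distribution mixes the two.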

###### Definition 2\.11\(Semantic confidence and contextual theory\)\.

For a sentence $\varphi\in\operatorname{Sent}(\mathcal{L})$ define

$$s_{\lambda,c}(\varphi):=\mu_{\lambda,c}([\![\varphi]\!]_{\mathcal{K}_{\lambda}}).$$

The almost\-sure contextual theory is

$$\operatorname{Th}_{\lambda}(c):=\{\varphi\in\operatorname{Sent}(\mathcal{L}):s_{\lambda,c}(\varphi)=1\}.$$

For a fragment $\mathcal{F}\subseteq\operatorname{Sent}(\mathcal{L})$, write

$$\operatorname{Th}_{\lambda}^{\mathcal{F}}(c):=\operatorname{Th}_{\lambda}(c)\cap\mathcal{F}.$$

The measure\-one threshold is strict\. It records semantic commitments that hold outside a null set of admissible structures\. This is stronger than high probability and weaker than truth in every structure of $\mathcal{K}_{\lambda}$\.
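A finite toy class shows how the measure-one threshold sits between high probability and truth in every structure. This is a sketch with invented structures, weights, and sentences; sentences are modeled as Python predicates on structures, which is an illustrative simplification and not the paper's formal syntax.

```python
from fractions import Fraction

# Hypothetical finite class of structures; the third one is a null set.
STRUCTURES = [
    {"answer": 4, "unit": "cm"},
    {"answer": 4, "unit": "mm"},
    {"answer": 5, "unit": "cm"},
]
POSTERIOR = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]

# Sentences as predicates evaluated in a structure.
SENTENCES = {
    "answer=4": lambda M: M["answer"] == 4,
    "unit=cm": lambda M: M["unit"] == "cm",
    "answer=4 or answer=5": lambda M: M["answer"] in (4, 5),
}


def confidence(sentence):
    """s(phi): posterior mass of the structures satisfying phi."""
    return sum(
        (w for M, w in zip(STRUCTURES, POSTERIOR) if sentence(M)),
        Fraction(0),
    )


def contextual_theory():
    """Names of the sentences with confidence exactly 1."""
    return {name for name, phi in SENTENCES.items() if confidence(phi) == 1}


def true_in_every_structure(sentence):
    return all(sentence(M) for M in STRUCTURES)
```

Here `answer=4` belongs to the contextual theory although it fails in the zero-weight structure, so it is almost sure without being true everywhere. The sentence `unit=cm` has confidence $1/2$ and is excluded, and the weaker disjunction is included along with `answer=4`, as closure under consequence requires.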

###### Proposition 2\.12\(Closure of contextual theories\)\.

For every $\lambda$ and $c$, $\operatorname{Th}_{\lambda}(c)$ is closed under first\-order consequence\. More precisely, if $\Gamma\subseteq\operatorname{Th}_{\lambda}(c)$ is finite and $\Gamma\models\psi$, then $\psi\in\operatorname{Th}_{\lambda}(c)$\.

###### Proof\.

Let $\Gamma=\{\gamma_1,\ldots,\gamma_m\}\subseteq\operatorname{Th}_{\lambda}(c)$\. Each event $[\![\gamma_i]\!]$ has measure $1$\. Therefore

$$\mu_{\lambda,c}\left(\bigcap_{i=1}^{m}[\![\gamma_i]\!]\right)=1.$$

If $\Gamma\models\psi$, then every structure satisfying all $\gamma_i$ also satisfies $\psi$, so

$$\bigcap_{i=1}^{m}[\![\gamma_i]\!]\subseteq[\![\psi]\!].$$

Thus $[\![\psi]\!]$ has measure $1$, and $\psi\in\operatorname{Th}_{\lambda}(c)$\. ∎

###### Proposition 2\.13\(Consistency on realized support\)\.

Assume $\mu_{\lambda,c}$ is countably additive\. Every finite subset of $\operatorname{Th}_{\lambda}(c)$ is satisfiable in $\mathcal{K}_{\lambda}$\. Hence $\operatorname{Th}_{\lambda}(c)$ is syntactically consistent\.

###### Proof\.

If $\Delta=\{\delta_1,\ldots,\delta_m\}\subseteq\operatorname{Th}_{\lambda}(c)$, then the intersection

$$A_{\Delta}:=\bigcap_{i=1}^{m}[\![\delta_i]\!]$$

has measure $1$\. In particular, it is nonempty\. Any $\mathfrak{M}\in A_{\Delta}$ satisfies $\Delta$\. If $\operatorname{Th}_{\lambda}(c)$ were syntactically inconsistent, a derivation of a contradiction would use only finitely many of its sentences, so some finite subset would be inconsistent and hence, by soundness, unsatisfiable, contradicting finite satisfiability\. ∎

###### Definition 2\.14\(Decisive fragment at a context\)\.

A fragment $\mathcal{F}\subseteq\operatorname{Sent}(\mathcal{L})$ is *decisive* at $(\lambda,c)$ if for every $\varphi\in\mathcal{F}$,

$$s_{\lambda,c}(\varphi)\in\{0,1\}.$$

###### Corollary 2\.15\(Completeness on decisive fragments\)\.

If $\mathcal{F}$ is closed under negation and decisive at $(\lambda,c)$, then for every $\varphi\in\mathcal{F}$ exactly one of $\varphi$ and $\neg\varphi$ belongs to $\operatorname{Th}_{\lambda}^{\mathcal{F}}(c)$.

###### Proof\.

For every sentence $\varphi$,

$$s_{\lambda,c}(\neg\varphi)=1-s_{\lambda,c}(\varphi).$$

If $s_{\lambda,c}(\varphi)$ is either $0$ or $1$, exactly one of $s_{\lambda,c}(\varphi)$ and $s_{\lambda,c}(\neg\varphi)$ is equal to $1$. Since $\mathcal{F}$ is closed under negation, both $\varphi$ and $\neg\varphi$ lie in $\mathcal{F}$, so exactly one of them lies in $\operatorname{Th}_{\lambda}^{\mathcal{F}}(c)$. ∎

### 2\.4Output properties

A generated output may be a token, string, proof sketch, tool call, or structured object. The semantic presentation handles this by putting outputs in a sort $\mathbf{Y}$ and using a decoding kernel.

###### Definition 2\.16\(Output property probability\)\.

Let $\chi(y)$ be a formula with one free variable of output sort. Define the semantic-output probability

$$P_{\lambda,c}(\chi):=\int_{\mathcal{K}_{\lambda}}\sum_{y:\,\mathfrak{M}\models\chi(y)}\mathsf{D}_{\lambda}(y\mid\mathfrak{M},c)\,d\mu_{\lambda,c}(\mathfrak{M}),$$

whenever the sum is measurable. If $\mathbf{Y}$ is finite, this is automatic.

The value $s_{\lambda,c}(\exists y\,\chi(y))$ measures whether the semantic state admits an output satisfying $\chi$. The value $P_{\lambda,c}(\chi)$ measures whether the decoder actually emits such an output. The two are different. This separation is important for verification, because a property may be semantically entailed while a stochastic decoder still has residual failure probability.
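A small numerical sketch may help separate the two quantities. Everything in it is a hypothetical stand-in: two structures with equal posterior mass, a two-element output sort, and a made-up decoding kernel. It computes the confidence of $\exists y\,\chi(y)$ and the semantic-output probability of $\chi$ for finite $\mathbf{Y}$.

```python
from fractions import Fraction

OUTPUTS = ("yes", "no")

# Hypothetical structures: "chi" lists the outputs satisfying chi(y) in that
# structure, "mu" is the assumed posterior mass, "decoder" is D(y | M, c).
MODELS = [
    {"mu": Fraction(1, 2), "chi": {"yes"},
     "decoder": {"yes": Fraction(9, 10), "no": Fraction(1, 10)}},
    {"mu": Fraction(1, 2), "chi": {"yes"},
     "decoder": {"yes": Fraction(4, 5), "no": Fraction(1, 5)}},
]


def confidence_exists(models):
    """s(exists y chi(y)): mass of structures where some output satisfies chi."""
    return sum((m["mu"] for m in models if m["chi"]), Fraction(0))


def output_probability(models):
    """P(chi): posterior-averaged decoder mass on outputs satisfying chi."""
    total = Fraction(0)
    for m in models:
        emitted = sum((m["decoder"][y] for y in OUTPUTS if y in m["chi"]), Fraction(0))
        total += m["mu"] * emitted
    return total


s_exists = confidence_exists(MODELS)
p_chi = output_probability(MODELS)
print(s_exists, p_chi, 1 - p_chi)
```

In this toy the existential sentence has confidence one, while the decoder emits a $\chi$-output with probability 17/20, leaving a residual failure probability of 3/20.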

###### Definition 2\.17\(Faithful decoding on a property\)\.

A decoding kernel is *$\epsilon$-faithful* to a sentence $\forall y(\alpha(y)\rightarrow\chi(y))$ at $(\lambda,c)$ if

$$\int_{[\![\forall y(\alpha(y)\rightarrow\chi(y))]\!]}\sum_{y:\,\mathfrak{M}\models\alpha(y)\wedge\neg\chi(y)}\mathsf{D}_{\lambda}(y\mid\mathfrak{M},c)\,d\mu_{\lambda,c}(\mathfrak{M})\leq\epsilon.$$

###### Proposition 2\.18\(Semantic entailment plus faithful decoding\)\.

If $\forall y(\alpha(y)\rightarrow\chi(y))\in\operatorname{Th}_{\lambda}(c)$ and the decoder is $\epsilon$-faithful to that sentence at $(\lambda,c)$, then the probability of emitting an $\alpha$-output violating $\chi$ is at most $\epsilon$.

###### Proof\.

The violation probability is the integral of the decoder mass assigned to outputs satisfying $\alpha(y)\wedge\neg\chi(y)$. The set of structures satisfying $\forall y(\alpha(y)\rightarrow\chi(y))$ has measure $1$. The integral over its complement is therefore $0$, and the integral over the measure-one set is bounded by $\epsilon$ by definition. ∎

This proposition clarifies the verification target\. Logic controls which structures are admissible\. The decoder controls how reliably admissible semantic commitments are turned into observable strings\.

### 2\.5Boolean\-algebraic semantics

The same construction can be expressed through the Lindenbaum algebra of formulas modulo equivalence on the model class\. This gives a compact way to see why almost\-sure theories behave like filters and why decisive fragments behave like complete theories\.

###### Definition 2\.19\(Lindenbaum algebra over a model class\)\.

For $\varphi,\psi\in\operatorname{Sent}(\mathcal{L})$ write

$$\varphi\equiv_{\mathcal{K}}\psi\quad\Longleftrightarrow\quad[\![\varphi]\!]_{\mathcal{K}}=[\![\psi]\!]_{\mathcal{K}}.$$

Let

$$\mathbb{B}_{\mathcal{K}}:=\operatorname{Sent}(\mathcal{L})/\equiv_{\mathcal{K}}$$

with Boolean operations induced by $\neg,\wedge,\vee$. For a measured model class $(\mathcal{K},\mu)$ define

$$\bar{\mu}([\varphi]):=\mu([\![\varphi]\!]_{\mathcal{K}}).$$

###### Proposition 2\.20\(Semantic confidence as a Boolean probability measure\)\.

The map $\bar{\mu}:\mathbb{B}_{\mathcal{K}}\to[0,1]$ is a finitely additive probability measure on the Boolean algebra $\mathbb{B}_{\mathcal{K}}$. That is,

$$\bar{\mu}(\top)=1,\qquad\bar{\mu}(\bot)=0,$$

and if $a\wedge b=\bot$, then

$$\bar{\mu}(a\vee b)=\bar{\mu}(a)+\bar{\mu}(b).$$

###### Proof\.

The value is well-defined because equivalent sentences have the same definable event. The identities for $\top$ and $\bot$ follow from $[\![\top]\!]_{\mathcal{K}}=\mathcal{K}$ and $[\![\bot]\!]_{\mathcal{K}}=\varnothing$. If $a=[\varphi]$ and $b=[\psi]$ are disjoint in the Boolean algebra, then $[\![\varphi]\!]_{\mathcal{K}}$ and $[\![\psi]\!]_{\mathcal{K}}$ are disjoint measurable sets. Finite additivity of $\mu$ gives the formula. ∎

###### Definition 2\.21\(Almost\-sure filter\)\.

For a measured model class $(\mathcal{K},\mu)$ define

$$\mathcal{F}_{\mu}:=\{[\varphi]\in\mathbb{B}_{\mathcal{K}}:\bar{\mu}([\varphi])=1\}.$$

###### Theorem 2\.22\(Measure\-one formulas form a proper filter\)\.

If $\mathcal{K}\neq\varnothing$ and $\mu$ is a probability measure, then $\mathcal{F}_{\mu}$ is a proper filter in $\mathbb{B}_{\mathcal{K}}$. If, in addition, $\bar{\mu}(a)\in\{0,1\}$ for every $a\in\mathbb{B}_{\mathcal{K}}$, then $\mathcal{F}_{\mu}$ is an ultrafilter.

###### Proof\.

The top element has measure $1$, so it lies in $\mathcal{F}_{\mu}$. The bottom element has measure $0$, so the filter is proper. If $a,b\in\mathcal{F}_{\mu}$, then

$$\bar{\mu}(a\wedge b)=1$$

because the complement of $a\wedge b$ is contained in the union of the complements of $a$ and $b$, both of measure $0$. If $a\in\mathcal{F}_{\mu}$ and $a\leq b$, then $\bar{\mu}(b)=1$. Thus $\mathcal{F}_{\mu}$ is a filter.

Assume now that all Boolean values have probability $0$ or $1$. For every $a$, either $\bar{\mu}(a)=1$ or $\bar{\mu}(\neg a)=1$, because $\bar{\mu}(\neg a)=1-\bar{\mu}(a)$. Hence the filter decides every Boolean element, so it is an ultrafilter. ∎
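The filter and ultrafilter claims can be checked by brute force on a finite class. The sketch below assumes, purely for illustration, that every subset of a three-element model class is a definable event, so the Boolean algebra is the full power set. The two measures `mixed` and `dirac` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

POINTS = ("M1", "M2", "M3")
TOP = frozenset(POINTS)


def events():
    """All subsets of the model class, standing in for the Boolean algebra."""
    return [frozenset(c) for r in range(len(POINTS) + 1)
            for c in combinations(POINTS, r)]


def measure(mu, event):
    return sum((mu[x] for x in event), Fraction(0))


def measure_one_filter(mu):
    """Events of measure exactly one."""
    return {e for e in events() if measure(mu, e) == 1}


def is_proper_filter(filt):
    has_top = TOP in filt
    proper = frozenset() not in filt
    meets = all((a & b) in filt for a in filt for b in filt)
    upward = all(b in filt for a in filt for b in events() if a <= b)
    return has_top and proper and meets and upward


def is_ultrafilter(filt):
    """A proper filter containing exactly one of e and its complement."""
    decides = all((e in filt) != ((TOP - e) in filt) for e in events())
    return is_proper_filter(filt) and decides


mixed = {"M1": Fraction(1, 2), "M2": Fraction(1, 2), "M3": Fraction(0)}
dirac = {"M1": Fraction(1), "M2": Fraction(0), "M3": Fraction(0)}

print(is_proper_filter(measure_one_filter(mixed)), is_ultrafilter(measure_one_filter(mixed)))
print(is_proper_filter(measure_one_filter(dirac)), is_ultrafilter(measure_one_filter(dirac)))
```

The `mixed` measure gives a proper filter that leaves the event `{M1}` undecided, while the two-valued `dirac` measure yields an ultrafilter.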

### 2\.6Stone\-space representation

The Boolean-algebraic form has a canonical topological representation. Let $S_{\mathcal{L}}$ denote the Stone space of complete $\mathcal{L}$-theories. For a sentence $\varphi$, write

$$\widehat{\varphi}:=\{p\in S_{\mathcal{L}}:\varphi\in p\}.$$

The sets $\widehat{\varphi}$ are clopen and form a basis for the Stone topology.

###### Definition 2\.24\(Stone pushforward of a semantic presentation\)\.

For a measured model class $(\mathcal{K},\mu)$ define

$$\pi_{\mathcal{K}}:\mathcal{K}\to S_{\mathcal{L}},\qquad\pi_{\mathcal{K}}(\mathfrak{M}):=\operatorname{Th}(\mathfrak{M}).$$

The induced Stone measure is the pushforward

$$\nu_{\mathcal{K},\mu}:=(\pi_{\mathcal{K}})_{\ast}\mu.$$

For a contextual posterior $\mu_{\lambda,c}$, write $\nu_{\lambda,c}$ for the corresponding Stone measure.

###### Theorem 2\.25\(Stone representation of semantic confidence\)\.

For every sentence $\varphi$,

$$s_{\lambda,c}(\varphi)=\nu_{\lambda,c}(\widehat{\varphi}).$$

Consequently, two contextual presentations induce the same semantic confidence function on all sentences if and only if they induce the same probability measure on the clopen algebra of $S_{\mathcal{L}}$.

###### Proof\.

By definition of $\pi_{\mathcal{K}}$,

$$\pi_{\mathcal{K}}^{-1}(\widehat{\varphi})=\{\mathfrak{M}\in\mathcal{K}:\operatorname{Th}(\mathfrak{M})\in\widehat{\varphi}\}=\{\mathfrak{M}\in\mathcal{K}:\mathfrak{M}\models\varphi\}=[\![\varphi]\!]_{\mathcal{K}}.$$

Pushforward gives

$$\nu_{\lambda,c}(\widehat{\varphi})=\mu_{\lambda,c}(\pi_{\mathcal{K}}^{-1}(\widehat{\varphi}))=\mu_{\lambda,c}([\![\varphi]\!]_{\mathcal{K}})=s_{\lambda,c}(\varphi).$$

If two Stone measures agree on all clopens, then the equality above gives identical confidences for all sentences. Conversely, equality of all confidences gives equality of the two measures on every basic clopen $\widehat{\varphi}$ and hence on the clopen algebra. ∎

###### Corollary 2\.26\(Elementary\-equivalence invariance\)\.

Semantic confidence, contextual theory, threshold membership, and limit-theory membership depend only on the distribution of complete theories, not on the particular representatives in $\mathcal{K}$.

###### Proof\.

All four notions are functions of the values $s_{\lambda,c}(\varphi)$. By Theorem [2.25](https://arxiv.org/html/2606.07623#S2.Thmtheorem25), these values are exactly the Stone-measure values of the clopens $\widehat{\varphi}$. Replacing structures by elementarily equivalent representatives does not change their image in $S_{\mathcal{L}}$, so it does not change the induced clopen probabilities. ∎

This representation is useful for later comparison across scales\. A scale trend is a sequence of probability measures on a common Stone space, and a latent entailment is convergence of these measures to one on a clopen\. This gives a representation\-invariant target for semantic emergence\.
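The pushforward construction can be sketched on a finite toy in which a complete theory is represented by a truth assignment to two atoms. That representation, the four named structures, and the weights are assumptions made for illustration. The code checks that confidence equals Stone-measure mass and that moving mass between elementarily equivalent structures leaves the pushforward unchanged.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical structures, each mapped to its "complete theory", here a tuple
# of truth values for the atoms (p, q). M1 and M2 share a theory.
THEORY_OF = {
    "M1": (True, True),
    "M2": (True, True),
    "M3": (True, False),
    "M4": (False, False),
}
MU = {"M1": Fraction(1, 4), "M2": Fraction(1, 4),
      "M3": Fraction(1, 3), "M4": Fraction(1, 6)}
# Same distribution of theories, different representatives carrying the mass.
MU_ALT = {"M1": Fraction(1, 2), "M2": Fraction(0),
          "M3": Fraction(1, 3), "M4": Fraction(1, 6)}


def pushforward(mu):
    """Stone measure: total mass of the structures realizing each theory."""
    nu = defaultdict(Fraction)
    for name, mass in mu.items():
        nu[THEORY_OF[name]] += mass
    return dict(nu)


def confidence(mu, sentence):
    """Mass of structures whose theory contains the sentence."""
    return sum((w for n, w in mu.items() if sentence(THEORY_OF[n])), Fraction(0))


def clopen_mass(nu, sentence):
    """Stone-measure mass of the clopen set of theories containing the sentence."""
    return sum((w for theory, w in nu.items() if sentence(theory)), Fraction(0))


p = lambda theory: theory[0]
p_and_not_q = lambda theory: theory[0] and not theory[1]

nu = pushforward(MU)
print(confidence(MU, p), clopen_mass(nu, p))
print(pushforward(MU) == pushforward(MU_ALT))
```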

### 2\.7Support and indistinguishability

A sentence can have confidence one without being true in every admissible structure\. The right exact object is the support modulo null definable events\.

###### Definition 2\.27\(Null equivalence\)\.

For structures $\mathfrak{M},\mathfrak{N}\in\mathcal{K}$, define

$$\mathfrak{M}\equiv_{\mu}\mathfrak{N}$$

when for every sentence $\varphi$, if $\mathfrak{M}\models\varphi$ and $\mathfrak{N}\models\neg\varphi$, then neither $[\![\varphi]\!]_{\mathcal{K}}$ nor $[\![\neg\varphi]\!]_{\mathcal{K}}$ is forced by the measure-one filter. Equivalently, $\mathfrak{M}$ and $\mathfrak{N}$ are not separated by any sentence whose truth value is decided almost surely.

###### Proposition 2\.28\(Almost\-sure theory is invariant under null indistinguishability\)\.

If $\mathfrak{M}\equiv_{\mu}\mathfrak{N}$, then $\mathfrak{M}$ and $\mathfrak{N}$ satisfy the same sentences in $\operatorname{Th}_{\mu}:=\{\varphi:\mu([\![\varphi]\!]_{\mathcal{K}})=1\}$.

###### Proof\.

Suppose $\varphi\in\operatorname{Th}_{\mu}$ and $\mathfrak{M}\models\varphi$. If $\mathfrak{N}\models\neg\varphi$, then $\varphi$ separates $\mathfrak{M}$ and $\mathfrak{N}$ by a sentence decided almost surely, contradicting $\mathfrak{M}\equiv_{\mu}\mathfrak{N}$. Hence $\mathfrak{N}\models\varphi$. The reverse direction is identical. ∎

This quotient view is useful because LLM behavior is rarely determined by one canonical structure\. Many latent structures may be behaviorally indistinguishable under the formulas and contexts being tested\.

### 2\.8Definitional invariance

A semantic theory for LLM behavior should not depend on accidental notation\. If the language is expanded only by definitional abbreviations, the semantic confidence of old\-language claims should not change\. This is the first representation\-invariance test for the construction\.

###### Definition 2\.29\(Conservative definitional expansion\)\.

Let $\mathcal{L}\subseteq\mathcal{L}^{\prime}$ and let $T^{\prime}$ be a theory in $\mathcal{L}^{\prime}$. We say that $T^{\prime}$ is a conservative definitional expansion of $T$ over a model class $\mathcal{K}$ if every $\mathfrak{M}\in\mathcal{K}$ has a unique expansion $\mathfrak{M}^{\prime}\models T^{\prime}$ to $\mathcal{L}^{\prime}$ and, for every $\mathcal{L}$-sentence $\varphi$,

$$\mathfrak{M}\models\varphi\qquad\Longleftrightarrow\qquad\mathfrak{M}^{\prime}\models\varphi.$$

Let $e:\mathcal{K}\to\mathcal{K}^{\prime}$ send $\mathfrak{M}$ to its unique expansion.

###### Theorem 2\.30\(Invariance under definitional expansion\)\.

Let $(\mathcal{K},\mu)$ be a measured $\mathcal{L}$-model class and let $(\mathcal{K}^{\prime},\mu^{\prime})$ be obtained by a conservative definitional expansion, with $\mu^{\prime}=e_{\ast}\mu$. Then for every $\mathcal{L}$-sentence $\varphi$,

$$\mu([\![\varphi]\!]_{\mathcal{K}})=\mu^{\prime}([\![\varphi]\!]_{\mathcal{K}^{\prime}}).$$

Consequently, almost-sure theories, prompt consequences expressed in $\mathcal{L}$, and threshold membership for $\mathcal{L}$-sentences are invariant under such expansions.

###### Proof\.

By the definition of conservative expansion,

$$e^{-1}([\![\varphi]\!]_{\mathcal{K}^{\prime}})=[\![\varphi]\!]_{\mathcal{K}}$$

for every old-language sentence $\varphi$. Since $\mu^{\prime}=e_{\ast}\mu$,

$$\mu^{\prime}([\![\varphi]\!]_{\mathcal{K}^{\prime}})=\mu(e^{-1}([\![\varphi]\!]_{\mathcal{K}^{\prime}}))=\mu([\![\varphi]\!]_{\mathcal{K}}).$$

The conclusions about almost-sure theories and threshold membership follow directly from equality of confidence values. Prompt consequence is defined by satisfaction of the same old-language sentences on corresponding selected structures, so it is also preserved. ∎

## 3Prompts as preferential context updates

A prompt has literal content and pragmatic force\. Literal content behaves like constraints\. Pragmatic force ranks models that satisfy those constraints\. This is the same structural division that appears in default logic and preferential consequence, although the object being updated here is a measured class of semantic structures\. Reiter’s default logic gives one classical formalization of defaults\[[24](https://arxiv.org/html/2606.07623#bib.bib24)\]\. Preferential semantics and cumulative consequence relations give another\[[15](https://arxiv.org/html/2606.07623#bib.bib15)\]\. Belief revision separates hard information from revision policy in a related way\[[2](https://arxiv.org/html/2606.07623#bib.bib2)\]\.

###### Definition 3\.1\(Prompt specification\)\.

Let $X\subseteq\mathcal{K}_{\lambda}$. A *prompt specification* on $X$ is a pair

$$p=(H_{p},\rho_{p}),$$

where $H_{p}\subseteq\operatorname{Sent}(\mathcal{L})$ is finite and $\rho_{p}:X\to\Gamma_{p}$ maps structures into a well-ordered set. Lower rank means greater pragmatic preference. Define

$$\mathsf{Upd}_{p}(X):=\operatorname{Min}_{\rho_{p}}(X\cap\operatorname{Mod}(H_{p})),$$

where

$$\operatorname{Min}_{\rho_{p}}(A):=\{\mathfrak{M}\in A:(\forall\mathfrak{N}\in A)\ \rho_{p}(\mathfrak{M})\leq\rho_{p}(\mathfrak{N})\}.$$

If $\mathsf{Upd}_{p}(X)\neq\varnothing$, define prompt consequence by

$$p\Vdash_{X}\varphi\quad\Longleftrightarrow\quad(\forall\mathfrak{M}\in\mathsf{Upd}_{p}(X))\ \mathfrak{M}\models\varphi.$$

###### Proposition 3\.2\(Elementary algebra of prompt update\)\.

Assume $X\cap\operatorname{Mod}(H_{p})\neq\varnothing$. Then:

1. (i) $\mathsf{Upd}_{p}(X)\subseteq X\cap\operatorname{Mod}(H_{p})$;
2. (ii) $\mathsf{Upd}_{p}(\mathsf{Upd}_{p}(X))=\mathsf{Upd}_{p}(X)$;
3. (iii) if $\rho_{p}$ is constant on $X\cap\operatorname{Mod}(H_{p})$, then $\mathsf{Upd}_{p}(X)=X\cap\operatorname{Mod}(H_{p})$.

###### Proof\.

The first item is immediate from the definition. For the second, put $A=X\cap\operatorname{Mod}(H_{p})$. All elements of $\operatorname{Min}_{\rho_{p}}(A)$ satisfy $H_{p}$ and have the same minimal rank in $A$. Restricting that set to $\operatorname{Mod}(H_{p})$ and applying $\operatorname{Min}_{\rho_{p}}$ again therefore changes nothing. For the third, if the rank is constant on $A$, every element of $A$ is minimal. ∎

The constant\-rank case recovers literal context restriction\. The general case allows an instruction to prefer one interpretation over another even when both satisfy the literal constraints\.
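The update operator is easy to prototype on a finite class. In the sketch below, structures are plain dictionaries, hard constraints are predicates, and ranks are integers. The field names `lang` and `tone` and the function name `update` are hypothetical choices for illustration, not notation from the text.

```python
def update(X, hard, rank):
    """Rank-minimal elements of X among those satisfying every hard constraint."""
    admissible = [m for m in X if all(h(m) for h in hard)]
    if not admissible:
        return []
    best = min(rank(m) for m in admissible)
    return [m for m in admissible if rank(m) == best]


# Hypothetical interpretations of an instruction.
X = [
    {"lang": "en", "tone": "formal"},
    {"lang": "en", "tone": "casual"},
    {"lang": "fr", "tone": "formal"},
]
hard = [lambda m: m["lang"] == "en"]                     # literal content
prefer_formal = lambda m: 0 if m["tone"] == "formal" else 1  # pragmatic rank
constant = lambda m: 0                                   # no pragmatic preference

selected = update(X, hard, prefer_formal)
print(selected)                                          # item (i): inside X ∩ Mod(H_p)
print(update(selected, hard, prefer_formal) == selected)  # item (ii): idempotence
print(update(X, hard, constant) == [m for m in X if m["lang"] == "en"])  # item (iii)
```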

###### Definition 3\.3\(Lexicographic composition\)\.

Given prompts $p=(H_{p},\rho_{p})$ and $q=(H_{q},\rho_{q})$ on $X$, their lexicographic composition is

$$p\oplus q=(H_{p}\cup H_{q},\rho_{p\oplus q}),$$

where

$$\rho_{p\oplus q}(\mathfrak{M}):=(\rho_{q}(\mathfrak{M}),\rho_{p}(\mathfrak{M}))$$

ordered lexicographically. Thus later pragmatic force has priority, while hard constraints accumulate.

Other priority conventions are possible\. The lexicographic convention is useful because it captures a common instruction\-following pattern: a later style or role instruction may dominate an earlier default, provided the hard content remains satisfiable\.

###### Theorem 3\.4\(Prompt consequence is nonmonotonic\)\.

Suppose $X$ contains $\mathfrak{M}$ and $\mathfrak{N}$, and there are prompts $p,q$ and a sentence $\varphi$ such that:

1. (a) $\mathfrak{M},\mathfrak{N}\in X\cap\operatorname{Mod}(H_{p}\cup H_{q})$;
2. (b) $\mathsf{Upd}_{p}(X)=\{\mathfrak{M}\}$;
3. (c) $\mathsf{Upd}_{p\oplus q}(X)=\{\mathfrak{N}\}$;
4. (d) $\mathfrak{M}\models\varphi$ and $\mathfrak{N}\models\neg\varphi$.

Then

$$p\Vdash_{X}\varphi\qquad\text{and}\qquad p\oplus q\not\Vdash_{X}\varphi.$$

###### Proof\.

Since $\mathsf{Upd}_{p}(X)=\{\mathfrak{M}\}$ and $\mathfrak{M}\models\varphi$, one has $p\Vdash_{X}\varphi$. Since $\mathsf{Upd}_{p\oplus q}(X)=\{\mathfrak{N}\}$ and $\mathfrak{N}\models\neg\varphi$, not every structure selected by $p\oplus q$ satisfies $\varphi$. Hence $p\oplus q\not\Vdash_{X}\varphi$. ∎

This is not ordinary monotone theory extension\. Hard information can be monotone while pragmatic preference is not\.
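A two-structure instance of the hypotheses can be written down directly. In this sketch the structures `M` and `N`, the ranks, and the sentence `phi` are hypothetical. Neither prompt carries hard constraints, and the composite rank is the pair (later rank, earlier rank), compared lexicographically by Python's tuple ordering.

```python
def update(X, hard, rank):
    """Rank-minimal elements of X among those satisfying every hard constraint."""
    admissible = [m for m in X if all(h(m) for h in hard)]
    if not admissible:
        return []
    best = min(rank(m) for m in admissible)
    return [m for m in admissible if rank(m) == best]


def entails(selected, sentence):
    """Skeptical consequence over a nonempty selected set."""
    return bool(selected) and all(sentence(m) for m in selected)


M = {"name": "M", "tone": "formal"}
N = {"name": "N", "tone": "casual"}
X = [M, N]

rho_p = lambda m: 0 if m["tone"] == "formal" else 1  # earlier default
rho_q = lambda m: 0 if m["tone"] == "casual" else 1  # later instruction
rho_pq = lambda m: (rho_q(m), rho_p(m))              # later force has priority

phi = lambda m: m["tone"] == "formal"

after_p = update(X, [], rho_p)
after_pq = update(X, [], rho_pq)
print(after_p, entails(after_p, phi))
print(after_pq, entails(after_pq, phi))
```

Appending `q` adds no hard information, yet the consequence `phi` is lost because the preferred structure changes from `M` to `N`.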

### 3\.1Fixed\-preference closure

Nonmonotonicity does not mean absence of structure\. If the ranking is fixed and only hard constraints vary, prompt consequence satisfies controlled closure laws\.

###### Definition 3\.5\(Ranked consequence under fixed preference\)\.

Fix $X$ and a ranking $\rho:X\to\Gamma$. For a finite $H\subseteq\operatorname{Sent}(\mathcal{L})$ define

$$H\Vdash_{X,\rho}\varphi\quad\Longleftrightarrow\quad\operatorname{Min}_{\rho}(X\cap\operatorname{Mod}(H))\subseteq[\![\varphi]\!]_{X}.$$

###### Proposition 3\.6\(Right weakening and conjunction\)\.

For fixed $X$ and $\rho$:

1. (i) if $H\Vdash_{X,\rho}\varphi$ and $\varphi\models\psi$, then $H\Vdash_{X,\rho}\psi$;
2. (ii) if $H\Vdash_{X,\rho}\varphi$ and $H\Vdash_{X,\rho}\psi$, then $H\Vdash_{X,\rho}\varphi\wedge\psi$.

###### Proof\.

Both claims follow by inclusion. In (i), $[\![\varphi]\!]_{X}\subseteq[\![\psi]\!]_{X}$. In (ii), the selected set is included in both $[\![\varphi]\!]_{X}$ and $[\![\psi]\!]_{X}$, hence in their intersection. ∎

###### Theorem 3\.7\(Cautious monotony under stable minima\)\.

Let $H$ be finite. If

$$H\Vdash_{X,\rho}\varphi$$

and

$$\operatorname{Min}_{\rho}(X\cap\operatorname{Mod}(H\cup\{\varphi\}))=\operatorname{Min}_{\rho}(X\cap\operatorname{Mod}(H)),$$

then for every $\psi$,

$$H\Vdash_{X,\rho}\psi\quad\Longleftrightarrow\quad H\cup\{\varphi\}\Vdash_{X,\rho}\psi.$$

###### Proof\.

Under the stated equality, the selected model set for $H$ is exactly the selected model set for $H\cup\{\varphi\}$. The two consequence relations therefore quantify over the same structures. ∎

The hypothesis says that adding a consequence already true in all preferred models does not disturb the set of preferred models\. In prompt terms, a restatement of what the current preferred interpretation already satisfies is harmless\. A later instruction is dangerous only when it changes the preference order or removes all current minima\.

###### Definition 3\.8\(Theory of a selected set\)\.

For $S\subseteq X$, define

$$\operatorname{Th}_{X}(S):=\{\varphi\in\operatorname{Sent}(\mathcal{L}):S\subseteq[\![\varphi]\!]_{X}\}.$$

###### Theorem 3\.9\(Exact preservation criterion for prompt extension\)\.

Assume $\mathsf{Upd}_{p}(X)$ and $\mathsf{Upd}_{p\oplus q}(X)$ are nonempty. Then the following are equivalent:

1. (i) every $p$-consequence is preserved after appending $q$, that is, $p\Vdash_{X}\varphi\ \Longrightarrow\ p\oplus q\Vdash_{X}\varphi$ for all sentences $\varphi$;
2. (ii) $\operatorname{Th}_{X}(\mathsf{Upd}_{p}(X))\subseteq\operatorname{Th}_{X}(\mathsf{Upd}_{p\oplus q}(X))$;
3. (iii) every sentence true in all $p$-selected structures is true in all $(p\oplus q)$-selected structures.

If $\mathsf{Upd}_{p}(X)$ is finite and the language separates points of $X$, then these equivalent conditions imply

$$\mathsf{Upd}_{p\oplus q}(X)\subseteq\mathsf{Upd}_{p}(X).$$

###### Proof\.

The equivalence of (i), (ii), and (iii) is just the definition of skeptical consequence over selected sets. For the final claim, suppose $\mathsf{Upd}_{p}(X)=\{\mathfrak{M}_{1},\ldots,\mathfrak{M}_{m}\}$ and take $\mathfrak{N}\in\mathsf{Upd}_{p\oplus q}(X)$. If $\mathfrak{N}\notin\mathsf{Upd}_{p}(X)$, then for each $i$ there is a sentence $\theta_{i}$ with $\mathfrak{M}_{i}\models\theta_{i}$ and $\mathfrak{N}\models\neg\theta_{i}$, after negating the separator if necessary. The disjunction $\theta=\bigvee_{i}\theta_{i}$ is true on all $p$-selected structures and false at $\mathfrak{N}$. Hence $\theta\in\operatorname{Th}_{X}(\mathsf{Upd}_{p}(X))$ but $\theta\notin\operatorname{Th}_{X}(\mathsf{Upd}_{p\oplus q}(X))$, contradicting (iii). Thus every $(p\oplus q)$-selected structure lies in $\mathsf{Upd}_{p}(X)$. ∎

This theorem turns a vague instruction\-following question into a semantic test\. A new instruction is safe exactly when it does not move the selected structures outside the old theory\. If the relevant selected set is finite and the language distinguishes its points, safety is equivalent to containment of selected sets\.

### 3\.2Prompt update as posterior restriction

The previous definitions are set\-theoretic\. They connect to the measured presentation by conditioning and preference weighting\.

###### Definition 3\.10\(Soft preferential posterior\)\.

Let $p=(H_{p},\rho_{p})$ be a prompt and let $\beta>0$. Define the unnormalized weight

$$w_{p}^{\beta}(\mathfrak{M}):=\mathbf{1}_{\mathfrak{M}\models H_{p}}\exp(-\beta r_{p}(\mathfrak{M})),$$

where $r_{p}:X\to\mathbb{R}_{\geq 0}$ is a real-valued representation of the ranking. If

$$Z_{p}^{\beta}:=\int_{X}w_{p}^{\beta}(\mathfrak{M})\,d\mu(\mathfrak{M})>0,$$

then the soft prompt posterior is

$$d\mu_{p}^{\beta}(\mathfrak{M}):=\frac{w_{p}^{\beta}(\mathfrak{M})}{Z_{p}^{\beta}}\,d\mu(\mathfrak{M}).$$

###### Proposition 3\.11\(Zero\-temperature limit\)\.

Assume $X$ is finite and $r_{p}$ attains its minimum over $X\cap\operatorname{Mod}(H_{p})$. Let

$$A_{0}:=\operatorname*{arg\,min}\{r_{p}(\mathfrak{M}):\mathfrak{M}\in X\cap\operatorname{Mod}(H_{p})\}.$$

If $\mu(\mathfrak{M})>0$ for every $\mathfrak{M}\in X$, then for every $B\subseteq X$,

$$\lim_{\beta\to\infty}\mu_{p}^{\beta}(B)=\frac{\mu(B\cap A_{0})}{\mu(A_{0})}.$$

###### Proof\.

Let $m$ be the minimum of $r_{p}$ on $X\cap\operatorname{Mod}(H_{p})$. Then

$$Z_{p}^{\beta}=e^{-\beta m}\mu(A_{0})+\sum_{\mathfrak{M}\in(X\cap\operatorname{Mod}(H_{p}))\setminus A_{0}}e^{-\beta r_{p}(\mathfrak{M})}\mu(\mathfrak{M}).$$

After dividing numerator and denominator by $e^{-\beta m}$, every term with $r_{p}(\mathfrak{M})>m$ tends to $0$. The remaining mass is exactly the prior mass restricted to $A_{0}$ and normalized. ∎

Thus hard preferential update is the zero\-temperature limit of a familiar probabilistic selection rule\.
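The limit can be observed numerically on a four-point class. The prior weights, ranks, and hard-constraint flags below are hypothetical, and floating-point arithmetic is used because the weights involve exponentials. This is a sketch of the selection rule, not a claim about how any particular model computes it.

```python
import math

# Hypothetical finite class: prior mass, real-valued rank, and whether the
# structure satisfies the hard constraints H_p.
PRIOR = {"a": 0.1, "b": 0.3, "c": 0.4, "d": 0.2}
RANK = {"a": 0.0, "b": 0.0, "c": 1.0, "d": 0.0}
SATISFIES_HARD = {"a": True, "b": True, "c": True, "d": False}


def soft_posterior(beta):
    """Normalized weights 1[M |= H_p] * exp(-beta * r_p(M)) * mu(M)."""
    weights = {
        x: (math.exp(-beta * RANK[x]) if SATISFIES_HARD[x] else 0.0) * PRIOR[x]
        for x in PRIOR
    }
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


def hard_limit():
    """Prior restricted to the rank-minimal admissible set A_0, renormalized."""
    admissible = [x for x in PRIOR if SATISFIES_HARD[x]]
    m = min(RANK[x] for x in admissible)
    a0 = [x for x in admissible if RANK[x] == m]
    mass = sum(PRIOR[x] for x in a0)
    return {x: (PRIOR[x] / mass if x in a0 else 0.0) for x in PRIOR}


def event_mass(dist, event):
    return sum(dist[x] for x in event)


B = {"a", "c"}
for beta in (0.0, 1.0, 5.0, 50.0):
    print(beta, event_mass(soft_posterior(beta), B))
print("limit", event_mass(hard_limit(), B))
```

As `beta` grows, the mass of `B` moves from its value under plain conditioning on the hard constraints toward the prior mass of `B` inside the minimal-rank set, renormalized.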

### 3\.3Logical cost models for prompt conflict

Prompt instructions may be mutually satisfiable as hard constraints while still competing pragmatically\. A useful formal abstraction is to assign costs to violations and then recover preferred models by minimization\.

###### Definition 3\.12\(Violation cost\)\.

Let $H=\{\eta_{1},\ldots,\eta_{m}\}$ be a finite instruction set and let $w_{i}>0$. The violation cost of a structure $\mathfrak{M}\in X$ is

$$\ell_{H}(\mathfrak{M}):=\sum_{i:\,\mathfrak{M}\models\neg\eta_{i}}w_{i}.$$

The induced update is

$$\mathsf{CostUpd}_{H}(X):=\operatorname*{arg\,min}_{\mathfrak{M}\in X}\ell_{H}(\mathfrak{M}).$$

###### Proposition 3\.13\(Hard constraints as infinite\-cost limit\)\.

Let $H=H_{0}\cup H_{1}$, where $H_{0}$ is treated as hard and $H_{1}$ as soft. Assign weight $B$ to every formula in $H_{0}$ and fixed finite positive weights to formulas in $H_{1}$. If $X\cap\operatorname{Mod}(H_{0})\neq\varnothing$, then for all sufficiently large $B$ every element of $\mathsf{CostUpd}_{H}(X)$ satisfies $H_{0}$.

###### Proof\.

Because $H_{1}$ is finite and has fixed weights, the maximum possible soft cost is some finite number $C$. Any model satisfying $H_{0}$ has hard cost $0$ and total cost at most $C$. Any model violating at least one hard formula has hard cost at least $B$. For $B>C$, no hard-violating model can be cost-minimal. ∎

###### Theorem 3\.14\(Cost update agrees with preferential update\)\.

Assume $X$ is finite. For every real-valued ranking $r:X\to\mathbb{R}$ there exists a cost function $\ell:X\to\mathbb{R}_{\geq 0}$ such that

$$\operatorname*{arg\,min}_{X}r=\operatorname*{arg\,min}_{X}\ell.$$

Conversely, every cost update is a preferential update for the ranking $\rho(\mathfrak{M})=\ell(\mathfrak{M})$.

###### Proof\.

For the first direction, choose $m=\min_{X}r$ and set $\ell(\mathfrak{M})=r(\mathfrak{M})-m$. Then $\ell\geq 0$ and the minimizers are exactly the minimizers of $r$. The converse is the definition of preferential update with rank equal to cost. ∎

This shows that weighted\-instruction semantics and ranked\-model semantics are two presentations of the same selection principle on finite model classes\.
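The role of the hard weight can be illustrated with three hypothetical structures, one hard instruction `h`, and two soft instructions `s1` and `s2` with assumed weights 2 and 3, so that the maximum soft cost is 5. The sketch compares a hard weight below and above that bound.

```python
def violation_cost(m, weighted):
    """Sum of the weights of the instructions that the structure violates."""
    return sum(w for sentence, w in weighted if not sentence(m))


def cost_update(X, weighted):
    """Structures of minimal violation cost."""
    costs = [violation_cost(m, weighted) for m in X]
    best = min(costs)
    return [m for m, c in zip(X, costs) if c == best]


# Hypothetical structures, recorded by which instructions they satisfy.
X = [
    {"name": "m1", "h": True, "s1": False, "s2": False},
    {"name": "m2", "h": False, "s1": True, "s2": True},
    {"name": "m3", "h": True, "s1": True, "s2": False},
]


def instructions(hard_weight):
    hard = [(lambda m: m["h"], hard_weight)]
    soft = [(lambda m: m["s1"], 2), (lambda m: m["s2"], 3)]  # assumed weights
    return hard + soft


for B in (1, 6):
    chosen = cost_update(X, instructions(B))
    print(B, [m["name"] for m in chosen], all(m["h"] for m in chosen))
```

With hard weight 1 the hard-violating structure `m2` is cost-minimal. With hard weight 6, which exceeds the maximum soft cost, every minimizer satisfies the hard instruction.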

## 4In\-context learning as semantic model expansion

In\-context learning can be formalized without assuming that parameters change\. The semantic state changes because the context extends the language with observed examples and adds the finite diagram of those examples\. The direct\-limit theory then represents what the entire example stream would determine\.

###### Definition 4\.1\(Example\-driven expansion chain\)\.

Let $\mathcal{L}_{0}$ contain sorts $\mathbf{X}$ and $\mathbf{Y}$ and a function symbol $f:\mathbf{X}\to\mathbf{Y}$. Let $T_{0}\subseteq\operatorname{Sent}(\mathcal{L}_{0})$ be consistent. Given an example stream

$$E=(e_{n})_{n\in\mathbb{N}}=((a_{n},b_{n}))_{n\in\mathbb{N}},$$

where $a_{n}$ and $b_{n}$ are represented by closed terms of the appropriate sorts, define recursively

$$\begin{aligned}
\mathcal{L}_{n}&:=\mathcal{L}_{n-1}\cup\{c_{n},d_{n}\},\\
\Delta_{n}&:=\{c_{n}=a_{n},\ d_{n}=b_{n},\ f(c_{n})=d_{n}\},\\
T_{n}&:=\operatorname{Cn}_{\mathcal{L}_{n}}(T_{n-1}^{\uparrow}\cup\Delta_{n}),\\
\mathcal{K}_{n}&:=\operatorname{Mod}(T_{n}).
\end{aligned}$$

The chain $(\mathcal{L}_{n},T_{n},\mathcal{K}_{n})_{n\in\mathbb{N}}$ is the semantic expansion chain generated by $E$.

###### Proposition 4\.2\(Monotone expansion of information\)\.

For every $n$,

$$T_{n}^{\uparrow}\subseteq T_{n+1}.$$

Consequently, if $\mathfrak{M}\models T_{n+1}$, then the $\mathcal{L}_{n}$-reduct $\mathfrak{M}\upharpoonright\mathcal{L}_{n}$ is a model of $T_{n}$.

###### Proof\.

By construction,

$$T_{n+1}=\operatorname{Cn}_{\mathcal{L}_{n+1}}(T_{n}^{\uparrow}\cup\Delta_{n+1}).$$

A consequence set contains its premises. The reduct statement follows because the $\mathcal{L}_{n}$-sentences true in $\mathfrak{M}$ are exactly the $\mathcal{L}_{n}$-sentences true in the reduct. ∎

###### Definition 4\.3\(Admissible answer sets\)\.

Let $q$ be a closed $\mathcal{L}_0$-term of sort $\mathbf{X}$ and let $\mathcal{C}_{\mathbf{Y}}$ be a fixed set of closed terms of sort $\mathbf{Y}$. Define

$$\mathrm{Ans}_n(q):=\{t\in\mathcal{C}_{\mathbf{Y}}:T_n\cup\{f(q)=t\}\text{ is satisfiable}\}.$$

###### Proposition 4\.4\(Monotone narrowing of admissible answers\)\.

For every query $q$ and every $n$,

$$\mathrm{Ans}_{n+1}(q)\subseteq\mathrm{Ans}_n(q).$$

###### Proof\.

If $t\in\mathrm{Ans}_{n+1}(q)$, choose $\mathfrak{M}\models T_{n+1}\cup\{f(q)=t\}$. By Proposition [4.2](https://arxiv.org/html/2606.07623#S4.Thmtheorem2), the $\mathcal{L}_n$-reduct of $\mathfrak{M}$ satisfies $T_n$. Since $f,q,t$ are already in $\mathcal{L}_n$, the reduct also satisfies $f(q)=t$. Hence $t\in\mathrm{Ans}_n(q)$. ∎
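The narrowing in Proposition 4.4 can be watched on a toy instance. The following is a minimal sketch, assuming a finite hypothesis class stands in for the models of $T_n$; the names `HYPOTHESES`, `admissible_answers`, and the example stream are illustrative and not taken from the paper's ancillary scripts.

```python
from itertools import product

# Hypothetical toy family: all functions from X = {0, 1, 2} to Y = {0, 1},
# stored as tuples (f(0), f(1), f(2)). It stands in for the models of T_0.
HYPOTHESES = list(product([0, 1], repeat=3))


def admissible_answers(examples, query):
    """Answers f(query) realised by some hypothesis consistent with the examples."""
    consistent = [h for h in HYPOTHESES if all(h[a] == b for a, b in examples)]
    return {h[query] for h in consistent}


stream = [(0, 1), (2, 0), (1, 1)]  # example stream (a_n, b_n), chosen for illustration
query = 1
# answer_sets[n] plays the role of Ans_n(q) after the first n examples.
answer_sets = [admissible_answers(stream[:n], query) for n in range(len(stream) + 1)]

# Each stage is contained in the previous one, as in Proposition 4.4.
for earlier, later in zip(answer_sets, answer_sets[1:]):
    assert later <= earlier
```

In this toy stream the answer set stays at both candidates until the example whose input is the query itself arrives, which illustrates that narrowing is monotone but need not be strict.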

### 4\.1Types and answer isolation

The shrinking of answer sets can be restated as type isolation\. This formulation is useful because it expresses what examples do semantically: they reduce the space of complete types compatible with the context\.

###### Definition 4\.5\(Contextual query type\)\.

Let $x$ be a variable of sort $\mathbf{X}$. The $n$-stage type of a query term $q$ is

$$\operatorname{tp}_n(q):=\{\theta(x)\in\mathcal{L}_n:T_n\models\theta(q)\}.$$

For an answer term $t$, say that $\operatorname{tp}_n(q)$ *isolates* $t$ if

$$T_n\models f(q)=t.$$

###### Lemma 4\.6\(Isolation equals singleton admissibility\)\.

Assume distinct terms in $\mathcal{C}_{\mathbf{Y}}$ denote distinct candidate answers in all models of $T_n$. Then $\operatorname{tp}_n(q)$ isolates $t$ if and only if

$$\mathrm{Ans}_n(q)=\{t\}.$$

###### Proof\.

If $T_n\models f(q)=t$, then any satisfiable extension $T_n\cup\{f(q)=u\}$ forces $u=t$ by distinctness of candidates. Hence the answer set is $\{t\}$, provided it is nonempty. Nonemptiness follows from consistency of $T_n$: every model of $T_n$ satisfies $f(q)=t$, so $T_n\cup\{f(q)=t\}$ is satisfiable. Conversely, if $\mathrm{Ans}_n(q)=\{t\}$ and $T_n\not\models f(q)=t$, then $T_n\cup\{f(q)\neq t\}$ is satisfiable. In such a model $f(q)$ must be denoted by some candidate $u\neq t$ (this step uses the interpretation of $f(q)$ among the candidate terms), giving $u\in\mathrm{Ans}_n(q)$, a contradiction. ∎

###### Theorem 4\.7\(Finite stabilization by compactness\)\.

Let

$$T_\omega:=\bigcup_{n\in\mathbb{N}}T_n.$$

If $q$ is a query term and

$$T_\omega\models f(q)=t,$$

then there exists $N\in\mathbb{N}$ such that

$$T_N\models f(q)=t.$$

In particular, if the direct-limit theory determines a unique answer to $q$, some finite prefix of the context already determines it.

###### Proof\.

The entailment $T_\omega\models f(q)=t$ means that

$$T_\omega\cup\{f(q)\neq t\}$$

is inconsistent. By compactness, some finite $\Sigma\subseteq T_\omega$ already makes

$$\Sigma\cup\{f(q)\neq t\}$$

inconsistent. Since the chain is increasing, there exists $N$ with $\Sigma\subseteq T_N$. Thus $T_N\cup\{f(q)\neq t\}$ is inconsistent, equivalently $T_N\models f(q)=t$. ∎

The theorem is not a claim that every finite prompt determines every answer\. It says that any answer determined by the infinite semantic expansion has a finite certificate\. This is the compactness core behind finite in\-context evidence\.

###### Corollary 4\.8\(Finite certificate for answer exclusion\)\.

If $t\notin\mathrm{Ans}_\omega(q)$, where

$$\mathrm{Ans}_\omega(q):=\{u:T_\omega\cup\{f(q)=u\}\text{ is satisfiable}\},$$

then there exists $N$ such that $t\notin\mathrm{Ans}_N(q)$.

###### Proof\.

The statement $t\notin\mathrm{Ans}_\omega(q)$ means $T_\omega\models f(q)\neq t$. The compactness argument of Theorem [4.7](https://arxiv.org/html/2606.07623#S4.Thmtheorem7) applies verbatim with the sentence $f(q)\neq t$ in place of $f(q)=t$. ∎

###### Proposition 4\.9\(Indistinguishable\-prefix lower bound\)\.

Let $\chi(f(q))$ be a target property. If there exist two models $\mathfrak{M},\mathfrak{N}\models T_n$ such that

$$\mathfrak{M}\models\chi(f(q))\qquad\text{and}\qquad\mathfrak{N}\models\neg\chi(f(q)),$$

then no prefix of length at most $n$ certifies the target. In particular, the least certificate length is strictly larger than $n$.

###### Proof\.

If some $T_m$ with $m\leq n$ entailed $\chi(f(q))$, then $T_n$ would also entail $\chi(f(q))$ by monotonicity of the expansion chain. But $\mathfrak{N}\models T_n\cup\{\neg\chi(f(q))\}$, contradicting entailment. Hence no such $m$ exists. ∎

This lower bound is the model\-theoretic form of underdetermination in in\-context learning\. To prove that a context is too short, it is enough to build two extensions of the same finite prefix that agree with all observed examples but disagree on the queried property\.
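The two-model argument can be made concrete in a finite family. The sketch below is illustrative only: it assumes a hypothetical class of all maps $\{0,1,2,3\}\to\{0,1\}$ and searches for two hypotheses that fit the same prefix yet disagree at the query, which is the finite analogue of the pair $\mathfrak{M},\mathfrak{N}$ in Proposition 4.9.

```python
from itertools import product

# Hypothetical family: every map {0, 1, 2, 3} -> {0, 1}, stored as a 4-tuple.
HYPOTHESES = list(product([0, 1], repeat=4))


def disagreement_witness(examples, query):
    """Return two prefix-consistent hypotheses that differ at the query, or None.

    A non-None result shows that the prefix is too short to certify the answer.
    """
    consistent = [h for h in HYPOTHESES if all(h[a] == b for a, b in examples)]
    for h in consistent:
        for g in consistent:
            if h[query] != g[query]:
                return h, g
    return None
```

For instance, `disagreement_witness([(0, 1), (1, 0)], 3)` returns a pair of hypotheses, because neither observed example constrains the value at input 3, whereas `disagreement_witness([(3, 1)], 3)` returns `None`.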

### 4\.2Pair\-separator certificates and teaching dimension

Compactness proves that first\-order certificates are finite when they exist, but it does not say which examples are responsible\. In finite deterministic task families the responsible examples have an exact combinatorial description\.

###### Definition 4\.10\(Finite deterministic task family\)\.

Let $H$ be a finite set of hypotheses, let $X$ be a finite set of possible example inputs, let $B$ be a finite output alphabet, and let

$$h:X\cup\{q\}\to B$$

be the response function associated with each $h\in H$. After observing a labeled example set $E\subseteq X$, the version space is

$$H_E:=\{h\in H:(\forall x\in E)\ h(x)=b_x\},$$

where $b_x$ is the observed label. The query $q$ is *determined by $E$* if all hypotheses in $H_E$ have the same value at $q$.

For a fixed realized version space $V\subseteq H$, define the query-disagreement pairs

$$P_q(V):=\{\{h,h'\}\subseteq V:h(q)\neq h'(q)\}.$$

For $x\in X$, let

$$\operatorname{Sep}_x(V):=\{\{h,h'\}\in P_q(V):h(x)\neq h'(x)\}.$$

Thus $\operatorname{Sep}_x(V)$ collects exactly those pairs of surviving hypotheses that answer the query differently and that the example input $x$ tells apart.

###### Theorem 4\.11\(Exact pair\-separator characterization\)\.

Let $V$ be the version space after a fixed background context. A set $E\subseteq X$ determines $q$ relative to $V$ if and only if

$$P_q(V)\subseteq\bigcup_{x\in E}\operatorname{Sep}_x(V).$$

Equivalently, finite-context determination of $q$ is exactly a hitting problem over query-disagreement pairs.

###### Proof\.

Assume first that $E$ determines $q$. Let $\{h,h'\}\in P_q(V)$. Since $h(q)\neq h'(q)$, the two hypotheses cannot both remain compatible with the labels on $E$. Therefore there exists some $x\in E$ such that $h(x)\neq h'(x)$, so $\{h,h'\}\in\operatorname{Sep}_x(V)$.

Conversely, suppose every pair in $P_q(V)$ is separated by some $x\in E$. Let $h,h'\in V$ agree on all labels in $E$. Then $\{h,h'\}$ cannot lie in any $\operatorname{Sep}_x(V)$ with $x\in E$. By the assumed covering condition, it cannot belong to $P_q(V)$. Hence $h(q)=h'(q)$. All hypotheses compatible with $E$ therefore agree on $q$, so $E$ determines $q$. ∎
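The equivalence in Theorem 4.11 can be checked exhaustively on a small family. The sketch below is an illustration under stated assumptions: hypotheses are dictionaries on hypothetical inputs `"x1"`, `"x2"`, `"q"`, and "determines" is read as in the proof, namely that any two hypotheses in $V$ agreeing on all of $E$ also agree on $q$.

```python
from itertools import combinations

# Hypothetical finite family: each hypothesis lists its label on the
# example inputs "x1", "x2" and on the query "q".
V = [
    {"x1": 0, "x2": 0, "q": 0},
    {"x1": 1, "x2": 0, "q": 1},
    {"x1": 0, "x2": 1, "q": 1},
]
X = ["x1", "x2"]


def query_disagreement_pairs(V, q):
    """P_q(V): index pairs of hypotheses that answer the query differently."""
    return {frozenset(p) for p in combinations(range(len(V)), 2)
            if V[p[0]][q] != V[p[1]][q]}


def separated_pairs(V, q, x):
    """Sep_x(V): pairs in P_q(V) that also disagree on the example input x."""
    return {p for p in query_disagreement_pairs(V, q)
            if len({V[i][x] for i in p}) == 2}


def covers(V, q, E):
    """Covering condition of Theorem 4.11."""
    covered = set().union(*(separated_pairs(V, q, x) for x in E))
    return query_disagreement_pairs(V, q) <= covered


def determines(V, q, E):
    """Direct definition: hypotheses that agree on all of E agree on q."""
    return all(V[i][q] == V[j][q]
               for i, j in combinations(range(len(V)), 2)
               if all(V[i][x] == V[j][x] for x in E))


# The two notions coincide on every subset of X.
for k in range(len(X) + 1):
    for E in combinations(X, k):
        assert covers(V, "q", E) == determines(V, "q", E)
```

In this family neither example alone suffices, while `("x1", "x2")` covers both query-disagreement pairs.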

###### Definition 4\.12\(Query teaching dimension\)\.

The *query teaching dimension* of $q$ relative to $V$ is

$$\operatorname{TD}_q(V):=\min\{|E|:E\subseteq X\text{ determines }q\text{ relative to }V\}.$$

If no such $E$ exists, set $\operatorname{TD}_q(V)=\infty$.

Theorem [4.11](https://arxiv.org/html/2606.07623#S4.Thmtheorem11) says that $\operatorname{TD}_q(V)$ is the minimum size of a set of examples whose separator sets cover $P_q(V)$. This turns semantic context selection into an exact finite optimization problem.
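For very small families the optimization can be solved by exhaustive search. The following sketch is illustrative only: `query_teaching_dimension` is a hypothetical helper that enumerates subsets in order of size, so its running time is exponential in $|X|$, which is consistent with the hardness result below.

```python
from itertools import combinations


def determines(V, q, E):
    """Hypotheses in V that agree on every input in E must agree on q."""
    return all(V[i][q] == V[j][q]
               for i, j in combinations(range(len(V)), 2)
               if all(V[i][x] == V[j][x] for x in E))


def query_teaching_dimension(V, X, q):
    """Return (TD_q(V), witness) by brute force; (inf, None) if nothing works."""
    for k in range(len(X) + 1):
        for E in combinations(X, k):
            if determines(V, q, E):
                return k, E
    return float("inf"), None


# Hypothetical family in which both examples are needed.
V = [
    {"x1": 0, "x2": 0, "q": 0},
    {"x1": 1, "x2": 0, "q": 1},
    {"x1": 0, "x2": 1, "q": 1},
]
td, witness = query_teaching_dimension(V, ["x1", "x2"], "q")
```

A family containing two hypotheses that agree on every example input but differ at the query has infinite teaching dimension, matching the convention $\operatorname{TD}_q(V)=\infty$.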

###### Theorem 4\.13\(Minimal certificate extraction is NP\-complete\)\.

The decision problem

$$\operatorname{TD}_q(V)\leq k$$

is NP-complete even when the output alphabet is binary.

###### Proof\.

Membership in NP is immediate: a candidate set $E$ of at most $k$ examples can be checked by verifying the covering condition in Theorem [4.11](https://arxiv.org/html/2606.07623#S4.Thmtheorem11).

For hardness, reduce Set Cover to the certificate problem. Let a Set Cover instance have universe $U=\{u_1,\ldots,u_m\}$, subsets $S_1,\ldots,S_r\subseteq U$, and budget $k$. Create examples $x_1,\ldots,x_r$, one for each subset, and a query $q$. For every element $u\in U$, create two hypotheses $a_u$ and $b_u$. Define binary labels by

$$a_u(q)=0,\qquad b_u(q)=1.$$

For example $x_i$, set

$$a_u(x_i)=0,\qquad b_u(x_i)=\begin{cases}1,&u\in S_i,\\ 0,&u\notin S_i.\end{cases}$$

Let $V:=\{a_u,b_u:u\in U\}$. The instance has $2m$ hypotheses and $r$ examples, so it is built in polynomial time. The pair $\{a_u,b_u\}$ is separated by $x_i$ exactly when $u\in S_i$. If a collection of examples $E$ determines $q$, then in particular it separates every pair $\{a_u,b_u\}$, so the corresponding subsets cover every element of $U$. Conversely, if subsets indexed by $I$ cover $U$, then the examples $\{x_i:i\in I\}$ separate every pair $\{a_v,b_u\}$ with different query value, because $u$ is covered by some selected $S_i$ and then $b_u(x_i)=1$ while every $a_v(x_i)=0$. Hence they determine $q$. Therefore a context certificate of size at most $k$ exists exactly when the Set Cover instance has a cover of size at most $k$. Since Set Cover is NP-complete [[1](https://arxiv.org/html/2606.07623#bib.bib1)], the certificate decision problem is NP-complete. ∎

This theorem is the main finite\-context obstruction\. Even if the semantic presentation is exact and the hypothesis class is finite, extracting the smallest explanatory context is computationally hard\. Thus benchmark success does not automatically yield a short human\-checkable certificate; the certificate itself is a nontrivial combinatorial object\.
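The reduction in the proof of Theorem 4.13 is easy to instantiate. The sketch below is a hedged illustration: the helper names and the sample Set Cover instance are hypothetical, and both optima are found by brute force purely to confirm that they coincide on this instance.

```python
from itertools import combinations


def reduction(universe, subsets):
    """Build hypotheses a_u, b_u; example x_i is the index i, the query is "q"."""
    V = []
    for u in universe:
        a = {i: 0 for i in range(len(subsets))}
        a["q"] = 0
        b = {i: int(u in S) for i, S in enumerate(subsets)}
        b["q"] = 1
        V += [a, b]
    return V


def determines(V, q, E):
    """Hypotheses in V that agree on every input in E must agree on q."""
    return all(V[i][q] == V[j][q]
               for i, j in combinations(range(len(V)), 2)
               if all(V[i][x] == V[j][x] for x in E))


def min_certificate(V, n_examples):
    """Smallest number of examples that determines the query, or None."""
    for k in range(n_examples + 1):
        for E in combinations(range(n_examples), k):
            if determines(V, "q", E):
                return k
    return None


def min_cover(universe, subsets):
    """Smallest number of subsets whose union is the universe, or None."""
    for k in range(len(subsets) + 1):
        for I in combinations(range(len(subsets)), k):
            if set().union(*(subsets[i] for i in I)) >= set(universe):
                return k
    return None


# Illustrative instance (not from the paper).
universe = [1, 2, 3, 4, 5]
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
V = reduction(universe, subsets)
assert min_certificate(V, len(subsets)) == min_cover(universe, subsets)
```

Both searches are exponential in the number of subsets; the point of the sketch is only that the smallest certificate and the smallest cover have the same size.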

### 4\.3Conservative expansion and genuine information

Adding names to the language does not itself add information about old symbols\. The information comes from diagrams and constraints\.

###### Definition 4\.14\(Conservative context step\)\.

The step $T_n\subseteq T_{n+1}$ is *conservative over $\mathcal{L}_n$* if every $\mathcal{L}_n$-sentence $\varphi$ satisfying

$$T_{n+1}\models\varphi$$

already satisfies

$$T_n\models\varphi.$$

###### Proposition 4\.15\(When examples change old consequences\)\.

The step from $T_n$ to $T_{n+1}$ is nonconservative over $\mathcal{L}_n$ exactly when there exists an old-language sentence $\varphi\in\operatorname{Sent}(\mathcal{L}_n)$ such that

$$T_n\cup\{\neg\varphi\}\text{ is satisfiable}\qquad\text{but}\qquad T_{n+1}\models\varphi.$$

###### Proof\.

This is the negation of conservativity written out. If the extension is nonconservative, there is an old sentence entailed by the extension but not by the base. Not being entailed by the base is equivalent to satisfiability of $T_n\cup\{\neg\varphi\}$. The converse is immediate. ∎

Thus an example is semantically informative about prior vocabulary exactly when it eliminates at least one old\-language possibility\.

### 4\.4No purely infinite first\-order answer jumps

Compactness also gives a negative result: first\-order entailments cannot appear only at the infinite limit while being absent from every finite stage\.

###### Theorem 4\.16\(No infinite\-only first\-order certificate\)\.

Let $\psi$ be a sentence in the union language $\bigcup_n\mathcal{L}_n$. If

$$T_n\not\models\psi\qquad\text{for every }n,$$

then

$$T_\omega\not\models\psi.$$

Equivalently, if $T_\omega\models\psi$, then $T_N\models\psi$ for some finite $N$.

###### Proof\.

The second statement is the compactness argument used in Theorem [4.7](https://arxiv.org/html/2606.07623#S4.Thmtheorem7). For the first statement, assume $T_n\not\models\psi$ for every $n$. Then $T_n\cup\{\neg\psi\}$ is satisfiable for every $n$ large enough to contain the language of $\psi$. Let $\Sigma$ be any finite subset of $T_\omega$. Since the chain is increasing, $\Sigma\subseteq T_N$ for some $N$. Then $T_N\cup\{\neg\psi\}$ is satisfiable, so $\Sigma\cup\{\neg\psi\}$ is satisfiable. Every finite subset of $T_\omega\cup\{\neg\psi\}$ is satisfiable. By compactness, $T_\omega\cup\{\neg\psi\}$ is satisfiable, hence $T_\omega\not\models\psi$. ∎

The theorem is a useful guardrail\. If a claimed in\-context phenomenon is expressible in first\-order form and is entailed by the entire context stream, then it has a finite logical certificate\. If no finite prefix can certify it, then the claimed property is either not first\-order in this presentation, not an entailment, or depends on an external limiting operation\.

### 4\.5Quantifier rank and finite evidence profiles

The previous compactness statements do not quantify the size of the finite certificate\. A finer analysis tracks formulas by syntactic complexity\.

###### Definition 4\.17\(Rank\-restricted consequence\)\.

Let $\operatorname{Sent}_{\leq r}(\mathcal{L})$ be the set of sentences of quantifier rank at most $r$. Define

$$T_n^{\leq r}:=T_n\cap\operatorname{Sent}_{\leq r}(\mathcal{L}_n).$$

For a sentence $\psi$, write

$$T_n\models_r\psi$$

if there is a finite $\Sigma\subseteq T_n^{\leq r}$ such that $\Sigma\models\psi$.

###### Proposition 4\.18\(Rank\-restricted monotonicity\)\.

If $T_n\models_r\psi$, then $T_m\models_r\psi$ for every $m\geq n$, after viewing formulas in the larger language.

###### Proof\.

Choose finite $\Sigma\subseteq T_n^{\leq r}$ with $\Sigma\models\psi$. Since $T_n^{\uparrow}\subseteq T_m$, the same $\Sigma$ is contained in $T_m$. Quantifier rank is unchanged by viewing a sentence in a larger language. ∎

###### Definition 4\.19\(Evidence profile\)\.

For a property $\psi$, define its evidence profile along the context chain by

$$\mathcal{E}_\psi:=\{(n,r):T_n\models_r\psi\}.$$

###### Proposition 4\.20\(Upward closure of evidence profiles\)\.

If $(n,r)\in\mathcal{E}_\psi$, then $(m,s)\in\mathcal{E}_\psi$ for every $m\geq n$ and $s\geq r$.

###### Proof\.

The condition $m\geq n$ is handled by Proposition [4.18](https://arxiv.org/html/2606.07623#S4.Thmtheorem18). The condition $s\geq r$ holds because $\operatorname{Sent}_{\leq r}(\mathcal{L}_m)\subseteq\operatorname{Sent}_{\leq s}(\mathcal{L}_m)$. ∎

Thus a certified property has an upward\-closed region in the two\-dimensional plane of context length and logical complexity\. This is a precise way to ask whether a behavior needs more examples, more expressive formulas, or both\.

## 5Latent task families and exact identification

The preceding section is abstract\. This section records two concrete identification mechanisms\. The first is linear and algebraic\. The second is purely definability\-based\.

### 5\.1Linear task family over a finite field

Let $\mathbb{F}$ be a finite field and $d\geq 1$. Let $\mathbf{X}=\mathbb{F}^d$ and $\mathbf{Y}=\mathbb{F}$. Suppose the background theory asserts the existence of a latent parameter $w\in\mathbb{F}^d$ such that

$$f(x)=w^{\top}x\qquad\text{for all }x\in\mathbb{F}^d.$$

Given examples $(a_i,b_i)$ for $1\leq i\leq n$, define

$$A_n=\begin{pmatrix}a_1^{\top}\\ \vdots\\ a_n^{\top}\end{pmatrix}\in\mathbb{F}^{n\times d},\qquad b^{(n)}=\begin{pmatrix}b_1\\ \vdots\\ b_n\end{pmatrix}\in\mathbb{F}^n,$$

and

$$W_n:=\{w\in\mathbb{F}^d:A_nw=b^{(n)}\}.$$

###### Theorem 5\.1\(Exact identifiability in the linear case\)\.

Assume $W_n\neq\varnothing$. Then:

1. (i) for any $w_0\in W_n$, $W_n=w_0+\operatorname{Ker}(A_n)$, so $\dim W_n=d-\operatorname{rank}(A_n)$;
2. (ii) for a query $q\in\mathbb{F}^d$, all $w\in W_n$ induce the same answer $w^{\top}q$ if and only if $q\in\operatorname{Row}(A_n)$;
3. (iii) if $\operatorname{rank}(A_n)=d$, then $W_n$ is a singleton and every query is determined;
4. (iv) the number of remaining latent parameters is $|W_n|=|\mathbb{F}|^{d-\operatorname{rank}(A_n)}$.

###### Proof\.

Fix $w_0\in W_n$. For $w\in W_n$, $A_n(w-w_0)=0$, so $w-w_0\in\operatorname{Ker}(A_n)$. Conversely, if $u\in\operatorname{Ker}(A_n)$, then $A_n(w_0+u)=b^{(n)}$, so $w_0+u\in W_n$. This proves (i). The dimension statement follows from rank-nullity.

For (ii), assume first $q\in\operatorname{Row}(A_n)$. Then $q=A_n^{\top}\alpha$ for some $\alpha\in\mathbb{F}^n$. If $w_1,w_2\in W_n$, then $u=w_1-w_2\in\operatorname{Ker}(A_n)$, hence

$$(w_1-w_2)^{\top}q=u^{\top}A_n^{\top}\alpha=(A_nu)^{\top}\alpha=0.$$

Thus all consistent parameters agree on $q$.

Conversely, if $q\notin\operatorname{Row}(A_n)$, then $q\notin\operatorname{Ker}(A_n)^{\perp}$, because $\operatorname{Row}(A_n)=\operatorname{Ker}(A_n)^{\perp}$ for the standard bilinear form over any field: the inclusion $\operatorname{Row}(A_n)\subseteq\operatorname{Ker}(A_n)^{\perp}$ is immediate and both sides have dimension $\operatorname{rank}(A_n)$. Hence there exists $u\in\operatorname{Ker}(A_n)$ with $u^{\top}q\neq 0$. The two parameters $w_0$ and $w_0+u$ both lie in $W_n$, but they give different values on $q$. This proves (ii).

Item (iii) follows because $\operatorname{rank}(A_n)=d$ gives $\operatorname{Ker}(A_n)=\{0\}$. Item (iv) follows because an affine subspace over $\mathbb{F}$ of dimension $d-\operatorname{rank}(A_n)$ has $|\mathbb{F}|^{d-\operatorname{rank}(A_n)}$ elements. ∎

This example distinguishes global identification from query-local identification. A context may fail to determine the whole latent parameter while still determining the queried answer. The condition $q\in\operatorname{Row}(A_n)$ is the exact finite certificate.
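The row-space criterion can be tested against a brute-force enumeration of $W_n$. The following is a minimal sketch, assuming a prime field so that integer arithmetic mod `P` is field arithmetic; the helper names and the sample context are hypothetical and are not the paper's ancillary scripts.

```python
from itertools import product

P = 3  # hypothetical prime, so integers mod P form the field GF(P)


def rank_mod_p(rows, p=P):
    """Rank over GF(p) by Gauss-Jordan elimination on a copy of the rows."""
    m = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                c = m[i][col]
                m[i] = [(a - c * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def in_row_space(A, q, p=P):
    """q lies in Row(A) iff appending q as a row does not raise the rank."""
    return rank_mod_p(A + [q], p) == rank_mod_p(A, p)


def dot(u, v, p=P):
    return sum(a * b for a, b in zip(u, v)) % p


def consistent_parameters(A, b, p=P):
    """Brute-force W_n = {w : A w = b} by enumerating GF(p)^d."""
    d = len(A[0])
    return [w for w in product(range(p), repeat=d)
            if all(dot(row, w, p) == bi for row, bi in zip(A, b))]


A = [[1, 0, 2], [0, 1, 1]]
b = [2, 1]  # labels produced by the illustrative parameter w* = (2, 1, 0)
W = consistent_parameters(A, b)
q_in, q_out = [1, 1, 0], [0, 0, 1]  # q_in is the sum of the two rows mod 3
```

Here the context leaves several parameters in `W`, yet every one of them gives the same answer on `q_in`, while `q_out` lies outside the row space and its answer still varies over `W`.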

###### Corollary 5\.2\(Residual entropy in the uniform linear case\)\.

Assume the posterior on $W_n$ is uniform. The residual entropy of the latent parameter is

$$H(w\mid E_{\leq n})=(d-\operatorname{rank}(A_n))\log|\mathbb{F}|.$$

The residual entropy of the scalar answer $w^{\top}q$ is zero exactly when $q\in\operatorname{Row}(A_n)$.

###### Proof\.

The first formula follows from $|W_n|=|\mathbb{F}|^{d-\operatorname{rank}(A_n)}$. For the answer, Theorem [5.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1) says that $w^{\top}q$ is constant on $W_n$ exactly when $q\in\operatorname{Row}(A_n)$. A constant random variable has entropy zero. If $q\notin\operatorname{Row}(A_n)$, the same theorem gives at least two possible answer values, so the entropy is positive under a full-support uniform posterior on $W_n$. ∎

###### Theorem 5\.3\(Counting unresolved hypotheses and queries\)\.

Let $|\mathbb{F}|=Q$ and assume $W_n\neq\varnothing$. If $r=\operatorname{rank}(A_n)$, then:

1. (i) the number of latent parameters consistent with the context is $|W_n|=Q^{d-r}$;
2. (ii) the set of queries whose answers are already determined is exactly $\operatorname{Row}(A_n)$ and has size $Q^r$;
3. (iii) the number of query vectors whose answers remain underdetermined is $Q^d-Q^r$.

###### Proof\.

By Theorem [5.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1), $W_n$ is an affine subspace parallel to $\operatorname{Ker}(A_n)$, and $\dim\operatorname{Ker}(A_n)=d-r$. Therefore $|W_n|=Q^{d-r}$. The same theorem states that a query is determined exactly when it lies in $\operatorname{Row}(A_n)$. Since the row space has dimension $r$, it contains $Q^r$ vectors. The remaining $Q^d-Q^r$ vectors are outside the row space and therefore underdetermined. ∎
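The three counts can be confirmed by direct enumeration in a tiny case. The sketch below is illustrative: it assumes $Q=2$, $d=3$, and a hypothetical rank-2 context, and it finds the determined queries by checking which answers are constant on $W_n$ rather than by computing the row space.

```python
from itertools import product

Q, d = 2, 3                       # illustrative field size (prime) and dimension
A = [(1, 0, 1), (0, 1, 1)]        # hypothetical context inputs, rank 2 over GF(2)
w_star = (1, 1, 0)                # hypothetical latent parameter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % Q


b = [dot(a, w_star) for a in A]
vectors = list(product(range(Q), repeat=d))

# W_n: all parameters that reproduce the observed labels.
W = [w for w in vectors if all(dot(a, w) == bi for a, bi in zip(A, b))]

# A query is determined when every consistent parameter gives the same answer.
determined = [q for q in vectors if len({dot(w, q) for w in W}) == 1]
undetermined = [q for q in vectors if q not in determined]
```

With $r=2$ the enumeration gives $|W_n|=2$, four determined queries, and four undetermined ones, in line with $Q^{d-r}$, $Q^r$, and $Q^d-Q^r$.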

###### Theorem 5\.4\(Random\-context identification probability\)\.

Let the example inputs $a_1,\ldots,a_n$ be independent uniform vectors in $\mathbb{F}^d$, with $|\mathbb{F}|=Q$, and let outputs be generated by a fixed latent parameter $w^{\ast}$ through $b_i=(w^{\ast})^{\top}a_i$. Then full semantic identification occurs exactly when $A_n$ has rank $d$, and

$$\mathbb{P}(\operatorname{rank}(A_n)=d)=\begin{cases}0,&n<d,\\[6pt] \displaystyle\prod_{i=0}^{d-1}(1-Q^{i-n}),&n\geq d.\end{cases}$$

Consequently, the probability that every query is semantically determined after $n$ random examples is given by the same expression.

###### Proof\.

The consistency set is

$$W_n=\{w\in\mathbb{F}^d:A_nw=A_nw^{\ast}\}=w^{\ast}+\operatorname{Ker}(A_n).$$

Thus $W_n$ is a singleton exactly when $\operatorname{Ker}(A_n)=\{0\}$, equivalently $\operatorname{rank}(A_n)=d$. If $n<d$, full rank is impossible. If $n\geq d$, the number of full-column-rank $n\times d$ matrices over $\mathbb{F}$ is

$$(Q^n-1)(Q^n-Q)\cdots(Q^n-Q^{d-1}).$$

Dividing by the total number $Q^{nd}$ of $n\times d$ matrices gives

$$\prod_{i=0}^{d-1}\frac{Q^n-Q^i}{Q^n}=\prod_{i=0}^{d-1}(1-Q^{i-n}).$$

The last statement follows from Theorem [5.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1): full rank is exactly the condition under which all queries are determined. ∎

This gives a genuine certificate-complexity statement rather than a philosophical analogy. The context length $n$ controls the rank distribution of the observed design matrix, and the rank distribution exactly controls semantic identifiability.
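The identification curve of Theorem 5.4 can be evaluated in exact arithmetic and compared with exhaustive enumeration for tiny parameters. This is a hedged sketch with hypothetical function names; the brute-force check assumes $Q$ is prime so that arithmetic mod $Q$ is field arithmetic.

```python
from fractions import Fraction
from itertools import product


def full_identification_probability(n, d, Q):
    """Exact P(rank(A_n) = d) for n uniform rows in F_Q^d."""
    if n < d:
        return Fraction(0)
    prob = Fraction(1)
    for i in range(d):
        prob *= 1 - Fraction(Q**i, Q**n)
    return prob


def brute_force_probability(n, d, Q):
    """Fraction of n x d matrices over GF(Q), Q prime, with trivial kernel."""
    vectors = list(product(range(Q), repeat=d))
    nonzero = [u for u in vectors if any(u)]
    total = hits = 0
    for rows in product(vectors, repeat=n):
        total += 1
        # Kernel is trivial iff every nonzero u has a nonzero product with some row.
        hits += all(
            any(sum(a * b for a, b in zip(r, u)) % Q for r in rows)
            for u in nonzero
        )
    return Fraction(hits, total)


for n, d, Q in [(1, 2, 2), (2, 2, 2), (3, 2, 2), (2, 2, 3)]:
    assert full_identification_probability(n, d, Q) == brute_force_probability(n, d, Q)
```

For $Q=2$ and $d=2$ the curve rises from $0$ at $n=1$ to $3/8$ at $n=2$ and $21/32$ at $n=3$.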

For many evaluations, full recovery of $w^{\ast}$ is stronger than necessary. A fixed query can become determined before the latent parameter is globally identified. The next theorem gives the query-local analogue of the full identification curve.

###### Definition 5\.5\(Rank distribution\)\.

For $0\leq r\leq\min(n,d)$, let

$$R_{n,d,Q}(r):=\frac{1}{Q^{nd}}\,\frac{\prod_{i=0}^{r-1}(Q^n-Q^i)(Q^d-Q^i)}{\prod_{i=0}^{r-1}(Q^r-Q^i)}$$

with the empty product interpreted as $1$. This is the probability that a uniformly random $n\times d$ matrix over $\mathbb{F}_Q$ has rank $r$.

###### Theorem 5\.6\(Query\-local identification probability\)\.

Let $q\in\mathbb{F}_Q^d$ be nonzero and let $A_n$ have independent uniform rows in $\mathbb{F}_Q^d$. Then

$$\mathbb{P}\bigl(q\in\operatorname{Row}(A_n)\bigr)=\sum_{r=0}^{\min(n,d)}R_{n,d,Q}(r)\,\frac{Q^r-1}{Q^d-1}.$$

Equivalently, the probability that the answer to the fixed query $q$ is semantically determined after $n$ random examples is the expression above. The zero query is determined with probability $1$ for every $n$.

###### Proof\.

Condition on the event $\operatorname{rank}(A_n)=r$. By symmetry, conditional on this event the row space of $A_n$ is uniformly distributed among all $r$-dimensional subspaces of $\mathbb{F}_Q^d$. The number of nonzero vectors in such a subspace is $Q^r-1$, while the total number of nonzero vectors in $\mathbb{F}_Q^d$ is $Q^d-1$. Hence for fixed nonzero $q$,

$$\mathbb{P}(q\in\operatorname{Row}(A_n)\mid\operatorname{rank}(A_n)=r)=\frac{Q^r-1}{Q^d-1}.$$

Averaging over the rank distribution gives the stated formula. The equivalence with semantic determination follows from Theorem [5.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1)(ii). The zero query belongs to every row space, including the zero subspace. ∎
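The rank distribution and the query-local curve are straightforward to evaluate exactly. The sketch below is illustrative and uses hypothetical function names; it follows Definition 5.5 and Theorem 5.6 term by term, using rational arithmetic so that no rounding enters.

```python
from fractions import Fraction


def rank_probability(n, d, Q, r):
    """R_{n,d,Q}(r): probability that a uniform n x d matrix over F_Q has rank r."""
    num = den = 1
    for i in range(r):
        num *= (Q**n - Q**i) * (Q**d - Q**i)
        den *= Q**r - Q**i
    return Fraction(num, den * Q**(n * d))


def query_local_probability(n, d, Q):
    """P(q in Row(A_n)) for a fixed nonzero query q, as in Theorem 5.6."""
    return sum(
        rank_probability(n, d, Q, r) * Fraction(Q**r - 1, Q**d - 1)
        for r in range(min(n, d) + 1)
    )


# Sanity check: the rank probabilities form a distribution.
assert sum(rank_probability(3, 2, 3, r) for r in range(3)) == 1
```

For $Q=2$, $d=2$ the query-local probability is $1/4$ after one example and $9/16$ after two, while full identification after two examples has probability only $6/16$, so a fixed query is determined more often than the whole parameter.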

###### Corollary 5\.7\(Expected fraction of determined queries\)\.

For random $A_n$ over $\mathbb{F}_Q$, the expected fraction of all query vectors whose answers are determined is

$$\mathbb{E}\left[Q^{\operatorname{rank}(A_n)-d}\right]=\sum_{r=0}^{\min(n,d)}R_{n,d,Q}(r)\,Q^{r-d}.$$

###### Proof\.

By Theorem [5.3](https://arxiv.org/html/2606.07623#S5.Thmtheorem3), if $\operatorname{rank}(A_n)=r$ then exactly $Q^r$ of the $Q^d$ query vectors are determined. Divide by $Q^d$ and average over $r$. ∎
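As a worked instance of Corollary 5.7, take a single example with $n=1$, $d=2$, $Q=2$; these values are chosen only for illustration. The rank probabilities come from Definition 5.5, and the result agrees with a direct count: a zero row (probability $1/4$) determines only the zero query, while a nonzero row determines two of the four queries.

```latex
% Worked instance of Corollary 5.7 with n = 1, d = 2, Q = 2 (illustrative values).
R_{1,2,2}(0) = \tfrac{1}{4}, \qquad
R_{1,2,2}(1) = \tfrac{1}{4}\cdot\frac{(2-1)(4-1)}{2-1} = \tfrac{3}{4},
\qquad
\mathbb{E}\bigl[2^{\operatorname{rank}(A_1)-2}\bigr]
  = \tfrac{1}{4}\cdot 2^{-2} + \tfrac{3}{4}\cdot 2^{-1}
  = \tfrac{1}{16} + \tfrac{6}{16}
  = \tfrac{7}{16}.
```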

### 5\.2Definable hypothesis quotients

The linear case is only a witness\. The model\-theoretic pattern is a quotient of hypotheses by the answers they induce\.

###### Definition 5\.8\(Hypothesis\-answer equivalence\)\.

Let $\mathcal{H}$ be a definable family of structures or latent parameters, and let $Q$ be a set of queries. For $h,h'\in\mathcal{H}$ define

$$h\equiv_Q h'\quad\Longleftrightarrow\quad(\forall q\in Q)\ f_h(q)=f_{h'}(q).$$

For a finite example set $E_{\leq n}$, define the version space

$$\mathcal{H}_n:=\{h\in\mathcal{H}:h\text{ is consistent with }E_{\leq n}\}.$$

###### Proposition 5\.9\(Query determination by quotient collapse\)\.

For a single query $q$, the context $E_{\leq n}$ determines the answer to $q$ if and only if $\mathcal{H}_n$ is contained in one $\equiv_{\{q\}}$-equivalence class.

###### Proof\.

If $\mathcal{H}_n$ is contained in one equivalence class, all hypotheses in $\mathcal{H}_n$ give the same value on $q$, so the answer is determined. Conversely, if the answer is determined, then any two hypotheses in $\mathcal{H}_n$ agree on $q$, hence belong to the same $\equiv_{\{q\}}$-class. ∎

###### Theorem 5\.10\(Finite quotient certificate\)\.

Suppose $\mathcal{H}_\omega:=\bigcap_n\mathcal{H}_n$ is contained in one $\equiv_{\{q\}}$-class and the consistency of $h\in\mathcal{H}_n$ is first-order expressible over the expansion chain. Then there exists $N$ such that $\mathcal{H}_N$ is contained in one $\equiv_{\{q\}}$-class.

###### Proof\.

The assumption says that the direct-limit theory entails equality of the answer on $q$ across all remaining hypotheses. Written as a first-order sentence in the expanded language, this is an entailment of $T_\omega$. By compactness, a finite subset of $T_\omega$ already entails it. Since the theory chain is increasing, that finite subset is contained in some $T_N$. Therefore $\mathcal{H}_N$ has already collapsed to one answer class for $q$. ∎

This theorem expresses the same idea as the linear row\-space criterion, but without assuming linearity\. Examples eliminate answer\-equivalence classes\. In\-context learning is exact when the remaining classes collapse for the query under consideration\.
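Quotient collapse can be illustrated without linear algebra on the query side. The sketch below assumes a hypothetical family of affine maps $x\mapsto(ax+b)\bmod 5$, indexed by pairs `(a, b)`; the function names are illustrative. It reports the first stage at which the version space falls into a single answer class for a given query.

```python
from itertools import product

MOD = 5  # hypothetical modulus; hypotheses are the maps x -> (a*x + b) % MOD
HYPOTHESES = list(product(range(MOD), repeat=2))  # pairs (a, b)


def evaluate(h, x):
    a, b = h
    return (a * x + b) % MOD


def version_space(stream, n):
    """H_n: hypotheses consistent with the first n examples."""
    return [h for h in HYPOTHESES
            if all(evaluate(h, x) == y for x, y in stream[:n])]


def first_collapse(stream, q):
    """Smallest n at which H_n lies in a single answer class for q, else None.

    An empty version space counts as collapsed (the condition is vacuous).
    """
    for n in range(len(stream) + 1):
        if len({evaluate(h, q) for h in version_space(stream, n)}) <= 1:
            return n
    return None


stream = [(0, 2), (1, 4)]  # generated by the illustrative hypothesis (a, b) = (2, 2)
```

With this stream, the query `0` collapses after one example even though five hypotheses remain, while the query `3` requires both examples.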

## 6Emergence as asymptotic latent entailment

Scale introduces a directed family of semantic presentations. We suppress the context parameter in this section when it is fixed. For each sentence $\varphi$, write

$$s_\lambda(\varphi):=s_{\lambda,c_\lambda}(\varphi),$$

where $c_\lambda$ may itself depend on scale.

###### Definition 6\.1\(Decisive scale family and limit theory\)\.

Let $\mathcal{F}\subseteq\operatorname{Sent}(\mathcal{L})$ be closed under negation and finite conjunction. A scale family is *decisive on $\mathcal{F}$* if for every $\varphi\in\mathcal{F}$ the limit

$$\lim_\lambda s_\lambda(\varphi)$$

exists and belongs to $\{0,1\}$. The almost-sure limit theory on $\mathcal{F}$ is

$$T_\infty^{\mathcal{F}}:=\{\varphi\in\mathcal{F}:\lim_\lambda s_\lambda(\varphi)=1\}.$$

When $\mathcal{F}=\operatorname{Sent}(\mathcal{L})$, write $T_\infty$.

###### Theorem 6\.2\(Limit theory is complete and consistent\)\.

If the family is decisive on $\mathcal{F}$, then $T_\infty^{\mathcal{F}}$ is complete on $\mathcal{F}$ and every finite subset of it is satisfiable at all sufficiently large scales.

###### Proof\.

For completeness, fix $\varphi\in\mathcal{F}$. Since

$$s_\lambda(\neg\varphi)=1-s_\lambda(\varphi),$$

exactly one of $\varphi$ and $\neg\varphi$ has limiting confidence $1$.

For finite satisfiability, let $\Delta=\{\delta_1,\ldots,\delta_m\}\subseteq T_\infty^{\mathcal{F}}$ and put $\delta=\bigwedge_i\delta_i$. Since $\mathcal{F}$ is closed under finite conjunction, $\delta\in\mathcal{F}$. By subadditivity,

$$1-s_\lambda(\delta)\leq\sum_{i=1}^{m}(1-s_\lambda(\delta_i)).$$

Each summand tends to $0$, so $s_\lambda(\delta)\to 1$. Hence $s_\lambda(\delta)>0$ at all sufficiently large scales. Therefore some structure at those scales satisfies $\Delta$. ∎

###### Definition 6.3 (Threshold theory and manifestation).

For $\tau\in(0,1)$ define

$$T_{\lambda,\tau}:=\{\varphi\in\operatorname{Sent}(\mathcal{L}):s_{\lambda}(\varphi)>\tau\}.$$

A sentence $\varphi$ is a *latent entailment* when $\varphi\in T_{\infty}$. It is *$\tau$-manifest from scale $\lambda_{0}$* when

$$\varphi\in T_{\lambda,\tau}\qquad\text{for all }\lambda\geq\lambda_{0}.$$

It is *$\tau$-emergent* when it is latent, hidden below threshold at some smaller scale, and $\tau$-manifest from some later scale onward.

###### Theorem 6.4 (Threshold manifestation of latent entailments).

If $\varphi\in T_{\infty}$ and $\tau<1$, then $\varphi$ is $\tau$-manifest from some scale onward. If it is below threshold at some earlier scale, then it is $\tau$-emergent.

###### Proof.

Since $\varphi\in T_{\infty}$, $s_{\lambda}(\varphi)\to 1$. For $\tau<1$, convergence gives a scale $\lambda_{0}$ such that $s_{\lambda}(\varphi)>\tau$ for all $\lambda\geq\lambda_{0}$. The second statement is the definition of $\tau$-emergence. ∎

This definition makes emergence a relation between a limit theory and an observation threshold\. It is not an unexplained creation of a semantic fact at large scale\.

###### Theorem 6.5 (Rate-sensitive threshold bound).

Assume $\varphi\in T_{\infty}$ and suppose there are constants $a>0$, $\alpha>0$, and $\lambda_{1}$ such that

$$1-s_{\lambda}(\varphi)\leq a\lambda^{-\alpha}\qquad\text{for all }\lambda\geq\lambda_{1}.$$

Then for every threshold $\tau<1$, $\varphi\in T_{\lambda,\tau}$ for all

$$\lambda\geq\max\left\{\lambda_{1},\left(\frac{a}{1-\tau}\right)^{1/\alpha}\right\},$$

after an arbitrarily small enlargement of the right-hand side if the threshold is strict.

###### Proof.

If $\lambda$ satisfies the displayed lower bound before the final strict-threshold adjustment, then

$$a\lambda^{-\alpha}\leq 1-\tau.$$

Therefore

$$1-s_{\lambda}(\varphi)\leq 1-\tau,$$

so $s_{\lambda}(\varphi)\geq\tau$. For a strict threshold, replace $(a/(1-\tau))^{1/\alpha}$ in the displayed lower bound by $(a/(1-\tau-\epsilon))^{1/\alpha}$ for any $\epsilon\in(0,1-\tau)$. Then $a\lambda^{-\alpha}\leq 1-\tau-\epsilon<1-\tau$, so $s_{\lambda}(\varphi)>\tau$. ∎

This theorem converts a qualitative limit\-theory statement into a crossing\-scale estimate\. It is the form needed for theoretical comparison with empirical scaling curves: a rate of semantic convergence implies a bound on the scale at which the property becomes visible to a chosen threshold\.
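As a worked instance of the crossing-scale estimate, the following plugs illustrative constants into the bound of Theorem 6.5. The values $a=4$, $\alpha=2$, $\tau=3/4$, and $\lambda_{1}=1$ are assumptions chosen for easy arithmetic and do not come from the paper.

```latex
\begin{align*}
a = 4,\quad \alpha = 2,\quad \tau = \tfrac{3}{4},\quad \lambda_{1} = 1
  &\;\Longrightarrow\; \left(\frac{a}{1-\tau}\right)^{1/\alpha}
     = \left(\frac{4}{1/4}\right)^{1/2} = 4, \\
\lambda = 4 &\;\Longrightarrow\; a\lambda^{-\alpha} = \tfrac{4}{16} = \tfrac{1}{4} = 1-\tau,
  \quad\text{so } s_{\lambda}(\varphi) \geq \tfrac{3}{4}, \\
\lambda = 5 &\;\Longrightarrow\; a\lambda^{-\alpha} = \tfrac{4}{25} < \tfrac{1}{4},
  \quad\text{so } s_{\lambda}(\varphi) \geq \tfrac{21}{25} > \tfrac{3}{4}.
\end{align*}
```

At $\lambda=4$ the bound yields only the non-strict inequality, which is why the strict threshold of Definition 6.3 needs the small enlargement; any $\lambda>4$ already suffices here.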

### 6.1 Ultraproduct representation of scale limits

The asymptotic commitments can be represented by a single structure through an ultraproduct\. The following construction avoids assuming that one fixed structure satisfies all limit sentences at each finite scale\.

###### Theorem 6.6 (Ultraproduct witness for the limit theory).

Let $T_{\infty}$ be countable. Suppose for every finite $\Delta\subseteq T_{\infty}$ there exists $N_{\Delta}$ such that for all $n\geq N_{\Delta}$ some $\mathfrak{M}_{n}\in\mathcal{K}_{n}$ satisfies $\Delta$. Then there are structures $\mathfrak{M}_{n}\in\mathcal{K}_{n}$ and a nonprincipal ultrafilter $\mathcal{U}$ on $\mathbb{N}$ such that

$$\prod_{n\to\mathcal{U}}\mathfrak{M}_{n}\models T_{\infty}.$$

###### Proof.

Enumerate $T_{\infty}=\{\varphi_{1},\varphi_{2},\ldots\}$ and let

$$\Delta_{k}:=\{\varphi_{1},\ldots,\varphi_{k}\}.$$

For each $n$, choose $k(n)\leq n$ maximal such that $N_{\Delta_{k(n)}}\leq n$, taking $k(n)=0$ if no such $k$ exists; restricting to $k\leq n$ ensures that the maximum exists. Choose $\mathfrak{M}_{n}\in\mathcal{K}_{n}$ satisfying $\Delta_{k(n)}$ whenever $k(n)>0$, and choose any element of $\mathcal{K}_{n}$ otherwise. For each fixed $j$, all $n\geq\max\{j,N_{\Delta_{j}}\}$ satisfy $k(n)\geq j$, so

$$\{n:\mathfrak{M}_{n}\models\varphi_{j}\}$$

contains a cofinite set. Every nonprincipal ultrafilter contains all cofinite sets. By Łoś’s theorem, the ultraproduct satisfies each $\varphi_{j}$. ∎

The ultraproduct is a limit witness\. It packages the eventual finite satisfiability of semantic commitments into one mathematical object\.

### 6.2 Metric thresholds and apparent jumps

A benchmark score is an observation of semantic confidence through a metric\. If the metric is discontinuous, it can create threshold jumps without any discontinuity in the score being measured\.

###### Definition 6.7 (Observation functional).

An observation functional is a map

$$\Omega:[0,1]\to\mathbb{R}.$$

It is *thresholded* at $\tau$ if

$$\Omega_{\tau}(x)=\begin{cases}0,&x<\tau,\\ 1,&x\geq\tau.\end{cases}$$

###### Theorem 6.8 (Threshold jumps do not imply semantic discontinuity).

Let $s:\mathbb{N}\to[0,1]$ be nondecreasing and $s(n)\to 1$. Fix $\tau\in(0,1)$ and define $m_{\tau}(n)=\Omega_{\tau}(s(n))$. If there exists $N$ such that

$$s(N-1)<\tau\leq s(N),$$

then $m_{\tau}$ jumps at $N$. This can occur even when every increment $s(n)-s(n-1)$ is at most any prescribed $\epsilon>0$.

###### Proof.

The displayed inequality gives $m_{\tau}(N-1)=0$ and $m_{\tau}(N)=1$. For the final claim, fix $\epsilon>0$ and choose an increasing sequence from $0$ to $1$ whose mesh is at most $\epsilon$, for instance a sufficiently fine discretization of $1-e^{-t}$. It crosses $\tau$ at some index, so the thresholded metric jumps, while the underlying increments are bounded by $\epsilon$. ∎

###### Proposition 6.9 (Continuous observation preserves graduality).

Let $\Omega:[0,1]\to\mathbb{R}$ be uniformly continuous. For every $\eta>0$ there exists $\epsilon>0$ such that if

$$|s(n)-s(n-1)|<\epsilon,$$

then

$$|\Omega(s(n))-\Omega(s(n-1))|<\eta.$$

###### Proof.

This is precisely uniform continuity applied to the two points $s(n)$ and $s(n-1)$. ∎

The mathematical distinction is now explicit\. A jump in a thresholded metric is evidence about the metric and the crossing point\. It is not, by itself, evidence that semantic confidence changed discontinuously\.

A common way for a benchmark to manufacture a threshold is to score an answer as correct only when several fields are simultaneously correct. Exact match over a multi-field certificate is exactly such a conjunction, and it sharpens the apparent jump in a way the next proposition makes precise; it is the mechanism observed in Section [9](https://arxiv.org/html/2606.07623#S9).

###### Proposition 6.10 (Conjunctive thresholding of exact match).

Let an item require $k$ answer fields, and suppose a generator returns each field correctly and independently with probability $s\in[0,1]$. Score the item by its graded value, the expected fraction of correct fields, and by exact match, the event that all $k$ fields are correct. Then:

1. (i) the graded value is $s$ and the exact-match probability is $s^{k}$, so $s^{k}\leq s$, with equality iff $k=1$ or $s\in\{0,1\}$;
2. (ii) the rise of exact match is compressed toward $s=1$: for every $\eta\in(0,1)$, $s^{k}<\eta$ whenever $s<\eta^{1/k}$, and the width $1-\eta^{1/k}$ of the remaining interval $[\eta^{1/k},1]$, on which $s^{k}\geq\eta$, tends to $0$ as $k\to\infty$;
3. (iii) the exact match of one item is the conjunction of its $k$ field-correctness events, hence a thresholded observation in the sense of Definition [6.7](https://arxiv.org/html/2606.07623#S6.Thmtheorem7); by Theorem [6.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8) it can flip from $0$ to $1$ while the graded field-confidence rises smoothly, and aggregated over items the exact accuracy follows the compressed envelope $s^{k}$ of (i)–(ii). For a single field $(k=1)$ the observation is the identity and Proposition [6.9](https://arxiv.org/html/2606.07623#S6.Thmtheorem9) applies, so no compression occurs;
4. (iv) if moreover $1-s_{\lambda}\leq a\lambda^{-\alpha}$ for a scale parameter $\lambda$ (the rate hypothesis of Theorem [6.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5)), then exact match on a $k$-field item is guaranteed to clear a threshold $\tau$ once $\lambda\geq\bigl(\tfrac{ka}{1-\tau}\bigr)^{1/\alpha}$, a bound larger by the factor $k^{1/\alpha}$ than the single-field crossing bound of Theorem [6.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5).

###### Proof.

(i) By independence the all-correct probability is $\prod_{j=1}^{k}s=s^{k}$, while the expected fraction of correct fields is $\frac{1}{k}\sum_{j=1}^{k}s=s$; the inequality and its equality cases are elementary. (ii) $s^{k}<\eta\iff s<\eta^{1/k}$, and $\eta^{1/k}\to 1$ as $k\to\infty$. (iii) The all-correct indicator is the meet of the $k$ field indicators, a thresholded observation; Theorem [6.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8) gives the flip from a smooth underlying confidence, and averaging the indicator over a population with per-field reliability $s$ returns $s^{k}$. The case $k=1$ is the identity observation, covered by Proposition [6.9](https://arxiv.org/html/2606.07623#S6.Thmtheorem9). (iv) By Bernoulli’s inequality $(1-a\lambda^{-\alpha})^{k}\geq 1-ka\lambda^{-\alpha}$, so $s_{\lambda}^{k}\geq\tau$ as soon as $ka\lambda^{-\alpha}\leq 1-\tau$, that is $\lambda\geq(ka/(1-\tau))^{1/\alpha}$. ∎

This is a purely metric effect: the underlying per-field confidence may rise by a fixed small amount while the exact-match score on a $k$-field item rises arbitrarily more steeply, and the bound on the scale at which it becomes visible is inflated by $k^{1/\alpha}$. Section [9](https://arxiv.org/html/2606.07623#S9) exhibits both halves on trained models, with the multi-field certificate families jumping and the single-field family staying gradual.
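The quantities in Proposition 6.10 are simple enough to tabulate directly. The sketch below is a minimal illustration under the proposition's independence assumption; the function names are hypothetical and are not taken from the paper's ancillary scripts.

```python
def exact_match_probability(s: float, k: int) -> float:
    """Probability that all k independent fields are correct: s**k (part (i))."""
    return s ** k


def field_confidence_needed(eta: float, k: int) -> float:
    """Per-field confidence at which exact match reaches eta: eta**(1/k) (part (ii))."""
    return eta ** (1.0 / k)


def conjunctive_crossing_bound(a: float, alpha: float, tau: float, k: int) -> float:
    """Sufficient scale (k*a/(1-tau))**(1/alpha) from part (iv)."""
    return (k * a / (1.0 - tau)) ** (1.0 / alpha)


if __name__ == "__main__":
    # As k grows, the same per-field confidence 0.9 yields a much lower
    # exact-match probability, and reaching eta = 0.5 needs s closer to 1.
    for k in (1, 2, 5, 20):
        print(k, exact_match_probability(0.9, k), field_confidence_needed(0.5, k))
```

With $a=1$, $\alpha=1$, $\tau=0.75$ the single-field bound is $4$ and the two-field bound is $8$, the factor $k^{1/\alpha}=2$ of part (iv).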

### 6.3 Stability of answer sets under posterior perturbation

The set-theoretic account above can be paired with probabilistic robustness. If two posterior semantic measures are close on definable events, then their confidence assignments are close on all formulas in the tested fragment.

###### Definition 6.11 (Fragment total variation).

For two probability measures $\mu$ and $\nu$ on $\mathcal{K}$, and a fragment $\mathcal{F}\subseteq\operatorname{Sent}(\mathcal{L})$, define

$$\operatorname{TV}_{\mathcal{F}}(\mu,\nu):=\sup_{\varphi\in\mathcal{F}}\left|\mu([\![\varphi]\!]_{\mathcal{K}})-\nu([\![\varphi]\!]_{\mathcal{K}})\right|.$$

###### Proposition 6.12 (Confidence robustness).

If $\operatorname{TV}_{\mathcal{F}}(\mu,\nu)\leq\epsilon$, then for every $\varphi\in\mathcal{F}$,

$$|s_{\mu}(\varphi)-s_{\nu}(\varphi)|\leq\epsilon,$$

where $s_{\mu}(\varphi)=\mu([\![\varphi]\!]_{\mathcal{K}})$ and similarly for $\nu$.

###### Proof.

This is immediate from the definition of the supremum. ∎

###### Corollary 6.13 (Threshold stability margin).

Let $\varphi\in\mathcal{F}$ and suppose $s_{\mu}(\varphi)>\tau+\epsilon$. If $\operatorname{TV}_{\mathcal{F}}(\mu,\nu)\leq\epsilon$, then $s_{\nu}(\varphi)>\tau$.

###### Proof.

By Proposition [6.12](https://arxiv.org/html/2606.07623#S6.Thmtheorem12), $s_{\nu}(\varphi)\geq s_{\mu}(\varphi)-\epsilon>\tau$. ∎

This margin statement is the semantic analogue of robustness: once a property is sufficiently above threshold, small posterior perturbations cannot remove its threshold visibility\.

## 7 Verification, predictability, and finite certificates

The semantic calculus yields finite certificates in two directions: finite context certificates and finite scale thresholds\. The first comes from compactness\. The second comes from convergence in the limit theory\.

###### Theorem 7.1 (Finite property certificates).

Let $q$ be a query term and let $\chi(y)$ be a formula with one free variable of sort $\mathbf{Y}$. If

$$T_{\omega}\models\chi(f(q)),$$

then there exists $N\in\mathbb{N}$ such that

$$T_{N}\models\chi(f(q)).$$

###### Proof.

The entailment is equivalent to inconsistency of

$$T_{\omega}\cup\{\neg\chi(f(q))\}.$$

By compactness, a finite subset $\Sigma\subseteq T_{\omega}$ already gives the inconsistency. Since the chain is increasing, $\Sigma\subseteq T_{N}$ for some $N$. Therefore $T_{N}\models\chi(f(q))$. ∎

###### Definition 7.2 (Prompt-stable property).

Let $\mathcal{P}$ be a class of prompt specifications on $X$. A sentence $\psi$ is *stable over $\mathcal{P}$* if

$$(\forall p\in\mathcal{P})(\forall\mathfrak{M}\in\mathsf{Upd}_{p}(X))\ \mathfrak{M}\models\psi.$$

It is *locally stable at $p$* if there exists a neighborhood $\mathcal{N}(p)\subseteq\mathcal{P}$ such that $\psi$ is stable over $\mathcal{N}(p)$.

###### Proposition 7.3 (Stability by invariant support).

If there is a set $Y\subseteq X$ such that

$$\mathsf{Upd}_{p}(X)\subseteq Y\qquad\text{for every }p\in\mathcal{P},$$

and every $\mathfrak{M}\in Y$ satisfies $\psi$, then $\psi$ is stable over $\mathcal{P}$.

###### Proof.

For each $p\in\mathcal{P}$, all selected structures lie in $Y$, and every structure in $Y$ satisfies $\psi$. Hence all selected structures satisfy $\psi$. ∎

###### Corollary 7.4 (Asymptotic predictability of certified properties).

Suppose a scale family is decisive and $\psi\in T_{\infty}$. Then for every $\tau<1$ there exists $\lambda_{0}$ such that

$$\psi\in T_{\lambda,\tau}\qquad\text{for all }\lambda\geq\lambda_{0}.$$

If $\psi$ is also entailed by an in-context limit theory $T_{\omega}$, then there exists a finite context stage $N$ such that $T_{N}\models\psi$.

###### Proof.

The first statement is Theorem [6.4](https://arxiv.org/html/2606.07623#S6.Thmtheorem4). The second statement is Theorem [7.1](https://arxiv.org/html/2606.07623#S7.Thmtheorem1) applied to the relevant formula. ∎

This separates two verification questions\. A finite context stage can certify a property at the logical level\. A sufficiently large scale can make that property visible above a confidence threshold\. Neither statement requires treating the model as transparent at the level of weights\.

### 7.1 Certificate-checked refinement

The certificates above are finite, and for the three families they are also*checkable without the answer they certify*\. This makes a closed\-loop use of them possible, in which checking, rather than a reference key, drives improvement\.

###### Definition 7.5 (Sound oracle-free checker).

For a family of instances $x$ with an admissible-answer relation, a *checker* is a map $V$ sending a pair $(x,c)$ of an instance and a candidate certificate to $\{\mathsf{accept},\mathsf{reject}\}$ that is computable from $(x,c)$ alone. It is *sound* if $V(x,c)=\mathsf{accept}$ implies that the answer encoded by $c$ is correct, and *oracle-free* in that it never reads a reference answer.

###### Lemma 7.6 (The family criteria are sound checkers).

Each family admits a sound oracle-free checker.

1. (i) Linear family: accept a forced certificate $(c,v)$ when $\sum_{i}c_{i}a_{i}=q$ and $v=\sum_{i}c_{i}b_{i}$ over $\mathbb{F}$, and an underdetermination certificate $(w_{1},w_{2})$ when both satisfy $w^{\top}a_{i}=b_{i}$ for all $i$ and $w_{1}^{\top}q\neq w_{2}^{\top}q$.
2. (ii) Threshold family: accept $(\lambda,\delta)$ when $\lambda\geq 2$ is an integer, $s_{\lambda}\geq\tau>s_{\lambda-1}$, and $\delta=s_{\lambda}-s_{\lambda-1}$.
3. (iii) Preferential family: accept $(S_{p},S_{p\oplus q},b)$ when $S_{p}$ and $S_{p\oplus q}$ are the rank-minimal selected sets and the bit $b$ equals $[\,S_{p\oplus q}\subseteq S_{p}\,]$.

###### Proof.

Each acceptance condition restates the corresponding criterion. (i) is Theorem [5.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1)(ii): a row-space combination with $c^{\top}A=q^{\top}$ forces $w^{\top}q=c^{\top}b$ for every consistent $w$, while two consistent witnesses separating $q$ exhibit underdetermination. (ii) is the crossing-scale condition behind Theorem [6.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5): $s_{\lambda}\geq\tau>s_{\lambda-1}$ identifies $\lambda$ as the least crossing scale and fixes the local increment. (iii) is the finite case of Theorem [3.9](https://arxiv.org/html/2606.07623#S3.Thmtheorem9), whose preservation bit is exactly the containment of selected sets. Every condition is a function of $x$ and $c$, so $V$ is oracle-free. ∎
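The acceptance conditions (i) and (ii) of Lemma 7.6 can be sketched as short checkers. This is a minimal illustration, not the paper's implementation: it assumes a prime field $\mathbb{F}_{p}$ with vectors stored as lists of integers, the confidence model $s_{\lambda}=1-a\lambda^{-\alpha}$ with integer $\alpha$, and hypothetical function names. Exact rationals are used so that the equality test on the increment is meaningful.

```python
from fractions import Fraction


def dot(u, v, p):
    """Inner product of two vectors over F_p."""
    return sum(x * y for x, y in zip(u, v)) % p


def check_forced(A, b, q, p, c, v):
    """Accept (c, v) when sum_i c_i a_i = q and v = sum_i c_i b_i over F_p."""
    if len(c) != len(A):
        return False
    combo = [sum(ci * row[j] for ci, row in zip(c, A)) % p for j in range(len(q))]
    return combo == [x % p for x in q] and dot(c, b, p) == v % p


def check_underdetermined(A, b, q, p, w1, w2):
    """Accept (w1, w2) when both fit every example and disagree on the query."""
    fits = all(dot(w, row, p) == bi % p for w in (w1, w2) for row, bi in zip(A, b))
    return fits and dot(w1, q, p) != dot(w2, q, p)


def confidence(lam, a, alpha):
    """Exact s_lambda = 1 - a * lambda^(-alpha), assuming integer alpha."""
    return 1 - Fraction(a) / Fraction(lam) ** alpha


def check_threshold(a, alpha, tau, lam, delta):
    """Accept (lam, delta) when lam >= 2 is the least crossing scale with increment delta."""
    if not isinstance(lam, int) or lam < 2:
        return False
    hi, lo = confidence(lam, a, alpha), confidence(lam - 1, a, alpha)
    return hi >= tau > lo and delta == hi - lo
```

Each function reads only the instance and the candidate certificate, which is the oracle-free property of Definition 7.5; none of them is given a reference answer.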

###### Definition 7.7 (Aversive refinement schedule).

Fix a sound checker $V$ and a budget $K$. A refinement maintains an *accepted set* $A_{t}$ of items, each carrying a verified certificate. The set $A_{0}$ is whatever a first pass certifies. At round $t\geq 1$ a generator proposes certificates for the items outside $A_{t-1}$, conditioned on a memory $M_{t}$ of its own previously rejected certificates together with the checker’s rejection reasons for them; $V$ is applied, and items it accepts enter $A_{t}$ and are never revised. With $p_{t}$ the number of still-unverified items, the schedule greedily increases

$$J=\sum_{t}\bigl(|A_{t}|-\kappa\,p_{t}+\nu\,z_{t}\bigr),\qquad z_{t}=|A_{t-1}|,$$

in which a rejection is the aversive term $p$ and the retained certificates are the continuity term $z$; accepted certificates are never revised, so there is no disruption term.

###### Proposition 7.8 (Monotone soundness of refinement).

For every generator and budget $K$:

1. (i) $A_{0}\subseteq A_{1}\subseteq\cdots\subseteq A_{K}$;
2. (ii) every answer in each $A_{t}$ is correct;
3. (iii) $|A_{t}|$ is non-decreasing and is a lower bound on the generator’s true accuracy;
4. (iv) the schedule consumes no reference answer, so the gain $|A_{K}|-|A_{0}|$ counts additional correct certificates the generator can be driven to produce under checking, not information supplied by a key.

###### Proof.

(i) holds because accepted items are retained. (ii) is soundness of $V$ (Lemma [7.6](https://arxiv.org/html/2606.07623#S7.Thmtheorem6)). (iii) follows from (i) and (ii). For (iv), $V$ depends only on $(x,c)$ (Definition [7.5](https://arxiv.org/html/2606.07623#S7.Thmtheorem5)), and $M_{t}$ records only the generator’s own rejected certificates and the checker’s reasons, themselves functions of $(x,c)$; no external answer enters the loop. ∎

The count $p_{t}$ is a logical violation cost in the sense of Definition [3.12](https://arxiv.org/html/2606.07623#S3.Thmtheorem12), and the schedule is its greedy minimization under the preferential reading of Section [3](https://arxiv.org/html/2606.07623#S3): each round retains the cost-zero, that is verified, proposals and carries them forward.

Soundness fixes that every accepted answer is correct, but not how fast $A_{t}$ grows; that rate is set by how the generator uses the checker’s reason. The threshold family makes this exact, because its rejection reason is *directional*: a proposed scale $\lambda$ is told that it is too small when $s_{\lambda}<\tau$, too large when $s_{\lambda-1}\geq\tau$, or, once $\lambda$ is the crossing scale, the value the increment must take.

###### Proposition 7.9 (Reach of the refinement).

Take a threshold item whose crossing scale $\lambda^{\ast}$ lies in $[2,R]$ with $R=\lceil(a/(1-\tau))^{1/\alpha}\rceil$ (Theorem [6.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5)), under the directional checker of Lemma [7.6](https://arxiv.org/html/2606.07623#S7.Thmtheorem6)(ii) and a budget $K$.

1. (i) A generator that bisects the interval still consistent with the directional replies certifies the item within $\lceil\log_{2}(R-1)\rceil+1$ rounds; for $K$ at least this large the schedule reaches the whole family.
2. (ii) A generator that merely never re-proposes a refuted scale certifies it within $R-1$ rounds.
3. (iii) A generator whose proposal in each round is conditionally independent of the reply, and is accepted with probability at most $q$, certifies the item by round $K$ with probability at most $1-(1-q)^{K}$; if it cannot localize $\lambda^{\ast}$ from the signal then $q$ is the chance of jointly guessing the scale and the increment, and the expected $|A_{K}|$ stays near $|A_{0}|$.

###### Proof.

The rate hypothesis $1-s_{\lambda}\leq a\lambda^{-\alpha}$ gives $\lambda^{\ast}\leq R$, so the search ranges over $[2,R]$. (i) Each directional reply discards the half of the feasible interval on the refuted side of the proposed midpoint, so after $\lceil\log_{2}(R-1)\rceil$ rounds the interval is $\{\lambda^{\ast}\}$; one further round sets the increment, which the checker has already named. (ii) Each refuted proposal removes at least one scale from the feasible set, which holds fewer than $R-1$ members besides $\lambda^{\ast}$. (iii) Independence gives probability at least $1-q$ of no acceptance in a round, hence probability at most $1-(1-q)^{K}$ of an acceptance over $K$ rounds. ∎

So with a sound directional checker the schedule erases an apparent threshold for any generator able to act on the feedback (cases (i)–(ii)) and fails only for one blind to it (case (iii)). The dividing line is a property of the generator’s search competence under the checker, not of the checker or of the certificate; by Proposition [7.8](https://arxiv.org/html/2606.07623#S7.Thmtheorem8) soundness holds throughout, so wherever the loop advances the gain is genuine. This is the sense in which a finite, checkable certificate is not only a witness but a handle: where the underlying capability is latent it is exposed by a cheap oracle-free check, and where it is absent no amount of checking supplies it.
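The bisecting generator of Proposition 7.9(i) can be sketched against a directional checker as follows. This is an illustration under stated assumptions: the confidence model is $s_{\lambda}=1-a\lambda^{-\alpha}$ with integer $\alpha$, the reply format (`"too_small"`, `"too_large"`, `"increment"`) is hypothetical, and the generator acts only on the checker's replies. How the final increment round is counted is a modelling choice, so the round count of this sketch need not coincide with the stated bound on every instance.

```python
from fractions import Fraction


def confidence(lam, a, alpha):
    """Exact s_lambda = 1 - a * lambda^(-alpha), assuming integer alpha."""
    return 1 - Fraction(a) / Fraction(lam) ** alpha


def directional_checker(a, alpha, tau, lam, delta):
    """Return (accepted, reason); the reason depends only on instance and proposal."""
    if confidence(lam, a, alpha) < tau:
        return False, ("too_small", None)
    if confidence(lam - 1, a, alpha) >= tau:
        return False, ("too_large", None)
    increment = confidence(lam, a, alpha) - confidence(lam - 1, a, alpha)
    if delta != increment:
        return False, ("increment", increment)
    return True, ("accept", None)


def bisecting_generator(a, alpha, tau, R, budget):
    """Bisect [2, R] on directional replies; return (scale, increment, rounds) or None."""
    lo, hi = 2, R
    lam, delta = None, None
    for round_no in range(1, budget + 1):
        if delta is None:
            if lo > hi:
                return None  # crossing scale is not in [2, R]
            lam = (lo + hi) // 2
        accepted, (kind, value) = directional_checker(a, alpha, tau, lam, delta)
        if accepted:
            return lam, delta, round_no
        if kind == "too_small":
            lo = lam + 1
        elif kind == "too_large":
            hi = lam - 1
        else:
            delta = value  # keep the scale, adopt the increment the checker named
    return None
```

For $a=1$, $\alpha=1$, $\tau=9/10$, and $R=10$ the sketch proposes the scales $6, 8, 9, 10$ and then resubmits $10$ with the named increment, five rounds in total, which equals $\lceil\log_{2}(R-1)\rceil+1$ for this instance.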

## 8 Numerical verification of the theorems

The quantitative theorems are reproduced here by controlled, exact-arithmetic checks of the algebraic and threshold statements, in a setting where the latent task, the field, and the scoring metric are known exactly. These are artifact checks, not experiments on a trained model. They use finite-field arithmetic and a fixed seed, and the ancillary material contains `anc/verify_theorems.py` together with the emitted CSV files and the two figures of this section. A separate, complementary question, whether contemporary trained systems can themselves emit the finite certificates that the theorems isolate, is taken up in Section [9](https://arxiv.org/html/2606.07623#S9), and controlled measurement protocols for trained models are stated in Section [10](https://arxiv.org/html/2606.07623#S10).

![Refer to caption](https://arxiv.org/html/2606.07623v1/x1.png)

Figure 1: Identification curve. Solid lines are the closed form $I_{d}(n)=\prod_{i=0}^{d-1}(1-Q^{i-n})$ of Theorem [5.4](https://arxiv.org/html/2606.07623#S5.Thmtheorem4); markers are Monte-Carlo estimates of $\Pr[\operatorname{rank}(A_{n})=d]$ over uniform contexts. The probability that every query becomes determined is the same curve.

#### Identification curve.

Table [1](https://arxiv.org/html/2606.07623#S8.T1) compares the closed form $I_{d}(n)$ of Theorem [5.4](https://arxiv.org/html/2606.07623#S5.Thmtheorem4) with a Monte-Carlo estimate of $\Pr[\operatorname{rank}(A_{n})=d]$ for uniform examples over $\mathbb{F}^{d}$. Across the configurations $(\mathbb{F},d)\in\{(\mathbb{F}_{2},5),(\mathbb{F}_{5},4),(\mathbb{F}_{7},3)\}$ and all context lengths $0\leq n\leq 2d+2$, the maximum absolute discrepancy is $0.032$ with $600$ trials per row, within the expected small-sample Monte-Carlo fluctuation. The curve is exactly zero for $n<d$ and rises sharply once $n\geq d$, matching the rank-nullity obstruction.
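The comparison described above can be sketched in a few lines. This is not `anc/verify_theorems.py`, whose contents are not shown here; it is a minimal reconstruction that assumes $Q$ in the closed form denotes the field size, restricts to prime fields $\mathbb{F}_{p}$, and uses hypothetical function names and an arbitrary seed.

```python
import random
from fractions import Fraction


def identification_curve(Q, d, n):
    """Closed form I_d(n) = prod_{i=0}^{d-1} (1 - Q^(i-n)); zero when n < d."""
    if n < d:
        return Fraction(0)
    prob = Fraction(1)
    for i in range(d):
        prob *= 1 - Fraction(Q) ** (i - n)
    return prob


def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] % p:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def estimate_full_rank(p, d, n, trials=600, seed=0):
    """Monte-Carlo estimate of Pr[rank(A_n) = d] for uniform n x d matrices over F_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        A = [[rng.randrange(p) for _ in range(d)] for _ in range(n)]
        hits += rank_mod_p(A, p) == d
    return hits / trials
```

Calling `estimate_full_rank(5, 4, n)` alongside `float(identification_curve(5, 4, n))` for $0\leq n\leq 10$ gives the kind of row-by-row comparison the text describes, with sampling noise on the order of a few hundredths at $600$ trials.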

Table 1: Theory versus simulation for $\mathbb{F}=\mathbb{F}_{5}$, $d=4$ ($600$ trials per row). Full-determination probability matches the identification curve.

#### Row-space and query-local determinacy.

Theorem [5.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1)(ii) states that a query $q$ is determined by a context exactly when $q\in\operatorname{Row}(A_{n})$. Over $\mathbb{F}_{2}^{4}$ we enumerate, for each sampled context, all latent parameters consistent with the labels and test whether they agree on $w^{\top}q$. Across $2{,}400$ context–query pairs emitted by the artifact, the predicate “all consistent parameters agree on $q$” coincides with “$q\in\operatorname{Row}(A_{n})$” in every case: zero mismatches. The file `anc/fixed_query_curve.csv` additionally records the fixed-query curve of Theorem [5.6](https://arxiv.org/html/2606.07623#S5.Thmtheorem6), and `anc/rowspace_determinacy.csv` records the expected determined-query fraction of Corollary [5.7](https://arxiv.org/html/2606.07623#S5.Thmtheorem7).
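The brute-force comparison over $\mathbb{F}_{2}^{4}$ can be sketched as below. The sampling scheme (context length uniform in $0,\ldots,8$, uniform rows, queries, and latent parameter, seed $0$) is an assumption made for illustration and may differ from the artifact's; vectors are encoded as 4-bit integers, so the inner product over $\mathbb{F}_{2}$ is the parity of a bitwise AND.

```python
import random

D = 4  # dimension of the vector space F_2^4


def parity(x):
    """Parity of the set bits of x, i.e. a sum over F_2."""
    return bin(x).count("1") % 2


def rank_gf2(vectors):
    """Rank over F_2 of bitmask-encoded vectors, via an XOR basis."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v when present
        if v:
            basis.append(v)
    return len(basis)


def in_row_space(rows, q):
    """q lies in Row(A_n) iff appending q does not raise the rank."""
    return rank_gf2(rows + [q]) == rank_gf2(rows)


def all_consistent_agree(rows, labels, q):
    """Enumerate every w in F_2^4 that fits the labels and compare their values on q."""
    answers = {
        parity(w & q)
        for w in range(2 ** D)
        if all(parity(w & a) == b for a, b in zip(rows, labels))
    }
    return len(answers) == 1


def count_mismatches(pairs=2400, seed=0):
    """Count context-query pairs where the two predicates disagree."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(pairs):
        n = rng.randrange(0, 2 * D + 1)
        w_true = rng.randrange(2 ** D)
        rows = [rng.randrange(2 ** D) for _ in range(n)]
        labels = [parity(w_true & a) for a in rows]
        q = rng.randrange(2 ** D)
        mismatches += in_row_space(rows, q) != all_consistent_agree(rows, labels, q)
    return mismatches
```

Because the labels are generated from a true parameter, the consistent set is never empty, and by the row-space criterion `count_mismatches()` is expected to return zero for any seed.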

#### Threshold mirage and crossing scale\.

Take the smooth confidence $s_{\lambda}(\varphi)=1-a\lambda^{-\alpha}$ with $a=1$, $\alpha=1$, and threshold $\tau=0.9$. The thresholded metric $\Omega_{\tau}(s_{\lambda})$ flips from $0$ to $1$ at $\lambda=10$, exactly the crossing-scale bound $\lambda_{\tau}=(a/(1-\tau))^{1/\alpha}=10$ of Theorem [6.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5), while the increment of $s_{\lambda}$ at the crossing is only $s_{10}-s_{9}\approx 0.011$. A discontinuous benchmark jump is therefore produced by a confidence that changes by about one percentage point, confirming Theorem [6.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8); the same $s_{\lambda}$ read through a continuous metric stays gradual, confirming Proposition [6.9](https://arxiv.org/html/2606.07623#S6.Thmtheorem9).
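These numbers can be reproduced with exact rational arithmetic. The sketch below is an illustration with hypothetical names, using the parameters stated in the paragraph ($a=1$, $\alpha=1$, $\tau=9/10$); the search range up to $100$ is an arbitrary choice.

```python
from fractions import Fraction

A, ALPHA, TAU = Fraction(1), 1, Fraction(9, 10)


def confidence(lam):
    """Exact s_lambda = 1 - a * lambda^(-alpha) for the parameters above."""
    return 1 - A / Fraction(lam) ** ALPHA


def thresholded(x, tau=TAU):
    """Thresholded observation of Definition 6.7: 0 below tau, 1 at or above it."""
    return 1 if x >= tau else 0


# First scale at which the thresholded metric flips from 0 to 1.
flip_scale = next(
    lam
    for lam in range(2, 101)
    if thresholded(confidence(lam)) == 1 and thresholded(confidence(lam - 1)) == 0
)

# Change in the underlying confidence across the flip.
increment = confidence(flip_scale) - confidence(flip_scale - 1)

if __name__ == "__main__":
    print(flip_scale, increment, float(increment))
```

The thresholded metric changes by a full unit at the flip, while the identity observation $\Omega(x)=x$ changes only by the increment $1/90$.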

![Refer to caption](https://arxiv.org/html/2606.07623v1/x2.png)

Figure 2: A benchmark “jump” from a smoothly rising confidence. The thresholded metric (step) flips at the crossing scale $\lambda_{\tau}$ predicted by Theorem [6.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5), although $s_{\lambda}$ is smooth and changes by $\approx 0.011$ there.

## 9 Certificate emission by trained language models

The simulations above check the theorems where the latent task and the metric are known in advance\. A complementary question is whether the certificate objects the theorems isolate are within reach of trained systems: presented with an in\-context problem, can a model return the same finite witness the theory uses—a row\-space combination, a crossing\-scale computation, or a selected\-set preservation judgment? We probe this with a deterministic suite whose three item families mirror the three certificate types one to one\. The suite is generated from a fixed seed, held identical across models, and graded against ground truth computed by exact arithmetic, so the questions and the answer key are fully reproducible offline\.

- DET ($12$ items) instantiates the row-space criterion of Theorem [5.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1)(ii) and the counts of Theorem [5.3](https://arxiv.org/html/2606.07623#S5.Thmtheorem3). Over $\mathbb{F}_{p}$ with $p\in\{2,5,7\}$ and dimensions $3\leq d\leq 5$, the model must decide whether a query is forced by an in-context linear example set and, when it is, return the forced value with a row-space combination as witness; underdetermined items require two consistent parameters that disagree on the query.
- THR ($6$ items) instantiates the crossing-scale bound of Theorem [6.5](https://arxiv.org/html/2606.07623#S6.Thmtheorem5) together with the anti-mirage statement of Theorem [6.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8). For $s_{\lambda}=1-a\lambda^{-\alpha}$ and threshold $\tau$, the model must return the smallest integer scale at which $s_{\lambda}\geq\tau$ and the local increment $s_{\lambda}-s_{\lambda-1}$ there.
- PRS ($5$ items) instantiates the preservation criterion of Theorem [3.9](https://arxiv.org/html/2606.07623#S3.Thmtheorem9). In a finite preferential model with logically distinguishable worlds, the model must decide whether appending a prompt preserves every prior consequence, equivalently whether the newly selected set is contained in the old one.

To expose the transition rather than the ceiling, the panel is a weak-to-mid capability spread of nine systems from six independent laboratories. The frontier tier is omitted on purpose: it certifies the whole suite outright, so it sits past the transition and carries no information about where the transition is. Decoding is uniform (greedy, a high reasoning budget, and a single generous token ceiling so that answers are not truncated), and every response is graded only on its parsed final object, recovered by a brace-balanced extractor that is robust to surrounding reasoning text. Alongside the exact certificate score we record a *graded* score, the mean fraction of an item’s answer fields that are correct; this is the continuous proxy of Proposition [6.10](https://arxiv.org/html/2606.07623#S6.Thmtheorem10), whose threshold at $1$ returns the exact score. The credential is read only from the environment and is never written to disk. The runner, the exact prompt with its answer key, and the unedited responses are supplied as `anc/run_certificate_benchmark.py`, `anc/certificate_benchmark_protocol.txt`, and `anc/certificate_benchmark_responses.jsonl`.
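A brace-balanced extractor and the two scores can be sketched as follows. This is not the code of `anc/run_certificate_benchmark.py`; it is a minimal illustration that assumes the final object is a JSON dictionary and that the answer key maps field names to expected values. The function names and field names are hypothetical.

```python
import json


def extract_last_json_object(text):
    """Return the last balanced top-level {...} span that parses as JSON, else None."""
    candidates = []
    depth, start = 0, None
    in_string, escaped = False, False
    for i, ch in enumerate(text):
        if in_string:
            # Inside a quoted string within braces: ignore braces until it closes.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                candidates.append(text[start : i + 1])
    for span in reversed(candidates):
        try:
            return json.loads(span)
        except json.JSONDecodeError:
            continue
    return None


def grade(answer, key):
    """Return (graded, exact): fraction of correct fields and its threshold at 1."""
    correct = sum(answer is not None and answer.get(f) == key[f] for f in key)
    graded = correct / len(key)
    return graded, int(correct == len(key))
```

With a two-field key, an answer that gets only the scale right scores `(0.5, 0)`: the graded proxy registers partial progress while the exact score stays at zero, which is the conjunction effect of Proposition 6.10.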

Table 2: Certificate emission across a weak-to-mid panel, sorted by graded score. *Exact* counts items whose certificate is fully correct; *Graded* is the mean fraction of correct answer fields (Proposition [6.10](https://arxiv.org/html/2606.07623#S6.Thmtheorem10)). The threshold family THR is zero for every system except the strongest, which returns all six, an apparent emergent jump examined below.

![Refer to caption](https://arxiv.org/html/2606.07623v1/x3.png)

Figure 3: Per-family exact certificate accuracy across the panel, as a fraction of the $23$-item suite. The threshold-crossing segment THR (green) is absent for every system but the strongest, where it appears in full.

Two regularities appear. First, the graded score exceeds the exact score for every system, and the gap is widest on the multi-field families. Second, and more sharply, the threshold family behaves as an emergent capability: eight of the nine systems return *no* correct threshold-crossing certificate, and the ninth returns all six. The exact THR score is thus flat at zero across most of the panel and then jumps to its maximum, while the graded THR confidence rises gradually through the panel ($0,\,0.5,\,1.0$ before the crossing). This is exactly the conjunctive threshold of Proposition [6.10](https://arxiv.org/html/2606.07623#S6.Thmtheorem10): a threshold-crossing certificate is accepted only when both the scale $\lambda$ and the local increment are correct ($k=2$), so its exact score tracks $s^{2}$ and remains near zero until the underlying field-confidence is high.

![Refer to caption](https://arxiv.org/html/2606.07623v1/x4.png)

Figure 4: Exact certificate accuracy against the graded proxy, one point per system per family\. The single\-field family PRS \($k=1$\) lies on $y=x$; the two\-field family DET follows $y=x^{2}$; THR sits at the foot of the $x^{2}$ curve until the single crossing\. The apparent emergence is the metric artifact of Theorem [6\.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8) and Proposition [6\.10](https://arxiv.org/html/2606.07623#S6.Thmtheorem10), not a discontinuity in the underlying confidence\.

Figure [4](https://arxiv.org/html/2606.07623#S9.F4) reads the same data through the two scores\. The single\-field family PRS lies on the diagonal $y=x$, exactly as Proposition [6\.10](https://arxiv.org/html/2606.07623#S6.Thmtheorem10)\(iii\) requires when no conjunction is present; the row\-space family DET, dominated by two\-field forced items, lies along $y=x^{2}$; and THR occupies the foot of the $x^{2}$ curve until the crossing\. The emergence of threshold reasoning is therefore a thresholded\-observation artifact in the precise sense of Theorem [6\.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8), reproduced on trained systems and quantified by the conjunction count $k$, rather than a discontinuity in semantic confidence\. The witnesses returned are also of the intended form: for the underdetermined instance DET6 over $\mathbb{F}_{5}$ a high\-scoring system reported “$q=(1,0,0)$ outside span; $w=(0,3,0)\mapsto 0$ vs $w=(1,3,4)\mapsto 1$,” the two\-witness certificate of Theorem [5\.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1)\(ii\), and for PRS4 it reported “$p$\-selected $=\{W_{1}\}$, $(p\oplus q)$\-selected $=\{W_{2},W_{3}\}$; not a subset,” the containment test of Theorem [3\.9](https://arxiv.org/html/2606.07623#S3.Thmtheorem9)\.
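A two\-witness certificate of this kind is cheap to machine\-check\. The sketch below assumes the hypotheses are linear functionals $x\mapsto w\cdot x$ over $\mathbb{F}_{5}$\. The query and the two witnesses are the ones quoted above, but the context rows and labels are hypothetical, chosen only so that both witnesses fit them; the actual DET6 context is not reproduced here\. If two hypotheses agree on every example yet disagree on $q$, then $q$ cannot lie in the row span, so the check needs only inner products\.

```python
def dot_mod(u, v, p):
    """Inner product of two vectors over the prime field F_p."""
    return sum(a * b for a, b in zip(u, v)) % p


def check_two_witness(rows, labels, q, w0, w1, p):
    """Accept iff w0 and w1 both fit every example yet disagree on q."""
    for row, label in zip(rows, labels):
        if dot_mod(row, w0, p) != label or dot_mod(row, w1, p) != label:
            return False
    return dot_mod(q, w0, p) != dot_mod(q, w1, p)


P = 5
# HYPOTHETICAL context rows and labels: not taken from the paper.
rows = [(1, 0, 1), (0, 1, 0)]
labels = [0, 3]
q = (1, 0, 0)
w0, w1 = (0, 3, 0), (1, 3, 4)  # the two witnesses quoted in the text

print(dot_mod(q, w0, P), dot_mod(q, w1, P))
print(check_two_witness(rows, labels, q, w0, w1, P))
```

The checker rejects a certificate whose witnesses fail any example, so a model cannot pass by naming two arbitrary vectors that merely disagree on the query\.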

This probe is deliberately small and is not evidence for any theorem; the theorems are established by proof and reproduced by the exact\-arithmetic checks above\. Its role is to show that the certificate objects are concrete enough to be requested, produced, and machine\-checked, and that the benchmark jump the theory predicts is visible on contemporary systems exactly where the conjunction count makes it sharpest\.

## 10 Predictions and a falsification protocol

The theory gives three controlled measurement protocols for trained models\. Each protocol specifies what must be measured and what would falsify the proposed semantic representation\. Failure of a trained model to match a protocol would not refute the algebraic theorem; it would refute the claim that the tested behavior is well represented by that semantic task family or by that observation map\.

#### Prediction 1 \(row\-space determinacy of in\-context answers\)\.

For a model evaluated on a controlled linear in\-context task family, answer concentration on a held\-out query $q$ should track the row\-space predicate $q\in\operatorname{Row}(A_{n})$ if the model’s behavior is represented by the finite\-field semantic family\. Full\-context recovery should follow $I_{d}(n)$, while fixed\-query recovery should follow Theorem [5\.6](https://arxiv.org/html/2606.07623#S5.Thmtheorem6)\. *Protocol\.* Use synthetic linear probes over a controlled input dimension $d$; vary $n$; for each query record whether it lies in the row span, whether the model’s answer distribution is concentrated, and whether concentration follows the full or query\-local curve\. *Falsifier\.* Systematic concentration outside the row span, or systematic failure on spanned queries, falsifies this semantic representation for the tested model and task family\.
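The row\-space predicate in this protocol can be computed exactly\. The following is a minimal sketch over a prime field, assuming the hypotheses are linear functionals on $\mathbb{F}_{p}^{d}$ and the labels are consistent; the function names are hypothetical and the example rows are illustrative, not data from the paper\. The residual count $p^{d-\operatorname{rank}}$ is the standard linear\-algebra count of functionals fitting a consistent context, and its notation may differ from the paper’s\.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field F_p by Gaussian elimination."""
    mat = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(mat[0]) if mat else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], -1, p)  # modular inverse (Python 3.8+)
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                factor = mat[r][col]
                mat[r] = [(x - factor * y) % p for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def in_row_space(rows, q, p):
    """q lies in Row(A) iff appending q does not increase the rank."""
    rows = [tuple(r) for r in rows]
    return rank_mod_p(rows + [tuple(q)], p) == rank_mod_p(rows, p)


def residual_hypotheses(rows, d, p):
    """Number of linear functionals on F_p^d that fit a consistent context."""
    return p ** (d - rank_mod_p(rows, p))


# Illustrative context over F_5 with d = 3 (hypothetical rows).
A = [(1, 0, 1), (0, 1, 0)]
print(in_row_space(A, (2, 3, 2), 5))   # 2*(1,0,1) + 3*(0,1,0)
print(in_row_space(A, (1, 0, 0), 5))
print(residual_hypotheses(A, 3, 5))
```

In a probe run, each query would be tagged with `in_row_space` before the model’s answer distribution is inspected, so that concentration can be compared against the predicate rather than against a benchmark label\.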

#### Prediction 2 \(emergence is a metric threshold, not a semantic discontinuity\)\.

A capability that appears to emerge under a thresholded or exact\-match score should become gradual under a continuous semantic proxy, such as per\-token log\-probability of the gold answer, answer\-distribution Brier score, calibrated confidence, or edit distance\. The scale at which the thresholded score flips should be predictable from a fitted rate bound $1-s_{\lambda}\leq a\lambda^{-\alpha}$\. *Protocol\.* Re\-score an emergence benchmark with both the original thresholded metric and a continuous proxy across model scales; fit the pre\-threshold confidence curve; predict the crossing scale and compare it to the observed jump\. *Falsifier\.* A discontinuity in the continuous proxy itself, after removing thresholding and discretization, would be evidence for a genuine semantic discontinuity rather than the metric artifact covered by Theorem [6\.8](https://arxiv.org/html/2606.07623#S6.Thmtheorem8)\.
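The fit\-and\-predict step of this protocol might look like the sketch below\. It assumes the bound is treated as an equality $1-s_{\lambda}=a\lambda^{-\alpha}$ and fitted by ordinary least squares in log\-log coordinates, which yields an estimate rather than a guaranteed upper bound\. The threshold $\tau$ is applied directly to the confidence, the data are synthetic, and the function names are hypothetical\.

```python
import math


def fit_power_law(scales, confidences):
    """Least-squares fit of log(1 - s) = log(a) - alpha * log(scale)."""
    xs = [math.log(lam) for lam in scales]
    ys = [math.log(1.0 - s) for s in confidences]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope
    a = math.exp(my - slope * mx)
    return a, alpha


def predicted_crossing(a, alpha, tau):
    """Scale at which a * scale**(-alpha) equals 1 - tau."""
    return (a / (1.0 - tau)) ** (1.0 / alpha)


# Synthetic pre-threshold confidences generated from a = 0.8, alpha = 0.5.
scales = [1, 2, 4, 8, 16]
confidences = [1.0 - 0.8 * lam ** -0.5 for lam in scales]

a, alpha = fit_power_law(scales, confidences)
print(a, alpha, predicted_crossing(a, alpha, tau=0.9))
```

On measured data the fitted curve should use only scales below the observed jump, so that the predicted crossing is a genuine out\-of\-sample comparison\.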

#### Prediction 3 \(prompt extension can delete consequences\)\.

Appending text to a prompt can remove previously supported conclusions when the appended text changes the selected preferred models\. Which conclusions survive is given by the preservation criterion of Theorem [3\.9](https://arxiv.org/html/2606.07623#S3.Thmtheorem9): the surviving formulas are exactly those true throughout the newly selected set\. *Protocol\.* Construct prompt pairs $(p,\,p\cdot q)$, estimate a fixed battery of high\-confidence properties under $p$, and measure retention after appending $q$\. *Falsifier\.* Uniform monotone retention under all prompt extensions would falsify the preferential\-update representation for that prompt family; selective retention inconsistent with the selected\-set criterion would falsify the proposed ranking model\.
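A toy ranked model makes the selected\-set criterion concrete\. Everything in this sketch is hypothetical: the three worlds, their ranks, the atomic properties A, B, and C, and the function names\. A prompt selects the minimal\-rank worlds that satisfy it, and a property is supported when it holds throughout the selected set\.

```python
def selected(worlds, rank, prompt):
    """Minimal-rank worlds among those satisfying the prompt."""
    candidates = [w for w in worlds if prompt(w)]
    if not candidates:
        return set()
    best = min(rank[w] for w in candidates)
    return {w for w in candidates if rank[w] == best}


def supported(properties, chosen):
    """Properties true throughout the chosen set (vacuously all, if it is empty)."""
    return {name for name, holds in properties.items()
            if all(holds(w) for w in chosen)}


# Hypothetical worlds, described by the atomic properties they satisfy.
features = {"W1": {"A", "B"}, "W2": {"A"}, "W3": {"A", "C"}}
rank = {"W1": 0, "W2": 1, "W3": 1}  # lower rank means more preferred
worlds = list(features)

p = lambda w: True                         # prompt p holds in every world
p_then_q = lambda w: "B" not in features[w]  # the extension rules out W1

properties = {name: (lambda w, n=name: n in features[w]) for name in "ABC"}

before = selected(worlds, rank, p)
after = selected(worlds, rank, p_then_q)
print(sorted(before), sorted(after), after <= before)
print(sorted(supported(properties, before)), sorted(supported(properties, after)))
```

Here the extension moves the selected set from W1 to W2 and W3, so the containment test fails and property B, supported under $p$, is deleted while A is retained\.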

## 11 Theoretical problems left open

The preceding sections settle qualitative determinacy, exact finite\-field linear identification, prompt\-preservation criteria, and threshold separation\. They also expose several problems that are now sharply stated enough to be attacked directly\.

###### Problem 11\.1 \(Bounded\-fragment certificate bounds\)\.

Fix a fragment $\mathcal{F}$, a class of expansion chains, and a target schema $\chi(f(q))$\. Determine the least function $B_{\mathcal{F}}(m,r)$ such that every entailment using $m$ examples and quantifier rank at most $r$ has a certificate of length at most $B_{\mathcal{F}}(m,r)$\.

The compactness theorem gives existence but no numerical bound\. The finite\-field results show that such bounds can be exact in algebraic task families\. The next theoretical step is to connect certificate length with quantifier rank, VC\-style dimension, Littlestone\-style dimension, or rank invariants of the hypothesis quotient\.

###### Problem 11\.2 \(Preferential representation dimension\)\.

Given a finite family of observed prompt consequence relations, characterize the smallest rank structure whose preferential models realize all of them\.

This is the prompt analogue of asking for a minimal automaton or minimal Kripke frame\. A solution would quantify how much hidden priority structure is necessary to explain instruction\-following behavior\.

###### Problem 11\.3 \(Fragment decisiveness under scale\)\.

For a scale family $(\mathbb{G}_{\lambda})_{\lambda}$ and a logical fragment $\mathcal{F}$, characterize when every $\varphi\in\mathcal{F}$ has a $0$\-$1$ confidence limit\.

The limit theory in Section [6](https://arxiv.org/html/2606.07623#S6) assumes decisiveness\. The open mathematical issue is to derive decisiveness from structural hypotheses on the measured model classes, such as concentration, exchangeability, definable stability, or convergence of finite\-dimensional marginals\.

###### Problem 11\.4 \(Decoder\-faithfulness rates\)\.

Given that $\psi\in T_{\infty}$, bound the rate at which the decoder failure probability for $\psi$ tends to zero, or construct examples where semantic convergence holds but decoder failure remains bounded away from zero\.

This problem matters because verification of a generated output requires both semantic entailment and decoder reliability\. The present paper separates these notions; a full predictive theory must relate their rates under explicit assumptions\.

###### Problem 11\.5 \(Ultraproduct invariants of capability\)\.

Identify which properties of the ultraproduct witness in Theorem [6\.6](https://arxiv.org/html/2606.07623#S6.Thmtheorem6) correspond to stable capabilities across scale families\.

Candidate invariants include saturation level, omitted types, definable cuts, and elementary embeddings between scale subsequences\. These are not benchmark labels; they are structural properties that may explain why some capabilities stabilize while others remain prompt\-sensitive\.

## 12 Scope and failure modes

The construction is intentionally semantic rather than mechanistic\. It does not claim to recover transformer circuits, training data, or internal activations\. It also does not claim that the finite\-field linear family is a universal model of in\-context learning\. Its claim is conditional and external: if a behavior is represented by a measured model class, a context update, and a decoder, then prompt consequence, finite context certification, and threshold manifestation obey the theorems proved above\. This conditional form is necessary for falsifiability\. A mismatch with trained\-model measurements should be read as a failure of the proposed semantic representation for that behavior, not as a failure of first\-order compactness or finite\-field linear algebra\.

This scope matters because several tempting interpretations are false\. High benchmark accuracy does not imply membership in an almost\-sure contextual theory\. A prompt extension need not behave like monotone axiom addition\. A finite example sequence can narrow admissible answers without identifying a unique latent task\. A threshold jump can be produced by the scorer even when semantic confidence changes smoothly\. Finally, a measure\-one semantic property may still fail in sampled output if decoder faithfulness is weak\.

The value of the calculus is therefore diagnostic\. It states which extra assumption is needed for each inference\. To pass from semantic entailment to observed reliability one needs a decoder\-faithfulness bound\. To pass from finite examples to unique prediction one needs a certificate such as the row\-space condition in Theorem [5\.1](https://arxiv.org/html/2606.07623#S5.Thmtheorem1)\. To pass from benchmark emergence to semantic emergence one needs evidence about the confidence function, not only about the thresholded score\. These distinctions are exactly where verification claims about language models usually become ambiguous\.

## 13 Conclusion

The paper has recast three empirical questions as finite semantic certification problems\. A pre\-trained language model is represented externally as a probability\-bearing family of first\-order structures equipped with context update and decoding\. A prompt is a preferential update on a model class\. In\-context learning is a chain of semantic expansions by finite diagrams\. Emergence is thresholded manifestation of membership in an almost\-sure limit theory\. The finite\-field linear family supplies the fully explicit case: row\-space membership is the certificate, rank is the obstruction, and the identification curves are closed\-form\.

The main mathematical mechanisms are elementary but targeted\. Measure\-one theories record semantic commitments\. Definitional invariance prevents the construction from depending on notational choices\. Preferential minima explain nonmonotonic prompt behavior and give exact preservation criteria\. Compactness converts direct\-limit entailment into finite context certificates, while indistinguishable\-prefix arguments give lower bounds\. Linear algebra supplies exact finite criteria, counting formulas, and random\-context identification probabilities in a representative task family\. Ultraproducts package asymptotic scale commitments into a single structure\. Threshold theorems separate benchmark jumps from discontinuities in semantic confidence and give rate\-sensitive crossing bounds\.

This yields a clear verification picture\. To verify a behavior, one should identify the formula expressing it, the context stage that entails it, the certificate size required to force it, the scale at which its semantic confidence clears the chosen threshold, and the decoder\-faithfulness needed to make the semantic property visible in output\. The ancillary package makes the quantitative part reproducible: it regenerates the identification curves, the row\-space determinacy checks, and the threshold\-mirage figure from the supplied Python script\. Each component is explicit, and each can fail independently\.
