HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory

arXiv cs.AI 05/08/26, 04:00 AM Papers
Summary
This paper introduces HyperLens, a high-resolution probe to quantify cognitive effort in LLMs by tracing fine-grained confidence trajectories across layers. It reveals that complex tasks require higher cognitive effort and demonstrates how Supervised Fine-Tuning can reduce this effort, potentially degrading performance.
arXiv:2605.05741v1 Announce Type: new Abstract: While Large Language Models (LLMs) achieve strong performance across diverse tasks, their inference dynamics remain poorly understood because of the limited resolution of existing analysis tools. In this work, we identify an intrinsic magnification mechanism in transformer architectures: deeper layers inherently magnify the small changes of layer-wise confidence, providing a fine-grained confidence trajectory. Building on this insight, we introduce HyperLens, a high-resolution probe designed to trace confidence trajectories and quantify the cognitive effort during inference. Across 8 LLMs and 8 datasets, HyperLens reveals a consistent divergence in confidence trajectories that separates complex from simple tasks. We abstract this pattern into a quantitative cognitive effort metric. Our analysis reveals a fundamental principle: complex tasks consistently require higher cognitive effort. Finally, we provide a mechanistic diagnosis of a common side effect of standard Supervised Fine-Tuning (SFT): it can reduce cognitive effort and consequently degrade performance on in-domain tasks.
# HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory
Source: [https://arxiv.org/html/2605.05741](https://arxiv.org/html/2605.05741)
###### Abstract

While Large Language Models (LLMs) achieve strong performance across diverse tasks, their inference dynamics remain poorly understood because of the limited resolution of existing analysis tools. In this work, we identify an intrinsic magnification mechanism in transformer architectures: deeper layers inherently magnify the small changes of layer-wise confidence, providing a fine-grained confidence trajectory. Building on this insight, we introduce HyperLens, a high-resolution probe designed to trace confidence trajectories and quantify the cognitive effort during inference. Across 8 LLMs and 8 datasets, HyperLens reveals a consistent divergence in confidence trajectories that separates complex from simple tasks. We abstract this pattern into a quantitative cognitive effort metric. Our analysis reveals a fundamental principle: complex tasks consistently require higher cognitive effort. Finally, we provide a mechanistic diagnosis of a common side effect of standard Supervised Fine-Tuning (SFT): it can reduce cognitive effort and consequently degrade performance on in-domain tasks.

Machine Learning, ICML

## 1 Introduction

While Large Language Models (LLMs) exhibit remarkable capabilities, understanding their *inference dynamics*, i.e., how internal states evolve, is crucial for real-world robustness. A prevailing paradigm for interpreting these dynamics monitors the *evolution of confidence* via layer-wise projections (nostalgebraist, [2020](https://arxiv.org/html/2605.05741#bib.bib3); Schuster et al., [2022](https://arxiv.org/html/2605.05741#bib.bib8); Wang, [2025](https://arxiv.org/html/2605.05741#bib.bib5); Belrose et al., [2023](https://arxiv.org/html/2605.05741#bib.bib4)). This signal has proven effective for tasks such as hallucination and jailbreak detection (Chuang et al., [2023](https://arxiv.org/html/2605.05741#bib.bib16); Lin et al., [2026](https://arxiv.org/html/2605.05741#bib.bib23)) and early-exit strategies (Xin et al., [2020](https://arxiv.org/html/2605.05741#bib.bib7); Chen et al., [2023](https://arxiv.org/html/2605.05741#bib.bib26)).

![Figure 1](https://arxiv.org/html/2605.05741v1/x1.png)
Figure 1: The confidence trajectory of Qwen2.5-7B with different focal depths on easy (CoNaLa (Yin et al., [2018](https://arxiv.org/html/2605.05741#bib.bib38))) versus hard (APPS (Hendrycks et al., [2021a](https://arxiv.org/html/2605.05741#bib.bib39))) coding tasks. The confidence in the three panels is derived with (1) zero focal depth, which is equivalent to Logit Lens; (2) a shallow focal depth of a single layer; and (3) a sufficient focal depth of five layers.

Intuitively, the confidence trajectory should reflect the model's *cognitive effort*. For trivial tasks (e.g., knowledge retrieval) where the answer is readily apparent, the internal confidence should converge swiftly, whereas complex tasks (e.g., hard math problems) should require a longer low-confidence “struggle” phase before reaching the solution. However, observations made with current probing methods fail to validate this intuition, frequently producing indistinguishable trajectories regardless of task difficulty.

We attribute this failure to the limited *resolution* of existing probes, which restricts their utility to only the most obvious anomalies (e.g., sudden spikes or collapses, or stable changes). Existing probes primarily rely on static projections (Alain and Bengio, [2016](https://arxiv.org/html/2605.05741#bib.bib1); Ghandeharioun et al., [2024](https://arxiv.org/html/2605.05741#bib.bib63); Pal et al., [2023](https://arxiv.org/html/2605.05741#bib.bib27)), most notably the Logit Lens (nostalgebraist, [2020](https://arxiv.org/html/2605.05741#bib.bib3)), which decodes hidden states directly through the last unembedding layer. These methods rely on the strong assumption that intermediate representations are linearly aligned with the output space. Unfortunately, because of *anisotropy* (Ethayarajh, [2019](https://arxiv.org/html/2605.05741#bib.bib28); Timkey and Van Schijndel, [2021](https://arxiv.org/html/2605.05741#bib.bib29)) and *representation drift* (Geva et al., [2021](https://arxiv.org/html/2605.05741#bib.bib12); Voita et al., [2019b](https://arxiv.org/html/2605.05741#bib.bib30)), these methods function as a *myopic observer*: without proper focal correction, they fail to resolve latent signals effectively. To address this limitation, another line of work, such as Tuned Lens (Belrose et al., [2023](https://arxiv.org/html/2605.05741#bib.bib4)) and others (Cunningham et al., [2023](https://arxiv.org/html/2605.05741#bib.bib33); Rajamanoharan et al., [2024](https://arxiv.org/html/2605.05741#bib.bib60); Ghandeharioun et al., [2024](https://arxiv.org/html/2605.05741#bib.bib63)), introduces an external *learned adapter* to correct the projection misalignment. However, these methods inevitably act as a *biased distortion filter*: the external training introduces systematic distortions into the original signal (Hewitt and Liang, [2019](https://arxiv.org/html/2605.05741#bib.bib31); Voita and Titov, [2020](https://arxiv.org/html/2605.05741#bib.bib32)).

Unlike these myopic or biased tools, we identify an inherent self-magnifying mechanism within transformer layers (Section [3](https://arxiv.org/html/2605.05741#S3)). We prove that, instead of decoding the hidden state directly, propagating it through a sequence of later layers before projection magnifies the derived confidence. We call the number of these later layers the *focal depth*.

Based on this foundation, we propose a high-resolution probe, HyperLens, which reveals a critical phenomenon aligned with our intuition: *deepening the focal depth magnifies confidence, with the magnitude constrained by the intrinsic difficulty of the task*. As Figure [1](https://arxiv.org/html/2605.05741#S1.F1) illustrates, under zero focal depth (i.e., equivalent to Logit Lens), the confidence trajectories for easy and hard tasks are indistinguishable across all layers. Increasing the focal depth to 1 introduces a global magnification, but the trajectories remain entangled. When the focal depth is increased to a sufficient level, however, the confidence trajectories for easy and hard tasks separate clearly: the easy task exhibits a more rapid increase, whereas the hard task remains in a low-confidence state for much longer. This divergence indicates that sufficient magnification makes the model's internal cognitive effort observable. To quantify it, we introduce the *refinement area ($\Omega$)*, defined as the area *above* the magnified confidence trajectory, which serves as a quantitative measure of the cognitive effort required to generate the final output. Intuitively, harder tasks require more cognitive effort to generate the final output and therefore have a higher $\Omega$. For instance, the $\Omega$ of Qwen2.5-7B increases substantially from 1.78 (easy) to 3.05 (hard) on coding tasks (Figure [1](https://arxiv.org/html/2605.05741#S1.F1), panel 3).

Comprehensive evaluations across 8 models and 8 datasets (see Section [5](https://arxiv.org/html/2605.05741#S5)) confirm that task difficulty consistently correlates with higher $\Omega$. Beyond robust measurement, HyperLens provides novel insights into the side effects of *Supervised Fine-Tuning (SFT)* (Section [6](https://arxiv.org/html/2605.05741#S6)). We uncover a “blind confidence” phenomenon: simple SFT can cause abnormal decreases in $\Omega$ on in-domain tasks, thereby harming performance. This suggests that the model learns to shortcut the low-confidence “struggle” phase, jumping directly to the output without sufficient deliberation.

In summary, the contributions include:

(1) We propose HyperLens, a high-resolution probe that visualizes and quantifies the model's cognitive effort. Using HyperLens, we clearly separate the confidence trajectories of easy and hard tasks, revealing that complex tasks necessitate a longer low-confidence phase.

(2) We formally characterize the transformer's inherent self-magnifying mechanism, proving that increasing the focal depth magnifies confidence with high probability, thereby ensuring a high-resolution view.

(3) By applying HyperLens, we uncover that simple SFT causes a “blind confidence” problem by cutting cognitive effort on in-domain tasks, providing an interpretability-based explanation for the performance degradation often observed in SFT models.

## 2 Related Works

Inference dynamics interpretation focuses on theoretically understanding the evolution of internal states. Intuitively, inference comprises two phases: (1) transforming the input into a high-dimensional latent space; and (2) mapping the latent state to the vocabulary space to generate the next token (Olsson et al., [2022](https://arxiv.org/html/2605.05741#bib.bib18); Voita et al., [2019a](https://arxiv.org/html/2605.05741#bib.bib13); Elhage et al., [2021](https://arxiv.org/html/2605.05741#bib.bib53)). Moving beyond this high-level decomposition, a growing body of research pursues a more thorough analysis. For instance, Shai et al. ([2024](https://arxiv.org/html/2605.05741#bib.bib58)) prove from a geometric perspective that the residual stream effectively performs Bayesian updates across layers. Tiberi et al. ([2024](https://arxiv.org/html/2605.05741#bib.bib59)) model the inference process as a kernel renormalization flow, formalizing the evolution of predictor statistics from a thermodynamic perspective.

Probing methods. To decode these internal dynamics, researchers have proposed a set of *probing methods* to observe the internal state. Depending on whether they require extra parameters, these methods can be classified into static and parametric probing. Static probing methods include the widely adopted Logit Lens (nostalgebraist, [2020](https://arxiv.org/html/2605.05741#bib.bib3)) and its variants, such as the method of Ye et al. ([2025](https://arxiv.org/html/2605.05741#bib.bib62)), which uses cross-query semantic patching and a cosine-based representational lens to interpret implicit reasoning phenomena. In this work, we focus on static probing methods and mainly compare our method with the widely adopted Logit Lens.

Parametric probing methods fall into two categories. The first class, such as Linear Probing (Alain and Bengio, [2016](https://arxiv.org/html/2605.05741#bib.bib1)), uses extra training to explore concept-specific features (Cunningham et al., [2023](https://arxiv.org/html/2605.05741#bib.bib33); Rajamanoharan et al., [2024](https://arxiv.org/html/2605.05741#bib.bib60)). This class is typically task-driven, identifying specific features in the latent space that correspond to human-interpretable concepts. The second class, such as Tuned Lens (Belrose et al., [2023](https://arxiv.org/html/2605.05741#bib.bib4)) and Patchscopes (Ghandeharioun et al., [2024](https://arxiv.org/html/2605.05741#bib.bib63)), instead uses extra models as an adapter to correct the inherent flaws of static probes. However, these methods inevitably introduce systematic distortions, as noted by Hewitt and Liang ([2019](https://arxiv.org/html/2605.05741#bib.bib31)) and Voita and Titov ([2020](https://arxiv.org/html/2605.05741#bib.bib32)).

Applications. Building on the above interpretations of inference dynamics, researchers have proposed diverse real-world applications that generally fall into two categories: (1) detecting abnormal behaviors and (2) improving model performance. First, by monitoring internal states, researchers can effectively detect safety risks such as hallucinations and jailbreak attempts (Lin et al., [2026](https://arxiv.org/html/2605.05741#bib.bib23); Chia et al., [2025](https://arxiv.org/html/2605.05741#bib.bib25); Borah et al., [2025](https://arxiv.org/html/2605.05741#bib.bib24)). For instance, Yao et al. ([2024](https://arxiv.org/html/2605.05741#bib.bib61)) propose a component-level decoding, i.e., knowledge circuits, and use it to analyze and interpret language model behaviors such as hallucinations and in-context learning. Second, the insights obtained through interpretation help optimize model efficiency and performance. For example, Inference-Time Intervention (ITI) (Li et al., [2023](https://arxiv.org/html/2605.05741#bib.bib56)) identifies “truthfulness” directions in the activation space and shifts the inference dynamics along these vectors to enhance the generation of truthful answers. Similarly, Chuang et al. ([2023](https://arxiv.org/html/2605.05741#bib.bib16)) leverage the contrast between early- and late-layer distributions to amplify factual signals and suppress hallucinations during decoding.

## 3 Intrinsic Magnification Mechanism

### 3.1 Formulations

Inference dynamics. Consider an LLM with $N$ transformer layers $\{f_1, f_2, \dots, f_N\}$ and a vocabulary $\mathcal{V}$. The inference dynamics are defined as an autoregressive process that transforms the given input token into a probability distribution over the vocabulary. Let $X_t \in \mathbb{R}^d$ be the input embedding of the $t$-th token. The internal state evolves layer by layer through a residual mapping $\mathcal{F}$ (containing Attention and FFN blocks). The internal state $H_{i,t} \in \mathbb{R}^d$ at layer $i$ is defined recursively:

$$H_{i,t} = f_i(H_{i-1,t}) = H_{i-1,t} + \mathcal{F}_i(H_{i-1,t}), \quad i \in \{1, \dots, N\}, \tag{1}$$
where $H_{0,t} = X_t$ is the initial input embedding.

To generate the final output, the last internal state $H_{N,t}$ is projected back into the vocabulary space via a linear unembedding matrix $W_U \in \mathbb{R}^{|\mathcal{V}| \times d}$. The final probability distribution $p_t \in \mathcal{P}(|\mathcal{V}|) = \{x \in \mathbb{R}^{|\mathcal{V}|} \mid x \ge 0, \mathbf{1}^\top x = 1\}$ is obtained by:

$$p_t = \mathrm{Softmax}(W_U H_{N,t}). \tag{2}$$
For simplicity, we omit the subscripts $t$ and $i$ when no confusion arises, referring to $H_{i,t}$ as $H_i$ or $H$.
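To make Equations (1) and (2) concrete, here is a minimal sketch on plain Python lists. It assumes a toy model in which each residual function $\mathcal{F}_i$ is an arbitrary callable; the helpers `softmax`, `forward_hidden_states`, and `output_distribution` are hypothetical names, not the authors' code. Many real decoder-only models also apply a final normalization layer before $W_U$, which Eq. (2) and this sketch omit.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Residual = Callable[[Vector], Vector]  # toy stand-in for F_i in Eq. (1)


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a logit vector."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_hidden_states(x: Vector, residuals: Sequence[Residual]) -> List[Vector]:
    """Return [H_0, ..., H_N], where H_0 = X_t and H_i = H_{i-1} + F_i(H_{i-1})."""
    states = [list(x)]
    for residual in residuals:
        h = states[-1]
        states.append([a + b for a, b in zip(h, residual(h))])
    return states


def output_distribution(h_last: Vector, w_u: Sequence[Sequence[float]]) -> Vector:
    """p_t = Softmax(W_U H_N), with W_U given as |V| rows of length d (Eq. 2)."""
    logits = [sum(w * h for w, h in zip(row, h_last)) for row in w_u]
    return softmax(logits)


if __name__ == "__main__":
    # Two toy layers that both nudge the state toward token 0.
    layers = [lambda h: [0.5, 0.0], lambda h: [0.5, -0.1]]
    states = forward_hidden_states([0.0, 0.0], layers)
    print(states)  # [[0.0, 0.0], [0.5, 0.0], [1.0, -0.1]]
    print(output_distribution(states[-1], [[1.0, 0.0], [0.0, 1.0]]))
```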

Confidence trajectory. The confidence trajectory is one of the most widely used signals for interpreting the inference dynamics of LLMs. For a generated sequence of length $T$, we define the layer-wise confidence $\mathcal{C}_i$ as the average, over the sequence, of the Top-K probability of the internally predicted token. Formally:

$$\mathcal{C}_i = \frac{1}{T}\sum_{t=1}^{T} \text{Top-K}(p_{i,t}). \tag{3}$$
Here, $p_{i,t}$ is the distribution over the vocabulary predicted for the $t$-th token at the $i$-th layer, which can be obtained through a decoding function $\mathcal{D}: \mathbb{R}^d \to \mathcal{P}(\mathcal{V})$, i.e., $p_{i,t} = \mathcal{D}(H_{i,t})$. Existing methods instantiate $\mathcal{D}$ differently. The static Logit Lens applies the last unembedding matrix followed by softmax, i.e., $\mathcal{D}(H) = \mathrm{Softmax}(W_U H)$, whereas in tunable methods such as Tuned Lens, $\mathcal{D}$ is a learned affine transformation. The confidence trajectory is then the sequence of confidences $(\mathcal{C}_1, \dots, \mathcal{C}_N)$ across all $N$ layers.
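A minimal sketch of Eq. (3) with the Logit Lens decoder $\mathcal{D}(H) = \mathrm{Softmax}(W_U H)$. It assumes that $\text{Top-K}(p)$ denotes the total probability of the $K$ most likely tokens (with $K=1$ this is simply the maximum probability); that reading, the nested-list layout `hidden[i][t]` for $H_{i,t}$, and the helper names are our assumptions rather than the paper's code.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(h: Sequence[float], w_u: Matrix) -> Vector:
    """Static decoder D(H) = Softmax(W_U H)."""
    return softmax([sum(w * x for w, x in zip(row, h)) for row in w_u])


def top_k_mass(p: Sequence[float], k: int = 1) -> float:
    """Assumed meaning of Top-K(p): probability mass of the k likeliest tokens."""
    return sum(sorted(p, reverse=True)[:k])


def confidence_trajectory(hidden: Sequence[Sequence[Sequence[float]]],
                          w_u: Matrix, k: int = 1) -> Vector:
    """hidden[i][t] is H_{i,t}; returns one C_i per layer, C_i = mean_t Top-K(p_{i,t})."""
    trajectory = []
    for layer_states in hidden:
        scores = [top_k_mass(logit_lens(h, w_u), k) for h in layer_states]
        trajectory.append(sum(scores) / len(scores))
    return trajectory


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    hidden = [[[0.0, 0.0], [0.0, 0.0]],   # layer with no preference: C = 0.5
              [[5.0, 0.0], [0.0, 5.0]]]   # confident layer
    print(confidence_trajectory(hidden, identity))
```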

Decoding function with varied focal depth $m$. Unlike prior static probing methods, we generalize the decoding function $\mathcal{D}$ from the last unembedding matrix to a composition of the final $m$ layers of the model and the unembedding projection. Formally:

$$\mathcal{D}_m(H) = \mathrm{Softmax}(D_m(H)), \tag{4}$$
where $D_m$ is the composition of the final $m$ layers of the model and the unembedding projection:

$$D_m(H) = W_U\left(f_N \circ \cdots \circ f_{N-m+1}(H)\right). \tag{5}$$
For simplicity, we refer to $D_m$ as the *decoding function (DF)* with focal depth $m$ in the following analysis. With $m = 0$, $D_0(H) = W_U H$, which recovers the Logit Lens.
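A minimal sketch of Eqs. (4) and (5) on the same kind of toy model, where each layer is $f_k(z) = z + \mathcal{F}_k(z)$ with a callable $\mathcal{F}_k$. The function names `focal_logits` and `focal_distribution` are hypothetical; as with Eq. (2), a real model's final normalization layer is omitted. With $m = 0$ the loop runs zero times and the result is the Logit Lens logits $W_U H$.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Residual = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_logits(h: Vector, residuals: Sequence[Residual],
                 w_u: Sequence[Sequence[float]], m: int) -> Vector:
    """D_m(H) = W_U (f_N o ... o f_{N-m+1})(H), Eq. (5)."""
    n = len(residuals)
    if not 0 <= m <= n:
        raise ValueError("focal depth m must satisfy 0 <= m <= N")
    z = list(h)
    for residual in residuals[n - m:]:  # f_{N-m+1}, ..., f_N, applied in order
        z = [a + b for a, b in zip(z, residual(z))]
    return [sum(w * x for w, x in zip(row, z)) for row in w_u]


def focal_distribution(h: Vector, residuals: Sequence[Residual],
                       w_u: Sequence[Sequence[float]], m: int) -> Vector:
    """Calligraphic D_m(H) = Softmax(D_m(H)), Eq. (4)."""
    return softmax(focal_logits(h, residuals, w_u, m))


if __name__ == "__main__":
    layers = [lambda z: [1.0, 0.0]] * 3  # toy layers that all favour token 0
    identity = [[1.0, 0.0], [0.0, 1.0]]
    for m in range(4):
        print(m, focal_distribution([0.0, 0.0], layers, identity, m))
```

Reusing the model's own final layers as the "lens" (rather than a trained adapter) is what distinguishes this decoder from Tuned Lens-style probes.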

Refinement layers. Prior research suggests that transformer computation proceeds in two stages separated by a layer $i_0$ (see Section [2](https://arxiv.org/html/2605.05741#S2)). The initial layers ($i < i_0$) map the inputs to the latent space, and the subsequent layers ($i \ge i_0$) decode the latent space to the output. In this work, we call the layers in the latter stage *refinement layers*.

### 3.2 Assumptions

Confidence in the logit space. To facilitate convex analysis, we analyze the confidence in the logit space rather than on the probability simplex. Let $S_t$ be the Top-K token set at position $t$. We define the *Logit Margin* function $g_m: \mathbb{R}^d \to \mathbb{R}$ as:

$$g_m(H) := \log\sum_{v \in S_t} e^{D_m(H)_v} - \log\sum_{v \notin S_t} e^{D_m(H)_v}. \tag{6}$$

Since the Top-K probability mass under $\mathcal{D}_m(H) = \mathrm{Softmax}(D_m(H))$ equals $1/(1 + e^{-g_m(H)})$, it is strictly increasing in the margin, so for each token an increase in the margin is equivalent to an increase in the observed Top-K confidence. We therefore use the sequence-averaged margin $\bar{\mathcal{C}}^{(m)}_i = \frac{1}{T}\sum_t g_m(H_{i,t})$ as an approximation of the confidence $\mathcal{C}_i$ in our analysis.
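A minimal sketch of the margin in Eq. (6), evaluated directly on a logit vector (i.e., in logit space) with a numerically stable log-sum-exp. It illustrates the identity $P(S_t) = 1/(1 + e^{-g})$, where $P(S_t)$ is the softmax mass of the Top-K set, which is why the Top-K confidence is strictly increasing in the margin. The function names are hypothetical.

```python
import math
from typing import Sequence, Set


def logsumexp(xs: Sequence[float]) -> float:
    """Stable log(sum(exp(x))) via the max-shift trick."""
    mx = max(xs)
    return mx + math.log(sum(math.exp(x - mx) for x in xs))


def top_k_set(logits: Sequence[float], k: int) -> Set[int]:
    """Indices of the k largest logits (ties keep index order; sort is stable)."""
    order = sorted(range(len(logits)), key=lambda v: logits[v], reverse=True)
    return set(order[:k])


def logit_margin(logits: Sequence[float], top_set: Set[int]) -> float:
    """g = log sum_{v in S} e^{z_v} - log sum_{v not in S} e^{z_v}."""
    inside = [z for v, z in enumerate(logits) if v in top_set]
    outside = [z for v, z in enumerate(logits) if v not in top_set]
    if not inside or not outside:
        raise ValueError("S must be a non-empty proper subset of the vocabulary")
    return logsumexp(inside) - logsumexp(outside)


if __name__ == "__main__":
    z = [2.0, 1.0, 0.0, -1.0]
    s = top_k_set(z, 2)
    g = logit_margin(z, s)
    p_s = sum(math.exp(z[v]) for v in s) / sum(math.exp(x) for x in z)
    print(g, 1.0 / (1.0 + math.exp(-g)), p_s)  # the last two values agree
```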

Before obtaining the theoretical results, we make the following assumptions:

###### Assumption 3.1 (Smoothness of the Margin Function).

We assume the margin function $g_m(H)$ has a $\beta$-Lipschitz continuous gradient over the domain of hidden states. Formally, there exists a constant $\beta > 0$ such that, for any two hidden states $H_i, H_k$ with $k \ge i$:

$$\|\nabla g_m(H_i) - \nabla g_m(H_k)\| \le \beta \|H_i - H_k\|. \tag{7}$$

The rationale behind this assumption is that hidden states during the forward pass are normalized by LayerNorm (restricting the domain), and the Softmax and log functions composing $g_m$ are smooth $\mathcal{C}^\infty$ functions. Consequently, the Hessian of $g_m$ is bounded, preventing the gradient from changing arbitrarily fast.

###### Proposition 3.2 (Quadratic Smoothness Bound).

Under Assumption [3.1](https://arxiv.org/html/2605.05741#S3.Thmtheorem1), for any current hidden state $H$ and residual update vector $\Delta$, the margin function satisfies:

$$g_m(H + \Delta) \ge g_m(H) + \langle \nabla g_m(H), \Delta \rangle - \frac{\beta}{2}\|\Delta\|^2. \tag{8}$$

While Proposition [3.2](https://arxiv.org/html/2605.05741#S3.Thmtheorem2) is a classic result (for the proof, see Appendix [A.1](https://arxiv.org/html/2605.05741#A1.SS1)), it has a specific meaning in our inference dynamics. Here, $\Delta$ represents the residual update produced by a residual function applied to *any* prior hidden state $H_i$, i.e., $\Delta = \mathcal{F}_k(H_i)$ with $k \ge i$.
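For completeness, the classic argument behind Eq. (8) is sketched below, assuming Eq. (7) holds for every point on the segment between $H$ and $H + \Delta$. This is the standard textbook derivation, not necessarily the one given in Appendix A.1: it combines the fundamental theorem of calculus along the segment, the Cauchy-Schwarz inequality, and the Lipschitz bound with $\|(H + s\Delta) - H\| = s\|\Delta\|$.

```latex
\begin{aligned}
g_m(H+\Delta) - g_m(H) - \langle \nabla g_m(H), \Delta \rangle
  &= \int_0^1 \big\langle \nabla g_m(H + s\Delta) - \nabla g_m(H),\, \Delta \big\rangle \, ds \\
  &\ge -\int_0^1 \big\|\nabla g_m(H + s\Delta) - \nabla g_m(H)\big\| \, \|\Delta\| \, ds \\
  &\ge -\int_0^1 \beta s \, \|\Delta\|^2 \, ds
   = -\frac{\beta}{2}\, \|\Delta\|^2 .
\end{aligned}
```

Moving $g_m(H) + \langle \nabla g_m(H), \Delta \rangle$ to the right-hand side gives exactly Eq. (8).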

###### Assumption 3.3 (Gradient-aligned Updates).

For any layer index $k \ge i_0$ (where $i_0$ denotes the start of the refinement layers; see Section [3.1](https://arxiv.org/html/2605.05741#S3.SS1)), the residual update $\Delta$ aligns positively with the gradient of the margin function $g_m(H)$ in expectation. Specifically, there exists a constant $\mu > 0$ such that:

$$\mathbb{E}\left[\langle \nabla g_m(H), \Delta \rangle \mid H\right] \ge \mu. \tag{9}$$

This assumption formalizes the intuition that well-trained Transformers function as implicit optimizers. In the late refinement stages ($k \ge i_0$), the residual function $\mathcal{F}_k$ provides updates that increase the margin, pushing the hidden state $H$ further into the decision region of the target token.

###### Assumption 3.4 (Boundedness).

The residual updates are bounded by a constant $R$, i.e., $\|\Delta\| \le R$. Furthermore, the fluctuations of the single-step margin gain around its mean are bounded by a constant $b$:

$$\left|g_m(H_{i,t}) - \mathbb{E}_t\left[g_m(H_{i,t})\right]\right| \le b. \tag{10}$$

Assumption [3.4](https://arxiv.org/html/2605.05741#S3.Thmtheorem4) is consistent with the use of normalization and residual connections, which constrain latent increments and prevent representation divergence.

### 3.3 Main Results

Based on the above assumptions, we obtain two fundamental properties of the confidence trajectory: monotonic growth across layers (Result 1) and the magnification effect of a larger focal depth $m$ (Result 2). For the formal proofs, please refer to Appendix [A.2](https://arxiv.org/html/2605.05741#A1.SS2) and Appendix [3.6](https://arxiv.org/html/2605.05741#S3.Thmtheorem6).

Result 1: The confidence of the refinement layers increases monotonically. The first result characterizes the confidence trajectory for a fixed focal depth $m$. We show that in the refinement phase, the layer-wise confidence statistically increases layer by layer.

###### Theorem 3.5 (Monotonicity of Confidence).

Under Assumptions [3.1](https://arxiv.org/html/2605.05741#S3.Thmtheorem1)-[3.4](https://arxiv.org/html/2605.05741#S3.Thmtheorem4), for layers $i \ge i_0$ and sequence length $T$, the probability that the sequence-averaged confidence $\bar{\mathcal{C}}^{(m)}_i = \frac{1}{T}\sum_t g_m(H_{i,t})$ fails to increase in the next layer decays exponentially in $T$:

$$\mathbb{P}\left(\bar{\mathcal{C}}^{(m)}_{i+1} \le \bar{\mathcal{C}}^{(m)}_{i}\right) \le \exp\left(-\frac{T\gamma^{2}}{2b^{2}}\right), \tag{11}$$
where $\gamma = \mu - \frac{\beta}{2}R^{2}$.

Theorem [3.5](https://arxiv.org/html/2605.05741#S3.Thmtheorem5) shows that while individual tokens may exhibit noisy confidence fluctuations, the *averaged* confidence over a sequence statistically rises in the refinement layers.
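One way to see where $\gamma$ and the exponent in Eq. (11) come from is sketched below. It is a reconstruction under stated assumptions, not the proof in Appendix A.2: write $\delta_t = g_m(H_{i+1,t}) - g_m(H_{i,t})$ for the per-token gain and use $H_{i+1,t} = H_{i,t} + \mathcal{F}_{i+1}(H_{i,t})$ from Eq. (1); we additionally assume that the $\delta_t$ are independent across tokens, that each lies in an interval of width $2b$, and that $\gamma > 0$.

```latex
\begin{aligned}
\mathbb{E}\left[\delta_t \mid H_{i,t}\right]
  &\ge \mathbb{E}\left[\langle \nabla g_m(H_{i,t}), \Delta_t \rangle \mid H_{i,t}\right]
     - \frac{\beta}{2}\, \mathbb{E}\left[\|\Delta_t\|^2 \mid H_{i,t}\right]
  && \text{Eq. (8) with } \Delta_t = \mathcal{F}_{i+1}(H_{i,t}) \\
  &\ge \mu - \frac{\beta}{2} R^2 = \gamma
  && \text{Assumptions 3.3 and 3.4, since } i+1 \ge i_0 \\[4pt]
\mathbb{P}\left(\bar{\mathcal{C}}^{(m)}_{i+1} \le \bar{\mathcal{C}}^{(m)}_{i}\right)
  &= \mathbb{P}\left(\frac{1}{T}\sum_{t=1}^{T} \delta_t \le 0\right)
   \le \mathbb{P}\left(\frac{1}{T}\sum_{t=1}^{T} \left(\delta_t - \mathbb{E}[\delta_t]\right) \le -\gamma\right) \\
  &\le \exp\left(-\frac{2T\gamma^2}{(2b)^2}\right) = \exp\left(-\frac{T\gamma^2}{2b^2}\right)
  && \text{Hoeffding, interval width } 2b
\end{aligned}
```

The middle inequality uses $\frac{1}{T}\sum_t \mathbb{E}[\delta_t] \ge \gamma$, so an average gain at or below zero forces a deviation of at least $\gamma$ below the mean.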

Result 2: Self-magnifying effect of the focal depth $m$. We now analyze the impact of the focal depth $m$ by comparing, at the *same* hidden state $H_i$, the confidence observed through the decoding function $\mathcal{D}_{m+1}$ (focal depth $m+1$) with that observed through $\mathcal{D}_m$ (focal depth $m$).

###### Theorem 3.6 (Magnification of Focal Depth).

Let $D_{m+1}$ be a decoding function with one more layer than $D_m$. Based on Assumption [3.3](https://arxiv.org/html/2605.05741#S3.Thmtheorem3), the confidence trajectory observed through $D_{m+1}$ statistically dominates that observed through $D_m$ with high probability:

$$\mathbb{P}\left(\bar{\mathcal{C}}^{(m+1)}_{i} > \bar{\mathcal{C}}^{(m)}_{i}\right) \ge 1 - \exp\left(-\frac{T\gamma^{2}}{2b^{2}}\right). \tag{12}$$

Theorem [3.6](https://arxiv.org/html/2605.05741#S3.Thmtheorem6) shows that increasing $m$ shifts the confidence trajectory upward with high probability. Intuitively, $D_{m+1}$ incorporates an additional “refining” transformation toward the output token, thereby increasing the confidence relative to $D_m$.
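Theorem 3.6 carries the same bound as Theorem 3.5, which is consistent with the observation that, by Eq. (5), $D_{m+1}(H) = D_m(f_{N-m}(H))$: one extra layer of focal depth is a single residual step $\Delta = \mathcal{F}_{N-m}(H)$ taken before decoding, which Assumption 3.3 covers when $N - m \ge i_0$. The sketch below is a purely illustrative Monte Carlo check of the shared bound $\exp(-T\gamma^2/(2b^2))$. It draws i.i.d. per-token gains uniformly from $[\gamma - b, \gamma + b]$, a hypothetical distribution with mean $\gamma$ and width $2b$, rather than from any real model; the function names are our own.

```python
import math
import random


def hoeffding_bound(T: int, gamma: float, b: float) -> float:
    """Right-hand side of Eq. (11): exp(-T * gamma^2 / (2 * b^2))."""
    return math.exp(-T * gamma ** 2 / (2 * b ** 2))


def empirical_failure_rate(T: int, gamma: float, b: float,
                           trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the mean of T per-token gains is <= 0.

    Each gain is drawn i.i.d. from Uniform[gamma - b, gamma + b], a toy
    distribution with mean gamma and width 2b (an illustrative assumption).
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        total = sum(rng.uniform(gamma - b, gamma + b) for _ in range(T))
        if total <= 0.0:  # mean <= 0 iff sum <= 0
            failures += 1
    return failures / trials


if __name__ == "__main__":
    gamma, b = 0.1, 1.0
    for T in (50, 100, 200):
        print(T, empirical_failure_rate(T, gamma, b), hoeffding_bound(T, gamma, b))
```

Hoeffding's inequality is distribution-free, so for this toy distribution the empirical failure rate sits well below the bound; the point is only that both shrink as $T$ grows.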

## 4 High-resolution Probing and Measurement

Based on the theoretical foundations in Section [3](https://arxiv.org/html/2605.05741#S3), we design HyperLens, a high-resolution probe that observes the confidence trajectory in a fine-grained manner. Furthermore, we derive a quantitative metric from the trajectory to measure the cognitive effort of LLMs.

High-resolution probing with a proper focal depth. Theorems [3.5](https://arxiv.org/html/2605.05741#S3.Thmtheorem5) and [3.6](https://arxiv.org/html/2605.05741#S3.Thmtheorem6) indicate that increasing the focal depth $m$ magnifies the confidence in the refinement layers. Therefore, unlike Logit Lens, which fixes $m = 0$, HyperLens uses the decoding function $D_m$ (see Equation [5](https://arxiv.org/html/2605.05741#S3.E5)) with focal depth $m \ge 1$ to observe the confidence. Specifically, for each hidden state $H_i$, we first propagate it through the final $m$ layers of the model, i.e., $\{f_{N-m+1}, \dots, f_N\}$, and then project it to the logit space through the last unembedding layer to derive the magnified confidence $\mathcal{C}^{(m)}_i$. Choosing a proper $m$ is crucial: an excessively large $m$ would over-magnify all signals to near-certainty, making the trajectories indistinguishable once again. At a proper $m$, the magnification is constrained by the task's inherent difficulty, so the trajectories for hard and easy tasks are clearly separated.
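A minimal sketch of the probing procedure on a toy residual model: each hidden state $H_{i,t}$ is pushed through the final $m$ layers, unembedded, and scored by its Top-K mass (again assuming Top-K means the mass of the $K$ likeliest tokens), then averaged over tokens to give $\mathcal{C}^{(m)}_i$. The toy layers, the identity $W_U$, and the name `hyperlens_trajectory` are assumptions for illustration only; because every toy layer adds the same update, confidence rises with both the layer index and $m$ by construction.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Residual = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step(z: Vector, residual: Residual) -> Vector:
    """One transformer layer in residual form: f(z) = z + F(z)."""
    return [a + b for a, b in zip(z, residual(z))]


def hyperlens_trajectory(token_inputs: Sequence[Vector], residuals: Sequence[Residual],
                         w_u: Sequence[Sequence[float]], m: int, k: int = 1) -> Vector:
    """Return [C_1^(m), ..., C_N^(m)]: mean Top-K mass of Softmax(D_m(H_{i,t})) per layer."""
    n = len(residuals)
    if not 0 <= m <= n:
        raise ValueError("focal depth m must satisfy 0 <= m <= N")
    tail = residuals[n - m:]  # the final m layers f_{N-m+1}, ..., f_N
    totals = [0.0] * n
    for x in token_inputs:
        h = list(x)  # H_{0,t} = X_t
        for i, residual in enumerate(residuals):
            h = step(h, residual)  # H_{i+1,t} (stored at index i)
            z = h
            for late in tail:  # magnify: propagate through the final m layers
                z = step(z, late)
            p = softmax([sum(w * v for w, v in zip(row, z)) for row in w_u])
            totals[i] += sum(sorted(p, reverse=True)[:k])
    return [s / len(token_inputs) for s in totals]


if __name__ == "__main__":
    layers = [lambda z: [0.5, 0.0]] * 6  # every toy layer favours token 0
    identity = [[1.0, 0.0], [0.0, 1.0]]
    for m in (0, 1, 3, 5):
        traj = hyperlens_trajectory([[0.0, 0.0]], layers, identity, m)
        print(m, [round(c, 3) for c in traj])
```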

Quantitative metric of cognitive effort. Beyond visually observing the difference between easy and hard tasks, we formalize cognitive effort as a quantitative metric. We define the *refinement area* ($\Omega$) as the area above the magnified confidence curve over the refinement layers for a fixed focal depth $m$:

$$\Omega = \sum_{k \ge i_0} \left(1 - \mathcal{C}^{(m)}_k\right). \tag{13}$$
Intuitively, $\Omega$ captures the cumulative uncertainty the model must resolve before converging on a final output, and it is positively correlated with cognitive effort: easy tasks exhibit rapid confidence magnification, maintaining higher confidence across the refinement layers and yielding a lower $\Omega$; conversely, complex tasks necessitate a longer low-confidence struggle stage, resulting in a higher $\Omega$.
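A minimal sketch of Eq. (13), assuming the trajectory is stored as a list whose first entry belongs to a known layer (`first_layer=1` by default); the function name and the sample trajectories are hypothetical. A higher-confidence trajectory leaves less area above it and therefore yields a smaller $\Omega$.

```python
from typing import Sequence


def refinement_area(trajectory: Sequence[float], i0: int, first_layer: int = 1) -> float:
    """Omega = sum_{k >= i0} (1 - C_k^(m)); trajectory[j] is C_k for k = first_layer + j."""
    start = i0 - first_layer
    if not 0 <= start < len(trajectory):
        raise ValueError("i0 must index a layer inside the trajectory")
    return sum(1.0 - c for c in trajectory[start:])


if __name__ == "__main__":
    easy = [0.10, 0.20, 0.70, 0.90, 0.98]  # hypothetical C_k for layers 1..5
    hard = [0.10, 0.20, 0.30, 0.50, 0.90]
    # The slower-rising (hard) trajectory accumulates the larger Omega.
    print(refinement_area(easy, i0=3), refinement_area(hard, i0=3))
```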

Automatic computation of the refinement area. Computing $\Omega$ requires determining the first refinement layer $i_0$ automatically. Since Theorem [3.5](https://arxiv.org/html/2605.05741#S3.Thmtheorem5) establishes that confidence is statistically monotonic in the refinement layers, we determine $i_0$ with a *reverse-scanning* strategy. Briefly, for a fixed model and focal depth $m$, we scan the confidence trajectory backward from the final layer and identify $i_0$ as the first layer where the monotonically decreasing trend is violated. The detailed algorithm is given in Appendix [B](https://arxiv.org/html/2605.05741#A2).
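The exact procedure is in Appendix B; the sketch below is only one plausible reading of the description above, not the paper's algorithm. It walks backward from the last layer while confidence keeps (weakly) decreasing toward earlier layers, and returns the earliest layer of that final monotone run as $i_0$. The optional `tol` slack, which tolerates small upticks since the monotonicity in Theorem 3.5 is only statistical, is our own assumption, as is the function name.

```python
from typing import Sequence


def find_refinement_start(trajectory: Sequence[float], first_layer: int = 1,
                          tol: float = 0.0) -> int:
    """Reverse-scan a confidence trajectory for the first refinement layer i0.

    trajectory[j] holds C_k for layer k = first_layer + j. Starting at the last
    layer, keep stepping back while C_{k-1} <= C_k + tol; the scan stops at the
    first violation and returns the layer just after it.
    """
    if not trajectory:
        raise ValueError("empty trajectory")
    j = len(trajectory) - 1
    while j > 0 and trajectory[j - 1] <= trajectory[j] + tol:
        j -= 1
    return first_layer + j


if __name__ == "__main__":
    traj = [0.3, 0.5, 0.2, 0.25, 0.6, 0.9]  # hypothetical C_k for layers 1..6
    print(find_refinement_start(traj))  # layers 3..6 form the final rising run
```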

## 5 Evaluations

### 5.1 Setup

Models. We evaluate HyperLens with varied focal depths $m$ on eight models with diverse architectures. With $m = 0$, the probe is equivalent to Logit Lens (nostalgebraist, [2020](https://arxiv.org/html/2605.05741#bib.bib3)), the most widely used static probing method, which we use as the baseline. Specifically, we adopt Llama3-8B-Instruct (Dubey et al., [2024](https://arxiv.org/html/2605.05741#bib.bib48)) [Llama3-8B], Deepseek-Math-7B-Instruct (Shao et al., [2024](https://arxiv.org/html/2605.05741#bib.bib51)) [Deepseek-7B], and multiple sizes of the Qwen family, including Qwen2.5-Instruct (Hui et al., [2024](https://arxiv.org/html/2605.05741#bib.bib50)) (0.5B, 3B, 7B, 32B) [Qwen2.5-0.5B/Qwen2.5-3B/Qwen2.5-7B/Qwen2.5-32B] and Qwen3 (Yang et al., [2025](https://arxiv.org/html/2605.05741#bib.bib49)) (0.6B, 4B) [Qwen3-0.6B/Qwen3-4B].

Datasets. To evaluate cognitive effort comprehensively, we conduct evaluations across diverse domains. For each domain, we pair a foundational dataset representing easy tasks with a challenging counterpart: (1) Mathematics: MATH (Level 1-2) (Hendrycks et al., [2021b](https://arxiv.org/html/2605.05741#bib.bib37)) versus AIME (2023-2025) (Wikipedia contributors, [2025](https://arxiv.org/html/2605.05741#bib.bib57)); (2) Coding: CoNaLa (Yin et al., [2018](https://arxiv.org/html/2605.05741#bib.bib38)) versus APPS (Hendrycks et al., [2021a](https://arxiv.org/html/2605.05741#bib.bib39)); (3) Logic: RuleTaker (Clark et al., [2020](https://arxiv.org/html/2605.05741#bib.bib40)) versus ZebraLogic (Lin et al., [2025](https://arxiv.org/html/2605.05741#bib.bib52)); and (4) General QA: ARC-Easy (Clark et al., [2018](https://arxiv.org/html/2605.05741#bib.bib42)) versus the expert-level GPQA (Rein et al., [2024](https://arxiv.org/html/2605.05741#bib.bib43)). All results are obtained on 100 random samples for efficiency and robustness, except for AIME, where we use all problems from the 2023-2025 contests.

![Figure 2](https://arxiv.org/html/2605.05741v1/x2.png)
Figure 2: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on the APPS (Hendrycks et al., [2021a](https://arxiv.org/html/2605.05741#bib.bib39)) and ZebraLogic (Lin et al., [2025](https://arxiv.org/html/2605.05741#bib.bib52)) datasets with $m = 1, 3, 5$.

### 5.2 Empirical Validation

Validating Result 1: the confidence trajectory increases monotonically in the refinement layers. Figure [2](https://arxiv.org/html/2605.05741#S5.F2) plots the confidence trajectory (as defined in Equation [3](https://arxiv.org/html/2605.05741#S3.E3), Section [3.1](https://arxiv.org/html/2605.05741#S3.SS1)) of four models with varied $m$ on two datasets; full results are in Appendix [D](https://arxiv.org/html/2605.05741#A4). We observe robust monotonic growth in confidence throughout the deeper refinement layers in all cases. The universality of this trend across diverse architectures and domains empirically validates Theorem [3.5](https://arxiv.org/html/2605.05741#S3.Thmtheorem5), indicating that the monotonically increasing trajectory is an intrinsic characteristic of the models' inference dynamics.

Validating Result 2: a larger focal depth $m$ magnifies the observed confidence. Figure [2](https://arxiv.org/html/2605.05741#S5.F2) shows that increasing $m$ strictly magnifies the confidence in the deeper refinement layers: the trajectories exhibit a strict ordering, with the trajectory for $m = 5$ (red) lying above that for $m = 3$ (yellow), which in turn lies above that for $m = 1$ (green), across all models and datasets. This strict ordering validates Theorem [3.6](https://arxiv.org/html/2605.05741#S3.Thmtheorem6). By incorporating a few transformer layers into the decoding function, we can effectively magnify the confidence signals and thus obtain a fine-grained view.

Further analysis. Beyond the properties stated in Theorem [3.5](https://arxiv.org/html/2605.05741#S3.Thmtheorem5) and Theorem [3.6](https://arxiv.org/html/2605.05741#S3.Thmtheorem6), we also observe several interesting properties in Figure [2](https://arxiv.org/html/2605.05741#S5.F2):

*(1) Model-specific sensitivity of the magnification mechanism.* The degree of magnification obtained with larger $m$ varies significantly across models. For instance, Qwen2.5-3B exhibits a substantial gap between the trajectory for $m=1$ and those for $m=3$ or $5$, indicating high sensitivity to small $m$. In contrast, Deepseek-7B shows a much narrower divergence between $m=1$ and $m=3$, and its trajectory is lifted significantly only at $m=5$, suggesting that the details of the magnification mechanism vary across models.

*(2) Intrinsic beginning points of the refinement layers.* The beginning point of the refinement layers, i.e., the layer at which confidence begins to climb, appears to be an intrinsic characteristic of each model for a given focal depth $m$: it remains remarkably consistent across diverse datasets for any fixed model and $m$. For instance, when $m=1$, the beginning point lies around layers 10, 30, 20, and 10 for the four respective models across datasets. This consistency indicates that the location of the refinement layers is an intrinsic property of the model, independent of the specific dataset being processed.
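The beginning point is read off the plots in the text. One simple way to locate it automatically is to take the start of the final non-decreasing run of the trajectory and then skip any flat plateau. This rule, its tolerance parameters, and the example values are our own assumptions for illustration, not the procedure used in the paper.

```python
from typing import List


def refinement_start(traj: List[float], drop_tol: float = 0.0, rise_eps: float = 1e-3) -> int:
    """Heuristic beginning point of the refinement layers (hypothetical rule).

    1. Walk back from the last layer while confidence does not drop by more
       than drop_tol; this finds the final non-decreasing run.
    2. Skip any flat plateau at the start of that run (steps whose rise is at
       most rise_eps), so the returned layer is where the climb begins.
    """
    start = len(traj) - 1
    while start > 0 and traj[start] - traj[start - 1] >= -drop_tol:
        start -= 1
    while start < len(traj) - 1 and traj[start + 1] - traj[start] <= rise_eps:
        start += 1
    return start


# Two made-up trajectories of the same model on different datasets.
dataset_a = [0.30, 0.20, 0.15, 0.15, 0.25, 0.45, 0.80]
dataset_b = [0.25, 0.18, 0.12, 0.12, 0.20, 0.50, 0.90]
print(refinement_start(dataset_a), refinement_start(dataset_b))  # 3 3
```

Applying the same rule to every dataset for a fixed model and $m$ turns the consistency observation above into a directly comparable list of layer indices.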

![Figure 3: easy vs. hard confidence trajectories of Qwen2.5-7B](https://arxiv.org/html/2605.05741v1/x3.png)
Figure 3: The confidence trajectory of Qwen2.5-7B across four domains, comparing easy versus hard datasets with focal depth $m=5$.

Table 1: Refinement area ($\Omega$) across models and data domains. Entries marked † are the corner cases highlighted in blue.

| Model | $m$ | $i_0$ | Math Easy | Math Hard | Code Easy | Code Hard | Logic Easy | Logic Hard | General QA Easy | General QA Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| Llama3-8B | 0 | 30 | 1.02 ± 0.02 | 1.04 ± 0.03 | 1.01 ± 0.01 | 1.01 ± 0.01 | 1.02 ± 0.01 † | 1.01 ± 0.01 † | 1.00 ± 0.01 | 1.02 ± 0.02 |
| Llama3-8B | 5 | 12 | 3.80 ± 0.46 | 4.18 ± 0.67 | 3.43 ± 0.38 | 3.95 ± 0.42 | 3.35 ± 0.39 | 3.93 ± 0.28 | 2.99 ± 0.43 | 3.54 ± 0.53 |
| Deepseek-7B | 0 | 1 | 2.59 ± 0.19 | 2.67 ± 0.22 | 2.71 ± 0.20 | 2.90 ± 0.33 | 2.08 ± 0.26 | 2.46 ± 0.26 | 2.39 ± 0.33 | 2.61 ± 0.33 |
| Deepseek-7B | 1 | 14 | 3.19 ± 0.45 | 3.61 ± 0.78 | 4.34 ± 1.09 | 6.16 ± 0.84 | 6.22 ± 1.05 | 6.87 ± 1.41 | 7.23 ± 1.26 | 8.05 ± 2.36 |
| Qwen2.5-0.5B | 0 | 22 | 1.01 ± 0.02 | 1.02 ± 0.02 | 1.04 ± 0.02 | 1.05 ± 0.02 | 1.06 ± 0.04 | 1.07 ± 0.03 | 1.11 ± 0.04 | 1.07 ± 0.05 |
| Qwen2.5-0.5B | 5 | 3 | 3.74 ± 0.48 | 4.04 ± 0.50 | 5.76 ± 0.39 | 5.78 ± 0.40 | 6.06 ± 0.50 † | 6.03 ± 0.35 † | 6.11 ± 0.41 † | 5.87 ± 0.97 † |
| Qwen2.5-3B | 0 | 23 | 5.07 ± 0.35 | 5.61 ± 0.38 | 4.63 ± 0.20 | 5.09 ± 0.22 | 5.06 ± 0.22 | 5.14 ± 0.19 | 4.66 ± 0.34 | 5.73 ± 0.41 |
| Qwen2.5-3B | 3 | 22 | 1.97 ± 0.27 | 2.24 ± 0.30 | 2.41 ± 0.32 | 2.52 ± 0.23 | 2.19 ± 0.58 | 2.87 ± 0.28 | 2.62 ± 0.64 | 3.01 ± 0.57 |
| Qwen2.5-7B | 0 | 13 | 6.57 ± 0.45 | 7.10 ± 0.52 | 6.74 ± 0.37 | 6.97 ± 0.34 | 7.33 ± 0.23 | 7.50 ± 0.24 | 7.52 ± 0.34 | 7.81 ± 0.39 |
| Qwen2.5-7B | 5 | 12 | 1.46 ± 0.40 | 2.38 ± 0.44 | 1.78 ± 0.32 | 3.05 ± 0.55 | 2.38 ± 0.52 | 3.38 ± 0.46 | 2.13 ± 0.32 | 2.63 ± 0.44 |
| Qwen2.5-32B | 0 | 39 | 1.13 ± 0.26 † | 1.06 ± 0.32 † | 0.97 ± 0.24 † | 1.09 ± 0.26 † | 0.73 ± 0.21 | 1.09 ± 0.19 | 0.66 ± 0.38 | 1.37 ± 0.48 |
| Qwen2.5-32B | 5 | 42 | 3.76 ± 0.67 | 4.78 ± 0.43 | 3.36 ± 0.73 | 5.32 ± 0.52 | 2.65 ± 0.67 | 4.53 ± 0.61 | 2.55 ± 0.55 | 4.40 ± 0.83 |
| Qwen3-0.6B | 0 | 16 | 3.84 ± 0.13 | 3.98 ± 0.13 | 3.70 ± 0.09 | 3.82 ± 0.11 | 3.76 ± 0.09 | 3.98 ± 0.11 | 3.59 ± 0.11 | 3.99 ± 0.12 |
| Qwen3-0.6B | 5 | 14 | 2.07 ± 0.21 | 2.33 ± 0.16 | 2.47 ± 0.21 | 2.70 ± 0.19 | 2.36 ± 0.13 † | 2.34 ± 0.15 † | 2.50 ± 0.23 | 2.70 ± 0.17 |
| Qwen3-4B | 0 | 22 | 4.70 ± 0.29 | 5.00 ± 0.25 | 4.38 ± 0.27 | 4.84 ± 0.22 | 4.43 ± 0.20 | 5.09 ± 0.19 | 3.68 ± 0.21 | 4.68 ± 0.31 |
| Qwen3-4B | 5 | 21 | 1.87 ± 0.21 | 1.87 ± 0.23 | 2.29 ± 0.25 | 2.52 ± 0.21 | 2.25 ± 0.31 † | 2.03 ± 0.17 † | 1.86 ± 0.32 | 2.35 ± 0.24 |

### 5.3 Cognitive Effort Measurement

In this section, we provide an empirical analysis of how task difficulty influences the model’s cognitive effort\.

Visualization of cognitive effort difference. We visualize the confidence trajectories of the models on easy and hard tasks with appropriate focal depths. Figure [3](https://arxiv.org/html/2605.05741#S5.F3) shows the confidence trajectory of Qwen2.5-7B on easy and hard tasks with focal depth $m=5$ across four domains; results for the other models are shown in Appendix [D](https://arxiv.org/html/2605.05741#A4). The confidence trajectory on easy tasks (blue curves) remains higher than that on hard tasks (red curves) throughout the deeper refinement layers. The gap between the easy and hard trajectories is also larger in the Math and Code domains, likely because these tasks are inherently more demanding and require more logical effort. In contrast, the gap is much smaller in the General QA domain, where the tasks focus predominantly on textual comprehension and the easy/hard distinction is less severe.

Quantitative measurement of cognitive effort difference. To quantify this distinction, Table [1](https://arxiv.org/html/2605.05741#S5.T1) reports the refinement area $\Omega$ of all tested models on easy and hard tasks. For each model, we report $\Omega$ computed with a fixed focal depth $m=0$ (equivalent to Logit Lens) and with the most appropriate focal depth from $m\in\{1,3,5\}$, namely the one that maximizes the difference between easy and hard tasks. We make the following observations from Table [1](https://arxiv.org/html/2605.05741#S5.T1):

(1) *The refinement area $\Omega$ of hard tasks is generally larger than that of easy tasks.* Specifically, $\Omega$ of hard tasks exceeds that of easy tasks in the large majority of the $32 = 8\,(\text{models}) \times 2\,(\text{focal depths } m) \times 4\,(\text{task domains})$ cases. The exceptions, such as the corner cases marked in blue, show almost negligible differences and occur mostly in the Logic and General QA domains. This is consistent with the visualization in Figure [3](https://arxiv.org/html/2605.05741#S5.F3): the inherent difficulty gap between easy and hard tasks is less pronounced in these two domains. These results align with our intuition stated in Section [1](https://arxiv.org/html/2605.05741#S1) that hard tasks require more cognitive effort to solve.

(2) *The difference between the refinement areas $\Omega$ of easy and hard tasks is more pronounced with a focal depth $m\geq 1$.* The easy/hard difference in $\Omega$ is generally larger with $m\geq 1$ (second row per model) than with the $m=0$ baseline (first row per model). Specifically, the maximum difference grows from 1.1 at $m=0$ to 2.0 at $m\geq 1$. This contrast suggests that a non-zero focal depth is necessary to observe the different levels of effort required by different tasks.
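The maximum-gap figures quoted above can be reproduced from the mean values in Table 1 (standard deviations omitted). The snippet below transcribes those means and takes the largest hard-minus-easy difference separately for the $m=0$ rows and the $m\geq 1$ rows; the variable names are ours.

```python
# Mean refinement areas from Table 1: (model, m, [(easy, hard) per domain]).
DOMAINS = ["Math", "Code", "Logic", "General QA"]
TABLE1 = [
    ("Llama3-8B", 0, [(1.02, 1.04), (1.01, 1.01), (1.02, 1.01), (1.00, 1.02)]),
    ("Llama3-8B", 5, [(3.80, 4.18), (3.43, 3.95), (3.35, 3.93), (2.99, 3.54)]),
    ("Deepseek-7B", 0, [(2.59, 2.67), (2.71, 2.90), (2.08, 2.46), (2.39, 2.61)]),
    ("Deepseek-7B", 1, [(3.19, 3.61), (4.34, 6.16), (6.22, 6.87), (7.23, 8.05)]),
    ("Qwen2.5-0.5B", 0, [(1.01, 1.02), (1.04, 1.05), (1.06, 1.07), (1.11, 1.07)]),
    ("Qwen2.5-0.5B", 5, [(3.74, 4.04), (5.76, 5.78), (6.06, 6.03), (6.11, 5.87)]),
    ("Qwen2.5-3B", 0, [(5.07, 5.61), (4.63, 5.09), (5.06, 5.14), (4.66, 5.73)]),
    ("Qwen2.5-3B", 3, [(1.97, 2.24), (2.41, 2.52), (2.19, 2.87), (2.62, 3.01)]),
    ("Qwen2.5-7B", 0, [(6.57, 7.10), (6.74, 6.97), (7.33, 7.50), (7.52, 7.81)]),
    ("Qwen2.5-7B", 5, [(1.46, 2.38), (1.78, 3.05), (2.38, 3.38), (2.13, 2.63)]),
    ("Qwen2.5-32B", 0, [(1.13, 1.06), (0.97, 1.09), (0.73, 1.09), (0.66, 1.37)]),
    ("Qwen2.5-32B", 5, [(3.76, 4.78), (3.36, 5.32), (2.65, 4.53), (2.55, 4.40)]),
    ("Qwen3-0.6B", 0, [(3.84, 3.98), (3.70, 3.82), (3.76, 3.98), (3.59, 3.99)]),
    ("Qwen3-0.6B", 5, [(2.07, 2.33), (2.47, 2.70), (2.36, 2.34), (2.50, 2.70)]),
    ("Qwen3-4B", 0, [(4.70, 5.00), (4.38, 4.84), (4.43, 5.09), (3.68, 4.68)]),
    ("Qwen3-4B", 5, [(1.87, 1.87), (2.29, 2.52), (2.25, 2.03), (1.86, 2.35)]),
]


def max_gap(rows):
    """Largest (hard - easy) difference, with the model and domain where it occurs."""
    return max(
        (hard - easy, model, DOMAINS[d])
        for model, _m, pairs in rows
        for d, (easy, hard) in enumerate(pairs)
    )


gap0 = max_gap([r for r in TABLE1 if r[1] == 0])
gap_m = max_gap([r for r in TABLE1 if r[1] >= 1])
print(f"m = 0 : {gap0[0]:.2f} ({gap0[1]}, {gap0[2]})")   # m = 0 : 1.07 (Qwen2.5-3B, General QA)
print(f"m >= 1: {gap_m[0]:.2f} ({gap_m[1]}, {gap_m[2]})")  # m >= 1: 1.96 (Qwen2.5-32B, Code)
```

Rounded to one decimal, these give the 1.1 and 2.0 quoted in the text.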

![Figure 4 (left): Logit Lens decoding of Qwen2.5-7B](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-7B-Instruct/m0.png)

![Figure 4 (right): HyperLens decoding of Qwen2.5-7B](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-7B-Instruct/m5.png)

Figure 4: Semantic decoding of Qwen2.5-7B. Left: Logit Lens ($m=0$); right: HyperLens ($m=5$).

Table 2: Refinement area ($\Omega$) with and without SFT.

| Model | MATH $\Omega$ | MATH $\hat{\Omega}$ | AIME $\Omega$ | AIME $\hat{\Omega}$ | CoNaLa $\Omega$ | CoNaLa $\hat{\Omega}$ | APPS $\Omega$ | APPS $\hat{\Omega}$ | RuleTaker $\Omega$ | RuleTaker $\hat{\Omega}$ | ZebraLogic $\Omega$ | ZebraLogic $\hat{\Omega}$ | ARC Easy $\Omega$ | ARC Easy $\hat{\Omega}$ | GPQA $\Omega$ | GPQA $\hat{\Omega}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama3-8B | 3.0 | 807.8 | 3.4 | 3409.5 | 2.7 | 649.3 | 3.2 | 888.5 | 3.1 | 578.7 | 2.5 | 1835.0 | 2.2 | 21.1 | 2.8 | 357.2 |
| Llama3-8B-SFT | 1.4 | 156.1 | 1.6 | 378.5 | 3.3 | 141.2 | 2.4 | 175.3 | 2.3 | 9.3 | 3.0 | 8.5 | 1.3 | 11.2 | 1.3 | 164.1 |
| Qwen2.5-7B | 1.5 | 629.7 | 2.4 | 2363.9 | 1.8 | 558.5 | 3.1 | 1861.0 | 2.4 | 379.4 | 3.4 | 2439.0 | 2.1 | 336.5 | 2.6 | 31270.1 |
| Qwen2.5-7B-SFT | 1.7 | 227.2 | 2.7 | 1307.2 | 2.5 | 85.2 | 4.1 | 545.1 | 2.2 | 7.2 | 2.5 | 6.0 | 1.6 | 313.8 | 1.8 | 543.8 |

### 5.4 Semantic Analysis

Beyond analyzing the confidence trajectory, we also use the decoding function to visualize the semantic evolution of hidden states by mapping the decoded logits to the vocabulary space. Figure [4](https://arxiv.org/html/2605.05741#S5.F4) shows the decoded semantics of Logit Lens (left, $m=0$) and HyperLens (right, $m=5$) for Qwen2.5-7B; results for other models and focal depths are given in Appendix [E](https://arxiv.org/html/2605.05741#A5).
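Mapping decoded logits to the vocabulary typically amounts to a softmax followed by a top-$k$ selection at each layer. The sketch below assumes the per-layer logits have already been produced by some decoding function (for example, Logit Lens or HyperLens); the vocabulary, layer indices, and logit values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_tokens(logits: Sequence[float], vocab: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable vocabulary entries with their probabilities."""
    probs = softmax(logits)
    order = sorted(range(len(vocab)), key=lambda j: probs[j], reverse=True)
    return [(vocab[j], probs[j]) for j in order[:k]]


# Hypothetical vocabulary and decoded logits for three layers.
vocab = ["the", "sure", "know", "answer", "42"]
decoded = {
    10: [2.0, 0.1, 0.2, 0.3, -1.0],
    18: [0.2, 1.9, 1.5, 0.4, 0.0],
    27: [0.1, 0.0, 0.2, 1.0, 3.0],
}
for layer, logits in decoded.items():
    tokens = ", ".join(f"{tok} ({p:.2f})" for tok, p in top_k_tokens(logits, vocab, k=2))
    print(f"layer {layer:2d}: {tokens}")
```

Printing the top tokens per layer in this way produces the kind of layer-by-layer view shown in Figure 4.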

Consistent with prior observations (Belrose et al., [2023](https://arxiv.org/html/2605.05741#bib.bib4); Wang, [2025](https://arxiv.org/html/2605.05741#bib.bib5)), Logit Lens fails to decode meaningful semantics from models such as Qwen at any layer except the last: in Figure [4](https://arxiv.org/html/2605.05741#S5.F4) (left), meaningful semantics appear only in the final layer. In comparison, HyperLens decodes intelligible and highly suggestive semantics from the intermediate layers. Most surprisingly, we observe a pattern of “evolution”. For instance, in the middle layers we decode high-level, self-reflective tokens such as “i am not sure” and “i want to know”. These semantics suggest that, before converging to the definitive answer, the model undergoes a thinking phase. Our results indicate that, unlike Tuned Lens (Belrose et al., [2023](https://arxiv.org/html/2605.05741#bib.bib4)), which requires an extra learned adapter to correct the semantic alignment, the model's own future layers act as a natural, zero-shot adapter that provides effective semantic signals.

## 6 Diagnosing Blind Confidence of SFT

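Table 3: Accuracy with and without SFT.

| Model | Math Easy | Math Hard | Code Easy | Code Hard | Logic Easy | Logic Hard | General Easy | General Hard |
|---|---|---|---|---|---|---|---|---|
| Llama3-8B | 52% | 4% | 91% | 70% | 57% | 42% | 93% | 37% |
| Llama3-8B-SFT | 24% | 3% | 73% | 59% | 54% | 18% | 82% | 25% |
| Qwen2.5-7B | 69% | 9% | 87% | 71% | 71% | 39% | 94% | 28% |
| Qwen2.5-7B-SFT | 55% | 4% | 81% | 63% | 44% | 20% | 96% | 27% |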

In this section, we use HyperLens to investigate the impact of SFT on model performance.

Setup. We use two base models, Llama3-8B and Qwen2.5-7B, and four datasets, GSM8K (Cobbe et al., [2021](https://arxiv.org/html/2605.05741#bib.bib44)), MBPP (Austin et al., [2021](https://arxiv.org/html/2605.05741#bib.bib45)), bAbI QA (Weston et al., [2015](https://arxiv.org/html/2605.05741#bib.bib46)), and OpenBookQA (Mihaylov et al., [2018](https://arxiv.org/html/2605.05741#bib.bib47)), for the Math, Code, Logic, and General QA domains, respectively, and perform full-parameter SFT with Llama-Factory (Zheng et al., [2024](https://arxiv.org/html/2605.05741#bib.bib64)). The difficulty of these four datasets lies between that of the easy and hard tasks described in Section [5](https://arxiv.org/html/2605.05741#S5). We then analyze the impact of SFT by evaluating the models with and without SFT (the former denoted by the suffix “-SFT”) on the original easy and hard datasets.

Performance degradation. Table [3](https://arxiv.org/html/2605.05741#S6.T3) shows the accuracy on the original easy/hard datasets with and without SFT. Accuracy drops, often substantially, in 15 of the $16 = 2\,(\text{models}) \times 2\,(\text{difficulty levels}) \times 4\,(\text{domains})$ cases; the only exception is Qwen2.5-7B on the easy General QA task. These results suggest that simple SFT is prone to degrading model performance.
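The count above can be checked directly against Table 3. The snippet below transcribes the accuracies, computes the change after SFT for every easy/hard cell, and lists the cells where accuracy did not drop.

```python
# Accuracy (%) transcribed from Table 3, in the table's column order.
COLUMNS = [f"{d} {lvl}" for d in ("Math", "Code", "Logic", "General") for lvl in ("easy", "hard")]
ACC = {
    "Llama3-8B": [52, 4, 91, 70, 57, 42, 93, 37],
    "Llama3-8B-SFT": [24, 3, 73, 59, 54, 18, 82, 25],
    "Qwen2.5-7B": [69, 9, 87, 71, 71, 39, 94, 28],
    "Qwen2.5-7B-SFT": [55, 4, 81, 63, 44, 20, 96, 27],
}

drops, non_drops = [], []
for base in ("Llama3-8B", "Qwen2.5-7B"):
    for col, before, after in zip(COLUMNS, ACC[base], ACC[base + "-SFT"]):
        delta = after - before  # percentage points gained (+) or lost (-) by SFT
        (drops if delta < 0 else non_drops).append((base, col, delta))

print(f"{len(drops)} of {len(drops) + len(non_drops)} cells drop after SFT")  # 15 of 16 cells drop after SFT
print("exceptions:", non_drops)  # exceptions: [('Qwen2.5-7B', 'General easy', 2)]
print("largest drop:", min(drops, key=lambda r: r[2]))  # largest drop: ('Llama3-8B', 'Math easy', -28)
```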

Mechanistic interpretation via HyperLens. To interpret the internal mechanism, we use HyperLens to visualize the confidence trajectory. Figure [5](https://arxiv.org/html/2605.05741#S6.F5) shows the trajectory of Llama3-8B with and without SFT on AIME; full results are given in Appendix [G](https://arxiv.org/html/2605.05741#A7). After SFT, the confidence trajectory becomes higher, suggesting that simple SFT reduces cognitive effort by letting the model skip the necessary low-confidence struggle phase. We term this phenomenon “blind confidence”.

Quantitatively, Table [2](https://arxiv.org/html/2605.05741#S5.T2) reports the refinement area ($\Omega$). While $\Omega$ decreases in most cases, it paradoxically increases in the Math and Code domains for Qwen2.5-7B. A detailed analysis of the model's outputs (see Appendix [F](https://arxiv.org/html/2605.05741#A6)) shows that, in these domains, Qwen2.5-7B tends to directly output much shorter responses consisting of high-quality core tokens, such as formulas and code snippets, without helpful tokens such as explanations and step-by-step reasoning. Because $\Omega$ is defined as the average effort per token (see Equation [3](https://arxiv.org/html/2605.05741#S3.E3)), we also compute the overall refinement area $\hat{\Omega}$, which aggregates $\Omega$ over the entire output sequence to capture the total effort: $\hat{\Omega}=\sum_{t=1}^{T}\Omega_t$, where $\Omega_t$ is computed by measuring the layer-wise confidence of the single output token at position $t$. After SFT, $\hat{\Omega}$ decreases sharply in all cases. This indicates that SFT reduces the model's total effort, either by lowering the per-token cognitive effort or by omitting helpful tokens, ultimately leading to the observed performance degradation.
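The interplay between $\Omega$ and $\hat{\Omega}$ is easiest to see with numbers. Assuming that $\Omega$ is the mean of the per-token areas $\Omega_t$ (the text describes it as the average effort per token) and that $\hat{\Omega}=\sum_t \Omega_t$, a shorter answer can raise $\Omega$ while lowering $\hat{\Omega}$. The per-token values below are invented for illustration.

```python
from typing import Sequence, Tuple


def refinement_areas(per_token_omega: Sequence[float]) -> Tuple[float, float]:
    """Return (Omega, Omega_hat): per-token average and sequence total.

    Assumes Omega is the mean of the per-token areas Omega_t."""
    total = sum(per_token_omega)
    return total / len(per_token_omega), total


# Hypothetical per-token refinement areas for two answers to the same problem.
verbose = [2.0, 2.2, 1.8, 2.1, 2.4, 2.5]  # explanation followed by the formula
terse = [2.6, 2.8]                        # formula only

for name, seq in (("verbose", verbose), ("terse", terse)):
    omega, omega_hat = refinement_areas(seq)
    print(f"{name:8s} T={len(seq)}  Omega={omega:.2f}  Omega_hat={omega_hat:.2f}")
```

The terse answer has the higher per-token $\Omega$ but the lower total $\hat{\Omega}$, which mirrors the pattern described for Qwen2.5-7B on Math and Code.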

![Figure 5: confidence trajectory of Llama3-8B with and without SFT](https://arxiv.org/html/2605.05741v1/x4.png)
Figure 5: The confidence trajectory of Llama3-8B with and without SFT on AIME.
## 7 Conclusion

By leveraging the transformer's inherent self-magnifying mechanism, we develop HyperLens to precisely quantify a model's cognitive effort, overcoming the resolution limits of current probes. The effectiveness of HyperLens offers a novel insight for the interpretability community: the most effective probe through which to view a model's internal workings may be the model itself. Our findings suggest that leveraging the model's own components, such as its future layers, can resolve highly meaningful signals that are otherwise lost in the high-dimensional latent space. In future work, we will explore the utility of HyperLens in other applications, including performance improvement, harmful behavior detection, and the interpretation of abnormal phenomena. More importantly, we plan to uncover additional intrinsic properties of the transformer architecture that can be leveraged for self-interpretation. By pursuing these directions, we aim to provide fresh insights and inspire future breakthroughs in AI interpretability.

## Impact Statement

This paper presents work whose goal is to enhance the transparency and interpretability of Large Language Models (LLMs). By introducing HyperLens to quantify the internal cognitive effort of models, our work contributes to the development of more reliable and robust AI systems.

## References

- G. Alain and Y. Bengio (2016) Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644.
- J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Le, et al. (2021) Program synthesis with large language models. arXiv preprint arXiv:2108.07732.
- N. Belrose, Z. Furman, L. Smith, D. Halawi, I. Ostrovsky, L. McKinney, S. Biderman, and J. Steinhardt (2023) Eliciting latent predictions from transformers with the tuned lens. arXiv preprint arXiv:2303.08112.
- A. Borah, C. Sharma, D. Khanna, U. Bhatt, G. Singh, H. M. Abdullah, R. K. Ravi, V. Jain, J. Patel, S. Singh, et al. (2025) Alignment quality index (AQI): beyond refusals: AQI as an intrinsic alignment diagnostic via latent geometry, cluster divergence, and layer wise pooled representations. arXiv preprint arXiv:2506.13901.
- Y. Chen, X. Pan, Y. Li, B. Ding, and J. Zhou (2023) EE-LLM: large-scale training and inference of early-exit large language models with 3D parallelism. arXiv preprint arXiv:2312.04916.
- X. W. Chia, S. L. Wong, and J. Pan (2025) Probing latent subspaces in LLM for AI security: identifying and manipulating adversarial states. arXiv preprint arXiv:2503.09066.
- Y. Chuang, Y. Xie, H. Luo, Y. Kim, J. Glass, and P. He (2023) DoLa: decoding by contrasting layers improves factuality in large language models. arXiv preprint arXiv:2309.03883.
- P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, and O. Tafjord (2018) Think you have solved question answering? Try ARC, the AI2 reasoning challenge. arXiv preprint arXiv:1803.05457.
- P. Clark, O. Tafjord, and K. Richardson (2020) Transformers as soft reasoners over language. arXiv preprint arXiv:2002.05867.
- K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, et al. (2021) Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
- H. Cunningham, A. Ewart, L. Riggs, R. Huben, and L. Sharkey (2023) Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600.
- A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan, et al. (2024) The Llama 3 herd of models. arXiv e-prints, pp. arXiv–2407.
- N. Elhage, N. Nanda, C. Olsson, T. Henighan, N. Joseph, B. Mann, A. Askell, Y. Bai, A. Chen, T. Conerly, et al. (2021) A mathematical framework for transformer circuits. Transformer Circuits Thread 1(1), pp. 12.
- K. Ethayarajh (2019) How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512.
- M. Geva, R. Schuster, J. Berant, and O. Levy (2021) Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 5484–5495.
- A. Ghandeharioun, A. Caciularu, A. Pearce, L. Dixon, and M. Geva (2024) Patchscopes: a unifying framework for inspecting hidden representations of language models. In International Conference on Machine Learning, pp. 15466–15490.
- D. Hendrycks, S. Basart, S. Kadavath, M. Mazeika, A. Arora, E. Guo, C. Burns, S. Puranik, H. He, D. Song, et al. (2021a) Measuring coding challenge competence with APPS. arXiv preprint arXiv:2105.09938.
- D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt (2021b) Measuring mathematical problem solving with the MATH dataset. arXiv preprint arXiv:2103.03874.
- J. Hewitt and P. Liang (2019) Designing and interpreting probes with control tasks. arXiv preprint arXiv:1909.03368.
- B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Lu, et al. (2024) Qwen2.5-Coder technical report. arXiv preprint arXiv:2409.12186.
- K. Li, O. Patel, F. Viégas, H. Pfister, and M. Wattenberg (2023) Inference-time intervention: eliciting truthful answers from a language model. Advances in Neural Information Processing Systems 36, pp. 41451–41530.
- B. Y. Lin, R. L. Bras, K. Richardson, A. Sabharwal, R. Poovendran, P. Clark, and Y. Choi (2025) ZebraLogic: on the scaling limits of LLMs for logical reasoning. arXiv preprint arXiv:2502.01100.
- X. Lin, P. Li, Z. Zeng, T. Li, T. Wei, X. Ning, G. Li, Y. Chen, and H. Tong (2026) ALERT: zero-shot LLM jailbreak detection via internal discrepancy amplification. arXiv preprint arXiv:2601.03600.
- T. Mihaylov, P. Clark, T. Khot, and A. Sabharwal (2018) Can a suit of armor conduct electricity? A new dataset for open book question answering. arXiv preprint arXiv:1809.02789.
- nostalgebraist (2020) Interpreting GPT: the logit lens. [https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens](https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens).
- C. Olsson, N. Elhage, N. Nanda, N. Joseph, N. DasSarma, T. Henighan, B. Mann, A. Askell, Y. Bai, A. Chen, et al. (2022) In-context learning and induction heads. arXiv preprint arXiv:2209.11895.
- K. Pal, J. Sun, A. Yuan, B. C. Wallace, and D. Bau (2023) Future lens: anticipating subsequent tokens from a single hidden state. arXiv preprint arXiv:2311.04897.
- S. Rajamanoharan, A. Conmy, L. Smith, T. Lieberum, V. Varma, J. Kramar, R. Shah, and N. Nanda (2024) Improving sparse decomposition of language model activations with gated sparse autoencoders. Advances in Neural Information Processing Systems 37, pp. 775–818.
- D. Rein, B. L. Hou, A. C. Stickland, J. Petty, R. Y. Pang, J. Dirani, J. Michael, and S. R. Bowman (2024) GPQA: a graduate-level Google-proof Q&A benchmark. In First Conference on Language Modeling.
- T. Schuster, A. Fisch, J. Gupta, M. Dehghani, D. Bahri, V. Tran, Y. Tay, and D. Metzler (2022) Confident adaptive language modeling. Advances in Neural Information Processing Systems 35, pp. 17456–17472.
- A. Shai, L. Teixeira, A. Oldenziel, S. Marzen, and P. Riechers (2024) Transformers represent belief state geometry in their residual stream. Advances in Neural Information Processing Systems 37, pp. 75012–75034.
- Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y. Li, Y. Wu, et al. (2024) DeepSeekMath: pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300.
- L. Tiberi, F. Mignacco, K. Irie, and H. Sompolinsky (2024) Dissecting the interplay of attention paths in a statistical mechanics theory of transformers. Advances in Neural Information Processing Systems 37, pp. 72710–72753.
- W. Timkey and M. Van Schijndel (2021) All bark and no bite: rogue dimensions in transformer language models obscure representational quality. arXiv preprint arXiv:2109.04404.
- E. Voita, R. Sennrich, and I. Titov (2019a) The bottom-up evolution of representations in the transformer: a study with machine translation and language modeling objectives. arXiv preprint arXiv:1909.01380.
- E. Voita, D. Talbot, F. Moiseev, R. Sennrich, and I. Titov (2019b) Analyzing multi-head self-attention: specialized heads do the heavy lifting, the rest can be pruned. arXiv preprint arXiv:1905.09418.
- E. Voita and I. Titov (2020) Information-theoretic probing with minimum description length. arXiv preprint arXiv:2003.12298.
- Z. Wang (2025) LogitLens4LLMs: extending logit lens analysis to modern large language models. arXiv.
- J. Weston, A. Bordes, S. Chopra, A. M. Rush, B. Van Merriënboer, A. Joulin, and T. Mikolov (2015) Towards AI-complete question answering: a set of prerequisite toy tasks. arXiv preprint arXiv:1502.05698.
- Wikipedia contributors (2025) American Invitational Mathematics Examination. Wikipedia, the free encyclopedia. [https://en.wikipedia.org/wiki/American_Invitational_Mathematics_Examination](https://en.wikipedia.org/wiki/American_Invitational_Mathematics_Examination).
- J. Xin, R. Tang, J. Lee, Y. Yu, and J. Lin (2020) DeeBERT: dynamic early exiting for accelerating BERT inference. arXiv preprint arXiv:2004.12993.
- A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, et al. (2025) Qwen3 technical report. arXiv preprint arXiv:2505.09388.
- Y. Yao, N. Zhang, Z. Xi, M. Wang, Z. Xu, S. Deng, and H. Chen (2024) Knowledge circuits in pretrained transformers. Advances in Neural Information Processing Systems 37, pp. 118571–118602.
- J. Ye, Z. Yao, Z. Huang, L. Pan, J. Liu, Y. Bai, A. Xin, L. Weichuan, X. Che, L. Hou, et al. (2025) How do transformers learn implicit reasoning? In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
- P. Yin, B. Deng, E. Chen, B. Vasilescu, and G. Neubig (2018) Learning to mine aligned code and natural language pairs from Stack Overflow. In Proceedings of the 15th International Conference on Mining Software Repositories, pp. 476–486.
- Y. Zheng, R. Zhang, J. Zhang, Y. Ye, Z. Luo, Z. Feng, and Y. Ma (2024) LlamaFactory: unified efficient fine-tuning of 100+ language models. arXiv preprint arXiv:2403.13372.

## Appendix A Proofs of Results

### A.1 Proof of the Quadratic Lower Bound under Assumption [3.1](https://arxiv.org/html/2605.05741#S3.Thmtheorem1)

###### Proof\.

From the fundamental theorem of calculus, we can express the difference as an integral along the path $H(t)=H+t\Delta$ for $t\in[0,1]$:

$$g_m(H+\Delta)-g_m(H)=\int_0^1\langle\nabla g_m(H+t\Delta),\Delta\rangle\,dt. \tag{14}$$

Subtracting the first-order approximation $\langle\nabla g_m(H),\Delta\rangle$ from both sides, we analyze the remainder:

\|gm\(H\+Δ\)−gm\(H\)−⟨∇gm\(H\),Δ⟩\|\\displaystyle\\left\|g\_\{m\}\(H\+\\Delta\)\-g\_\{m\}\(H\)\-\\langle\\nabla g\_\{m\}\(H\),\\Delta\\rangle\\right\|=\|∫01⟨∇gm\(H\+tΔ\)−∇gm\(H\),Δ⟩𝑑t\|\\displaystyle=\\left\|\\int\_\{0\}^\{1\}\\langle\\nabla g\_\{m\}\(H\+t\\Delta\)\-\\nabla g\_\{m\}\(H\),\\Delta\\rangle\\,dt\\right\|≤∫01‖∇gm\(H\+tΔ\)−∇gm\(H\)‖⋅‖Δ‖𝑑t\(Cauchy\-Schwarz\)\\displaystyle\\leq\\int\_\{0\}^\{1\}\\\|\\nabla g\_\{m\}\(H\+t\\Delta\)\-\\nabla g\_\{m\}\(H\)\\\|\\cdot\\\|\\Delta\\\|\\,dt\\quad\(\\text\{Cauchy\-Schwarz\}\)≤∫01β‖tΔ‖⋅‖Δ‖𝑑t\(by Assumption[3\.1](https://arxiv.org/html/2605.05741#S3.Thmtheorem1)\)\\displaystyle\\leq\\int\_\{0\}^\{1\}\\beta\\\|t\\Delta\\\|\\cdot\\\|\\Delta\\\|\\,dt\\quad\(\\text\{by Assumption~\\ref\{assump\-lipschitz\}\}\)=β‖Δ‖2∫01t𝑑t=β2‖Δ‖2\.\\displaystyle=\\beta\\\|\\Delta\\\|^\{2\}\\int\_\{0\}^\{1\}t\\,dt=\\frac\{\\beta\}\{2\}\\\|\\Delta\\\|^\{2\}\.\(15\)Removing the absolute value gives the lower bound:gm\(H\+Δ\)−gm\(H\)≥⟨∇gm\(H\),Δ⟩−β2‖Δ‖2g\_\{m\}\(H\+\\Delta\)\-g\_\{m\}\(H\)\\geq\\langle\\nabla g\_\{m\}\(H\),\\Delta\\rangle\-\\frac\{\\beta\}\{2\}\\\|\\Delta\\\|^\{2\}\. ∎
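As a numerical sanity check of the bound just proved, the sketch below evaluates both sides on random inputs. It assumes a toy function $g(H)=\sum_j\sin(H_j)$ in place of the paper's margin $g_m$; this choice is ours, made because its Hessian $\mathrm{diag}(-\sin H_j)$ has spectral norm at most 1, so its gradient is 1-Lipschitz and $\beta=1$ is a valid constant. The sampling ranges are arbitrary.

```python
import math
import random

BETA = 1.0  # Lipschitz constant of grad g for the toy g below (an assumption of this sketch)


def g(h):
    """Toy margin function g(H) = sum_j sin(H_j); its Hessian diag(-sin H_j) has norm <= 1."""
    return sum(math.sin(x) for x in h)


def grad_g(h):
    return [math.cos(x) for x in h]


def smoothness_gap(h, delta, beta=BETA):
    """(g(H + D) - g(H)) - (<grad g(H), D> - beta/2 * ||D||^2); the bound says this is >= 0."""
    change = g([x + d for x, d in zip(h, delta)]) - g(h)
    first_order = sum(gx * d for gx, d in zip(grad_g(h), delta))
    penalty = 0.5 * beta * sum(d * d for d in delta)
    return change - (first_order - penalty)


rng = random.Random(0)
gaps = []
for _ in range(10_000):
    dim = rng.randint(1, 16)
    h = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    delta = [rng.gauss(0.0, 2.0) for _ in range(dim)]
    gaps.append(smoothness_gap(h, delta))

print(f"smallest gap over {len(gaps)} samples: {min(gaps):.2e}")  # >= 0 up to rounding
```

For this particular $g$ the gap equals $\sum_j(1-\sin\xi_j)\,\Delta_j^2/2$ for some intermediate points $\xi_j$, so it can approach zero but never becomes meaningfully negative.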

### A.2 Proof of Theorem [3.5](https://arxiv.org/html/2605.05741#S3.Thmtheorem5) (Monotonicity of Confidence)

To simplify notation, in this and the next subsection we give the residual update explicit subscripts: $\Delta_{i,k}(H_{i,t})=\mathcal{F}_{k}(H_{i,t})$, $k>i$, denotes the update contributed by the $k$-th residual block to the hidden state at layer $i$.

In this subsection, we derive the probabilistic bound on confidence growth across layers. We establish that, with sufficient alignment drift $\mu$ and sequence length $T$, the average margin $\bar{\mathcal{C}}^{(m)}_{i}$ increases monotonically with high probability.

#### A.2.1 Step 1: Local Gain Analysis

Consider the transition from layer $i$ to $i+1$. For a specific token at position $t$, the hidden state updates as $H_{i+1,t}=H_{i,t}+\Delta_{i+1,t}$. We define the single-step margin gain random variable $X_{i,t}$ as:

$$X_{i,t}:=g_{m}(H_{i+1,t})-g_{m}(H_{i,t}).\tag{16}$$

Invoking the Quadratic Smoothness Bound (Proposition [3.2](https://arxiv.org/html/2605.05741#S3.Thmtheorem2)), which relates the change in the function to the gradient and the norm of the update, we have:

$$X_{i,t}\geq\langle\nabla g_{m}(H_{i,t}),\Delta_{i+1,t}\rangle-\frac{\beta}{2}\|\Delta_{i+1,t}\|^{2}.\tag{17}$$

Taking the conditional expectation with respect to the current state (filtration $\mathcal{F}_{t}$), and applying Assumption [3.3](https://arxiv.org/html/2605.05741#S3.Thmtheorem3) (Positive Alignment Drift) and Assumption [3.4](https://arxiv.org/html/2605.05741#S3.Thmtheorem4) (Boundedness, $\|\Delta\|\leq R$):

$$
\begin{aligned}
\mathbb{E}[X_{i,t}\mid\mathcal{F}_{t}]
&\geq\mathbb{E}\left[\langle\nabla g_{m}(H_{i,t}),\Delta_{i+1,t}\rangle\right]-\frac{\beta}{2}\mathbb{E}\left[\|\Delta_{i+1,t}\|^{2}\right]\\
&\geq\mu-\frac{\beta}{2}R^{2}.
\end{aligned}
\tag{18}
$$

We define the effective drift $\gamma:=\mu-\frac{\beta}{2}R^{2}$. Assuming the alignment term dominates the curvature penalty, we have $\gamma>0$. Thus, the expected gain of any single token is bounded below by a strictly positive constant:

$$\mathbb{E}[X_{i,t}]\geq\gamma.\tag{19}$$

#### A.2.2 Step 2: Sequence Aggregation and Martingale Construction

We aim to bound the behavior of the average margin over a prompt of length $T$. The change in the average margin is:

$$\Delta\bar{\mathcal{C}}=\bar{\mathcal{C}}^{(m)}_{i+1}-\bar{\mathcal{C}}^{(m)}_{i}=\frac{1}{T}\sum_{t=1}^{T}X_{i,t}.\tag{20}$$

Let $S_{T}=\sum_{t=1}^{T}X_{i,t}$ be the total gain. We construct a martingale difference sequence (MDS) to analyze the fluctuations of $S_{T}$ around its expectation. Let $Y_{t}$ be the centered variable:

$$Y_{t}=X_{i,t}-\mathbb{E}[X_{i,t}\mid\mathcal{F}_{t-1}].\tag{21}$$

The sequence $\{Z_{k}\}_{k=1}^{T}$ defined by $Z_{k}=\sum_{t=1}^{k}Y_{t}$ is then a martingale with respect to $\{\mathcal{F}_{k}\}$. By Assumption [3.4](https://arxiv.org/html/2605.05741#S3.Thmtheorem4), the fluctuations of the margin are bounded by $b$; specifically, we assume the centered random variable satisfies $|Y_{t}|\leq b$ almost surely.

#### A.2.3 Step 3: Concentration via Azuma-Hoeffding

We are interested in the failure probability, i.e., the probability of the event that the average margin does not increase ($\Delta\bar{\mathcal{C}}\leq 0$). This is equivalent to $S_{T}\leq 0$.

Decomposing the sum and using the drift bound of Eq. (18) conditionally, $\mathbb{E}[X_{i,t}\mid\mathcal{F}_{t-1}]\geq\gamma$:

$$S_{T}=\sum_{t=1}^{T}\mathbb{E}[X_{i,t}\mid\mathcal{F}_{t-1}]+\sum_{t=1}^{T}Y_{t}\geq T\gamma+Z_{T}.\tag{22}$$

Thus, the event $S_{T}\leq 0$ implies $T\gamma+Z_{T}\leq 0$, i.e., $Z_{T}\leq-T\gamma$. We apply the Azuma-Hoeffding inequality for bounded martingale differences. For any $\epsilon>0$:

$$\mathbb{P}(Z_{T}\leq-\epsilon)\leq\exp\left(-\frac{\epsilon^{2}}{2\sum_{t=1}^{T}c_{t}^{2}}\right),\tag{23}$$

where $c_{t}$ is an almost-sure bound on $|Y_{t}|$. Since $|Y_{t}|\leq b$, we can take $c_{t}=b$ for every $t$, so that $\sum_{t=1}^{T}c_{t}^{2}=Tb^{2}$. Substituting $\epsilon=T\gamma$:

$$
\begin{aligned}
\mathbb{P}\left(\Delta\bar{\mathcal{C}}\leq 0\right)=\mathbb{P}(S_{T}\leq 0)
&\leq\mathbb{P}(Z_{T}\leq-T\gamma)\\
&\leq\exp\left(-\frac{(T\gamma)^{2}}{2\sum_{t=1}^{T}b^{2}}\right)
=\exp\left(-\frac{T^{2}\gamma^{2}}{2Tb^{2}}\right)
=\exp\left(-\frac{T\gamma^{2}}{2b^{2}}\right).
\end{aligned}
\tag{24}
$$

This concludes the proof. The average confidence increases from layer $i$ to layer $i+1$ with probability at least $1-\exp\left(-\frac{T\gamma^{2}}{2b^{2}}\right)$, which approaches 1 exponentially fast as the sequence length $T$ grows, and more quickly for a larger effective drift $\gamma$. ∎
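The bound in Eq. (24) can be compared with a simulation. The sketch below assumes the simplest bounded martingale difference sequence, i.i.d. $Y_t\sim\mathrm{Uniform}[-b,b]$, with $X_{i,t}=\gamma+Y_t$; the values of $\gamma$, $b$, $T$, and the number of trials are arbitrary illustrations, not quantities measured in the paper. Because Azuma-Hoeffding holds for every bounded martingale difference sequence, the empirical failure rate of this benign toy model should sit well below the bound.

```python
import math
import random


def failure_rate(T: int, gamma: float, b: float, trials: int, seed: int = 0) -> float:
    """Empirical P(S_T <= 0) with X_t = gamma + Y_t and Y_t ~ Uniform[-b, b] i.i.d.

    Independent, centred, bounded Y_t form a (very simple) martingale difference
    sequence with |Y_t| <= b, so Eq. (24) applies to this toy model.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        s_T = sum(gamma + rng.uniform(-b, b) for _ in range(T))
        if s_T <= 0:
            failures += 1
    return failures / trials


def azuma_bound(T: int, gamma: float, b: float) -> float:
    """Right-hand side of Eq. (24): exp(-T * gamma^2 / (2 * b^2))."""
    return math.exp(-T * gamma ** 2 / (2 * b ** 2))


for T in (25, 50, 100):
    empirical = failure_rate(T, gamma=0.2, b=1.0, trials=2000)
    print(f"T={T:3d}  empirical={empirical:.4f}  Azuma bound={azuma_bound(T, 0.2, 1.0):.4f}")
```

The gap between the two columns reflects that the bound is worst-case; it only guarantees that the failure probability decays at least as fast as $\exp(-T\gamma^2/(2b^2))$.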

### A.3 Proof of Theorem [3.6](https://arxiv.org/html/2605.05741#S3.Thmtheorem6) (Magnification of Focal Depth)

To prove Theorem [3.6](https://arxiv.org/html/2605.05741#S3.Thmtheorem6), we treat increasing the depth of the decoding head as a “virtual” forward pass through one additional Transformer layer, which lets us reuse the argument of Appendix [A.2](https://arxiv.org/html/2605.05741#A1.SS2).

#### A.3.1 Step 1: Local Function Analysis

Let $D_{m}(\cdot)$ denote the function represented by the decoding head consisting of the final $m$ layers of the model plus the unembedding projection. Let $\mathcal{F}_{k}(\cdot)$ denote the transformation of the $k$-th Transformer block (where $k=N-m$ is the specific layer added when moving from depth $m$ to $m+1$).

By definition, applying the deeper head $D_{m+1}$ to a hidden state $H_{i,t}$ is equivalent to first passing $H_{i,t}$ through the block $\mathcal{F}_{N-m}$ and then applying the shallower head $D_{m}$ to the result. Formally:

$$D_{m+1}(H_{i,t})\equiv D_{m}\left(\mathcal{F}_{N-m}(H_{i,t})\right).\tag{25}$$

Consequently, the logit margin computed by the deeper head, denoted $\mathcal{C}^{(m+1)}_{i,t}$, can be expressed through the margin function $g_{m}$ evaluated at a perturbed state:

$$\mathcal{C}^{(m+1)}_{i,t}=g_{m}\left(\mathcal{F}_{N-m}(H_{i,t})\right)=g_{m}(H_{i,t}+\Delta_{head}),\tag{26}$$

where $\Delta_{head}=\mathcal{F}_{N-m}(H_{i,t})-H_{i,t}$ is the residual update contributed by layer $N-m$.
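The identity in Eq. (25) is plain function composition, which the toy sketch below makes explicit. Everything in it is hypothetical: residual blocks of the form $H\mapsto H+0.5\tanh(WH)$ with random $W$, a random unembedding matrix, no final normalization layer, and a top-1 minus top-2 logit gap as a stand-in for the paper's margin. The only point illustrated is that running the final $m+1$ blocks equals running block $N-m$ and then the final $m$ blocks.

```python
import math
import random


def make_block(dim, scale, seed):
    """Hypothetical toy residual block F_k(H) = H + scale * tanh(W H)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]

    def block(h):
        return [x + scale * math.tanh(sum(w * v for w, v in zip(row, h)))
                for x, row in zip(h, W)]

    return block


def decode_head(h, blocks, unembed, m):
    """D_m: apply the final m of the N blocks, then the unembedding matrix (logits)."""
    n = len(blocks)
    for block in blocks[n - m:]:
        h = block(h)
    return [sum(u * v for u, v in zip(row, h)) for row in unembed]


def logit_margin(logits):
    """Stand-in margin (assumption): top-1 logit minus top-2 logit."""
    first, second = sorted(logits, reverse=True)[:2]
    return first - second


DIM, VOCAB, N = 8, 10, 6
blocks = [make_block(DIM, 0.5, seed=k) for k in range(N)]  # blocks[k] is layer k + 1
rng = random.Random(123)
unembed = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB)]
h = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # plays the role of H_{i,t}

for m in range(N):
    extra = blocks[N - m - 1]  # layer N - m (1-indexed): the block added at depth m + 1
    deeper = decode_head(h, blocks, unembed, m + 1)
    composed = decode_head(extra(h), blocks, unembed, m)
    assert deeper == composed  # Eq. (25): D_{m+1}(H) = D_m(F_{N-m}(H))
    print(m + 1, round(logit_margin(deeper), 4))
```

Because both sides perform the same floating-point operations in the same order, the equality holds exactly rather than approximately.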

#### A.3.2 Step 2: Sequence Aggregation and Martingale Construction

We are interested in the probability that the average margin increases, i.e., $\frac{1}{T}\sum_{t}(\mathcal{C}^{(m+1)}_{i,t}-\mathcal{C}^{(m)}_{i,t})>0$. Let $X_{t}=\mathcal{C}^{(m+1)}_{i,t}-\mathcal{C}^{(m)}_{i,t}$ be the instantaneous gain at token $t$.

Using the Quadratic Smoothness Bound (Proposition [3.2](https://arxiv.org/html/2605.05741#S3.Thmtheorem2)), we expand the gain:

$$X_{t}=g_{m}(H_{i,t}+\Delta_{head})-g_{m}(H_{i,t})\geq\langle\nabla g_{m}(H_{i,t}),\Delta_{head}\rangle-\frac{\beta}{2}\|\Delta_{head}\|^{2}.\tag{27}$$

By Assumption [3.4](https://arxiv.org/html/2605.05741#S3.Thmtheorem4), the residual norm is bounded by $R$. By Assumption [3.3](https://arxiv.org/html/2605.05741#S3.Thmtheorem3), provided the layer index $N-m$ falls within the refinement phase ($N-m\geq i_{0}$), the alignment condition holds: $\mathbb{E}[\langle\nabla g_{m},\Delta_{head}\rangle]\geq\mu$. Incorporating these into the expectation of $X_{t}$:

$$\mathbb{E}[X_{t}\mid\mathcal{F}_{t-1}]\geq\mathbb{E}\left[\langle\nabla g_{m}(H_{i,t}),\Delta_{head}\rangle\right]-\frac{\beta}{2}R^{2}\tag{28}$$

$$\geq\mu-\frac{\beta}{2}R^{2}=:\gamma.\tag{29}$$

Here, $\gamma$ is the strictly positive effective drift (under the condition that the alignment strength $\mu$ dominates the curvature penalty).

#### A.3.3 Step 3: Concentration via Azuma-Hoeffding

We now bound the probability that the empirical average $\bar{X}=\frac{1}{T}\sum_{t=1}^{T}X_{t}$ is non-positive. Let $S_{T}=\sum_{t=1}^{T}X_{t}$. As in Appendix [A.2](https://arxiv.org/html/2605.05741#A1.SS2), we control the deviation from the expected growth with the martingale difference sequence $Y_{t}=X_{t}-\mathbb{E}[X_{t}\mid\mathcal{F}_{t-1}]$ and its partial sums $Z_{T}=\sum_{t=1}^{T}Y_{t}$.

Since the single-step fluctuations are bounded by $b$ (Assumption [3.4](https://arxiv.org/html/2605.05741#S3.Thmtheorem4)), i.e., $|Y_{t}|\leq b$ almost surely, we can apply the Azuma-Hoeffding inequality. We analyze the event where the sum is non-positive. By Eq. (29), $\mathbb{E}[X_{t}\mid\mathcal{F}_{t-1}]\geq\gamma$ for every $t$, so

$$S_{T}=\sum_{t=1}^{T}\mathbb{E}[X_{t}\mid\mathcal{F}_{t-1}]+Z_{T}\geq T\gamma+Z_{T}.\tag{30}$$

Hence the event $S_{T}\leq 0$ implies $Z_{T}\leq-T\gamma$, which gives:

$$\mathbb{P}(S_{T}\leq 0)\leq\mathbb{P}(Z_{T}\leq-T\gamma).\tag{31}$$

Applying the inequality for bounded martingale differences with $c_{t}=b$:

$$\mathbb{P}\left(\bar{\mathcal{C}}^{(m+1)}_{i}-\bar{\mathcal{C}}^{(m)}_{i}\leq 0\right)\leq\exp\left(-\frac{(T\gamma)^{2}}{2\sum_{t=1}^{T}b^{2}}\right)=\exp\left(-\frac{T^{2}\gamma^{2}}{2Tb^{2}}\right)=\exp\left(-\frac{T\gamma^{2}}{2b^{2}}\right).\tag{32}$$

Taking the complement, the probability that the deeper head yields a strictly higher average confidence is:

$$\mathbb{P}\left(\bar{\mathcal{C}}^{(m+1)}_{i}>\bar{\mathcal{C}}^{(m)}_{i}\right)\geq 1-\exp\left(-\frac{T\gamma^{2}}{2b^{2}}\right).\tag{33}$$

This completes the proof. $\square$
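Inverting Eq. (33) gives a rough sense of how many tokens are needed before the guarantee becomes meaningful: requiring the failure probability to be at most $\delta$ yields $T\geq 2b^{2}\ln(1/\delta)/\gamma^{2}$. The sketch below computes this threshold; the values $\gamma=0.1$, $b=1$, $\delta=0.05$ are arbitrary illustrations, not quantities estimated in the paper, and the function names are our own.

```python
import math


def failure_bound(T: int, gamma: float, b: float) -> float:
    """Eq. (32): upper bound on P(average margin does not increase)."""
    return math.exp(-T * gamma ** 2 / (2 * b ** 2))


def min_sequence_length(gamma: float, b: float, delta: float) -> int:
    """Smallest integer T with failure_bound(T, gamma, b) <= delta.

    Solving exp(-T gamma^2 / (2 b^2)) <= delta gives T >= 2 b^2 ln(1/delta) / gamma^2.
    """
    if gamma <= 0:
        raise ValueError("the bound is only informative for a positive effective drift gamma")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return math.ceil(2 * b ** 2 * math.log(1 / delta) / gamma ** 2)


T = min_sequence_length(gamma=0.1, b=1.0, delta=0.05)
print(T, round(failure_bound(T, 0.1, 1.0), 4))  # 600 0.0498
```

The quadratic dependence on $1/\gamma$ means that halving the effective drift roughly quadruples the sequence length needed for the same guarantee.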

## Appendix B Algorithms to compute refinement index automatically

### B.1 Algorithm to compute $re$

**Algorithm 1** Calculation of Refinement End ($re$)

**Input:** data sequence $\mathcal{C}=\{\mathcal{C}_{0},\mathcal{C}_{1},\dots,\mathcal{C}_{k}\}$, scalar $\mathit{threshold}$

**Output:** index $re$

1. Initialize $re\leftarrow 0$ *(find the minimum index $i$ such that $\forall j\geq i,\ \mathcal{C}_{j}\geq\mathcal{C}_{k}-\mathit{threshold}$)*
2. **for** $i=k-1$ **down to** $0$ **do**
3. **if** $\mathcal{C}_{i}<\mathcal{C}_{k}-\mathit{threshold}$ **then** $re\leftarrow i+1$; **break**
4. **end for**
5. **return** $re$

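In the refinement layers, we observe that almost every curve first increases and then stays at a relatively high value. We call the index at which this plateau begins the Refinement End ($re$), an intermediate variable used to locate $i_0$: it is the smallest layer index from which the confidence never falls more than $\mathit{threshold}$ below its final value $\mathcal{C}_k$. The threshold makes this index robust to small fluctuations; in our experiments, we set $\mathit{threshold}=0.07$.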
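A direct Python transcription of Algorithm 1 is sketched below, assuming the per-layer confidences are given as a plain list of floats ordered from layer 0 to layer $k$. The function name `refinement_end` and the toy trajectory are our own illustrations, not data from the paper.

```python
from typing import Sequence


def refinement_end(conf: Sequence[float], threshold: float = 0.07) -> int:
    """Algorithm 1: smallest index re with conf[j] >= conf[k] - threshold for all j >= re."""
    k = len(conf) - 1
    floor = conf[k] - threshold
    # Scan backwards from layer k-1; the first layer that falls below the floor
    # ends the rising segment, so the plateau starts one layer later.
    for i in range(k - 1, -1, -1):
        if conf[i] < floor:
            return i + 1
    return 0  # every layer is already within threshold of the final value


# Toy trajectory (made-up numbers for illustration).
trajectory = [0.30, 0.31, 0.33, 0.36, 0.60, 0.90, 0.92]
print(refinement_end(trajectory))  # 5
```

Scanning from the top down lets the function stop at the first violation instead of checking the "for all $j\geq i$" condition separately for every candidate $i$.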

### B.2 Algorithm to compute $i_0$

**Algorithm 2** Calculation of $i_0$

**Input:** data sequence $\mathcal{C}$, index $re$, scalar $\mathit{threshold}$

**Output:** index $i_0$

1. Initialize $rmin\leftarrow re$ *(Step 1: find $rmin$ satisfying the convexity condition)*
2. **for** $i=re-1$ **down to** $0$ **do**
3. **if** $i>0$ **then** $val\leftarrow(\mathcal{C}_{i}+\mathcal{C}_{i-1})/2$ **else** $val\leftarrow\mathcal{C}_{0}$ *(boundary case: the check reduces to $\mathcal{C}_{0}\leq\mathcal{C}_{1}$)*
4. **if** $val\leq\mathcal{C}_{i+1}$ **then** $rmin\leftarrow i$ **else break**
5. **end for**
6. Initialize $i_0\leftarrow rmin$ *(Step 2: find $i_0$ based on $rmin$ and $\mathit{threshold}$)*
7. **for** $j=rmin$ **to** $re-1$ **do**
8. **if** $j\geq rmin+\mathit{threshold}$ **then** $i_0\leftarrow j-1$; **break**
9. **end for**
10. **return** $i_0$

The refinement layers are the layers that decode the latent representation into the output vocabulary. In these layers, we observe that almost every curve first increases and then stays at a relatively high value for a while. To find $i_0$, we therefore need a definition of increase; for robustness, we say the curve increases at layer $i$ if $\frac{\mathcal{C}_{i-1}+\mathcal{C}_{i}}{2}\leq\mathcal{C}_{i+1}$. Using this definition and the intermediate variable $re$, we obtain a second intermediate variable, $rmin$, the index at which the robust increase starts. As in the computation of $re$, we then use a threshold to find $i_0$; here too we set $\mathit{threshold}=0.07$ in the experiments.
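Algorithm 2 can be transcribed into Python in the same way. As printed, Step 2 compares the layer index $j$ with $rmin+\mathit{threshold}$; for any $0<\mathit{threshold}\leq 1$ this test first succeeds at $j=rmin+1$, which sets $i_0=rmin$, so Step 2 always leaves $i_0=rmin$. The optional `compare_confidence` flag below is a hypothetical variant of ours, not from the paper, that instead tests $\mathcal{C}_j\geq\mathcal{C}_{rmin}+\mathit{threshold}$; it is included only to show where such a change would go. The function and parameter names are illustrative.

```python
from typing import Sequence


def refinement_start(conf: Sequence[float], re_index: int, threshold: float = 0.07,
                     compare_confidence: bool = False) -> int:
    """Algorithm 2: locate i0, the start of the refinement layers.

    compare_confidence=False follows the algorithm as printed; True is a
    hypothetical variant (our assumption) that compares confidences, not indices.
    """
    # Step 1: walk back from re while the robust-increase condition
    # (C_{i-1} + C_i) / 2 <= C_{i+1} holds (C_0 <= C_1 at the boundary).
    rmin = re_index
    for i in range(re_index - 1, -1, -1):
        val = (conf[i] + conf[i - 1]) / 2 if i > 0 else conf[0]
        if val <= conf[i + 1]:
            rmin = i
        else:
            break
    # Step 2: the first j in [rmin, re - 1] that clears the threshold gives i0 = j - 1.
    i0 = rmin
    for j in range(rmin, re_index):
        if compare_confidence:
            cleared = conf[j] >= conf[rmin] + threshold
        else:
            cleared = j >= rmin + threshold
        if cleared:
            i0 = j - 1
            break
    return i0


trajectory = [0.30, 0.31, 0.33, 0.36, 0.60, 0.90, 0.92]  # toy values; re = 5 for this curve
print(refinement_start(trajectory, 5))                           # 0
print(refinement_start(trajectory, 5, compare_confidence=True))  # 3
```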

### B.3 Algorithm to compute Refinement Area ($\Omega$)

**Algorithm 3** Calculation of Refinement Area ($\Omega$)

**Input:** data sequence $\mathcal{C}=\{\mathcal{C}_{0},\mathcal{C}_{1},\dots,\mathcal{C}_{k}\}$, index $i_{0}$

**Output:** value $\Omega$

1. Initialize $\Omega\leftarrow 0$ *(calculate the Refinement Area $\Omega$ according to $i_{0}$)*
2. **for** $i=i_{0}$ **to** $k$ **do**
3. $\Omega\leftarrow\Omega+(1-\mathcal{C}_{i})$
4. **end for**
5. **return** $\Omega$

Once the start of the refinement layers $i_{0}$ is found, $\Omega$ follows the definition in the main text: $\Omega=\sum_{i\geq i_{0}}(1-\mathcal{C}_{i})$, i.e., the area above the confidence curve from layer $i_{0}$ onward.
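Algorithm 3 reduces to a single sum. The sketch below assumes confidences normalized to $[0,1]$, so that each term $1-\mathcal{C}_i$ is non-negative, and uses a made-up toy trajectory; the function name is our own. Under this definition, a curve that stays low after $i_0$ yields a larger $\Omega$, i.e., a higher cognitive effort.

```python
from typing import Sequence


def refinement_area(conf: Sequence[float], i0: int) -> float:
    """Algorithm 3: Omega = sum_{i >= i0} (1 - C_i), the area above the curve from layer i0 on."""
    return sum(1.0 - c for c in conf[i0:])


trajectory = [0.30, 0.31, 0.33, 0.36, 0.60, 0.90, 0.92]  # toy confidences in [0, 1]
print(round(refinement_area(trajectory, i0=3), 2))  # 1.22
```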

## Appendix C Full Results of Varied Focal Depth $m$

In this section, we present the comprehensive experimental results on the impact of focal depth $m$ on confidence trajectories. Figures [6](https://arxiv.org/html/2605.05741#A3.F6) through [13](https://arxiv.org/html/2605.05741#A3.F13) display the confidence trajectories for four models (Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B) across eight diverse datasets: MATH (Level 1-2), AIME, CoNaLa, APPS, RuleTaker, ZebraLogic, ARC-Easy, and GPQA.

These results corroborate the conclusions drawn in the main text.

**Universal Magnification:** The ordering of trajectories ($m=5>m=3>m=1$) holds across almost all tasks, confirming that larger focal depths consistently magnify confidence signals.

**Robust Model Characteristics:** Both the model-specific sensitivity to $m$ and the intrinsic starting locations of the refinement layers remain largely consistent regardless of the input dataset.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/datasets_m_varied/math-level12_combined_m_comparison_plot.png)

Figure 6: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on dataset Math (Level 1-2).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/datasets_m_varied/aime_combined_m_comparison_plot.png)

Figure 7: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on dataset AIME.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/datasets_m_varied/conala_combined_m_comparison_plot.png)

Figure 8: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on dataset CoNaLa.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/datasets_m_varied/apps_combined_m_comparison_plot.png)

Figure 9: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on dataset APPS.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/datasets_m_varied/ruletaker_combined_m_comparison_plot.png)

Figure 10: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on dataset RuleTaker.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/datasets_m_varied/logicbench_combined_m_comparison_plot.png)

Figure 11: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on dataset ZebraLogic.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/datasets_m_varied/arc-easy_combined_m_comparison_plot.png)

Figure 12: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on dataset ARC-Easy.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/datasets_m_varied/gpqa_combined_m_comparison_plot.png)

Figure 13: The confidence trajectories of Llama3-8B, Qwen2.5-3B, Qwen3-0.6B, and Deepseek-7B on dataset GPQA.

## Appendix D Full Results of Cognitive Effort Difference

In the main text, we demonstrated the cognitive effort measurement using Qwen2.5-7B as a representative example. In this section, we provide the complete visualization results across a diverse set of LLMs with varying parameter sizes (0.5B to 32B) and architectures (Llama-3, Deepseek, Qwen2.5, and Qwen3). We compare the confidence trajectories under standard decoding (focal depth $m=0$) and our look-ahead approach (e.g., $m=5$).

### D.1 Impact of Focal Depth on Trajectory Visibility

Figures [14](https://arxiv.org/html/2605.05741#A4.F14) through [29](https://arxiv.org/html/2605.05741#A4.F29) illustrate the confidence trajectories for different models. A distinct pattern emerges when comparing the focal depths:

**Standard Decoding ($m=0$):** As seen in Figure [14](https://arxiv.org/html/2605.05741#A4.F14) (Llama3-8B) and Figure [24](https://arxiv.org/html/2605.05741#A4.F24) (Qwen2.5-32B), the confidence trajectories under standard next-token prediction often exhibit a trivial step-function behavior or rapid saturation. In these cases, the confidence either stays near zero or jumps to one almost immediately, masking the internal reasoning process. This indicates that standard output probabilities are too coarse for fine-grained cognitive effort measurement.

**Look-ahead Refinement ($m\geq 1$):** When a focal depth is applied (e.g., $m=5$ in Figure [15](https://arxiv.org/html/2605.05741#A4.F15) and Figure [25](https://arxiv.org/html/2605.05741#A4.F25)), the trajectories reveal a dynamic evolution of confidence across layers. This supports our hypothesis that a look-ahead window is necessary to observe the gradual resolution of uncertainty, as distinct from immediate token recognition.

### D.2 Consistency of Easy vs. Hard Distinction

Across all evaluated models, the distinction between “Easy” \(blue curves\) and “Hard” \(red curves\) tasks remains consistent with the observations in the main text:

**Layer-wise Separation:** In models equipped with sufficient focal depth, the confidence for easy tasks consistently rises earlier and remains higher than that for hard tasks throughout the intermediate and later layers. For example, in Qwen2.5-7B with $m=5$ (Figure [23](https://arxiv.org/html/2605.05741#A4.F23)) and Qwen2.5-32B with $m=5$ (Figure [25](https://arxiv.org/html/2605.05741#A4.F25)), the blue region lies clearly above the red region.

**Domain Sensitivity:** The gap between easy and hard trajectories is most pronounced in the Math and Code domains across all model families. This supports the intuition that these reasoning-intensive tasks induce greater “cognitive load,” requiring more layers to close the confidence gap for harder instances. Conversely, the General QA domain frequently shows overlapping trajectories (e.g., Figure [17](https://arxiv.org/html/2605.05741#A4.F17)(d)), consistent with knowledge retrieval tasks exhibiting less layer-wise reasoning differentiation than logical tasks.

### D.3 Model Scale and Stability

We also observe an effect of model scale on the cognitive effort visualization. Larger models, such as Qwen2.5-32B (Figure [25](https://arxiv.org/html/2605.05741#A4.F25)), exhibit smoother trajectories with tighter confidence intervals. In contrast, smaller models such as Qwen2.5-0.5B (Figure [19](https://arxiv.org/html/2605.05741#A4.F19)) and Qwen3-0.6B (Figure [27](https://arxiv.org/html/2605.05741#A4.F27)) display higher variance (indicated by wider shaded regions), reflecting their relative instability and higher uncertainty on complex reasoning tasks.

Collectively, these results indicate that our cognitive effort measurement generalizes across the model structures and scales we evaluated.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m0_Meta-Llama-3-8B-Instruct_domain_comparison.png)

Figure 14: Cognitive Effort Difference of Llama3-8B with focal depth $m=0$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m5_Meta-Llama-3-8B-Instruct_domain_comparison.png)

Figure 15: Cognitive Effort Difference of Llama3-8B with focal depth $m=5$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m0_deepseek-math-7b-instruct_domain_comparison.png)

Figure 16: Cognitive Effort Difference of Deepseek-7B with focal depth $m=0$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m1_deepseek-math-7b-instruct_domain_comparison.png)

Figure 17: Cognitive Effort Difference of Deepseek-7B with focal depth $m=1$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m0_Qwen2.5-0.5B-Instruct_domain_comparison.png)

Figure 18: Cognitive Effort Difference of Qwen2.5-0.5B-Instruct with focal depth $m=0$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m5_Qwen2.5-0.5B-Instruct_domain_comparison.png)

Figure 19: Cognitive Effort Difference of Qwen2.5-0.5B-Instruct with focal depth $m=5$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m0_Qwen2.5-3B-Instruct_domain_comparison.png)

Figure 20: Cognitive Effort Difference of Qwen2.5-3B-Instruct with focal depth $m=0$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m3_Qwen2.5-3B-Instruct_domain_comparison.png)

Figure 21: Cognitive Effort Difference of Qwen2.5-3B-Instruct with focal depth $m=3$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m0_Qwen2.5-7B-Instruct_domain_comparison.png)

Figure 22: Cognitive Effort Difference of Qwen2.5-7B-Instruct with focal depth $m=0$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m5_Qwen2.5-7B-Instruct_domain_comparison.png)

Figure 23: Cognitive Effort Difference of Qwen2.5-7B-Instruct with focal depth $m=5$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m0_Qwen2.5-32B-Instruct_domain_comparison.png)

Figure 24: Cognitive Effort Difference of Qwen2.5-32B-Instruct with focal depth $m=0$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m5_Qwen2.5-32B-Instruct_domain_comparison.png)

Figure 25: Cognitive Effort Difference of Qwen2.5-32B-Instruct with focal depth $m=5$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m0_Qwen3-0.6B_domain_comparison.png)

Figure 26: Cognitive Effort Difference of Qwen3-0.6B-Instruct with focal depth $m=0$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m5_Qwen3-0.6B_domain_comparison.png)

Figure 27: Cognitive Effort Difference of Qwen3-0.6B-Instruct with focal depth $m=5$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m0_Qwen3-4B_domain_comparison.png)

Figure 28: Cognitive Effort Difference of Qwen3-4B-Instruct with focal depth $m=0$ for 4 domains.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/domain_figs/METRIC_topk_k3_m5_Qwen3-4B_domain_comparison.png)

Figure 29: Cognitive Effort Difference of Qwen3-4B-Instruct with focal depth $m=5$ for 4 domains.

## Appendix E Full Results of Semantic Decoding

The following figures show the (truncated) semantic decoding results for all eight models on the input: “What is the most popular neural network architecture in the world? Please give me some reasons.”

This layer-wise evolution of the decoded tokens is clearly observable across model architectures. For example, in Llama-3-8B (Figure [30](https://arxiv.org/html/2605.05741#A5.F30) and Figure [31](https://arxiv.org/html/2605.05741#A5.F31)), while the baseline Logit Lens remains noisy in the intermediate layers (L15-L20), our method clearly reveals the retrieval of domain-specific concepts such as “CNN”, “RNN”, and “LSTM” well before the final layer settles on the output. Similarly, in Qwen2.5-7B (Figure [39](https://arxiv.org/html/2605.05741#A5.F39)), when processing the request for reasons, the intermediate layers (L12-L18) explicitly decode intent-driven tokens such as “I”, “want”, and “know”, which act as a hidden “scratchpad” for the model to formulate its response. This visualization provides qualitative evidence that the model narrows the search space from abstract intents to concrete entities in a coarse-to-fine manner.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Meta-Llama-3-8B-Instruct/m0.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Meta-Llama-3-8B-Instruct/m1.png)

Figure 30: Semantic decoding of Llama3-8B ($m=0$ vs. $m=1$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Meta-Llama-3-8B-Instruct/m3.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Meta-Llama-3-8B-Instruct/m5.png)

Figure 31: Semantic decoding of Llama3-8B ($m=3$ vs. $m=5$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/deepseek-math-7b-instruct/m0.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/deepseek-math-7b-instruct/m1.png)

Figure 32: Semantic decoding of Deepseek-7B ($m=0$ vs. $m=1$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/deepseek-math-7b-instruct/m3.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/deepseek-math-7b-instruct/m5.png)

Figure 33: Semantic decoding of Deepseek-7B ($m=3$ vs. $m=5$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-0.5B-Instruct/m0.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-0.5B-Instruct/m1.png)

Figure 34: Semantic decoding of Qwen2.5-0.5B ($m=0$ vs. $m=1$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-0.5B-Instruct/m3.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-0.5B-Instruct/m5.png)

Figure 35: Semantic decoding of Qwen2.5-0.5B ($m=3$ vs. $m=5$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-3B-Instruct/m0.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-3B-Instruct/m1.png)

Figure 36: Semantic decoding of Qwen2.5-3B ($m=0$ vs. $m=1$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-3B-Instruct/m3.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-3B-Instruct/m5.png)

Figure 37: Semantic decoding of Qwen2.5-3B ($m=3$ vs. $m=5$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-7B-Instruct/m0.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-7B-Instruct/m1.png)

Figure 38: Semantic decoding of Qwen2.5-7B ($m=0$ vs. $m=1$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-7B-Instruct/m3.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-7B-Instruct/m5.png)

Figure 39: Semantic decoding of Qwen2.5-7B ($m=3$ vs. $m=5$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-32B-Instruct/m0.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-32B-Instruct/m1.png)

Figure 40: Semantic decoding of Qwen2.5-32B ($m=0$ vs. $m=1$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-32B-Instruct/m3.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen2.5-32B-Instruct/m5.png)

Figure 41: Semantic decoding of Qwen2.5-32B ($m=3$ vs. $m=5$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen3-0.6B/m0.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen3-0.6B/m1.png)

Figure 42: Semantic decoding of Qwen3-0.6B ($m=0$ vs. $m=1$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen3-0.6B/m3.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen3-0.6B/m5.png)

Figure 43: Semantic decoding of Qwen3-0.6B ($m=3$ vs. $m=5$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen3-4B/m0.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen3-4B/m1.png)

Figure 44: Semantic decoding of Qwen3-4B ($m=0$ vs. $m=1$).

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen3-4B/m3.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/shortcut_figs/Qwen3-4B/m5.png)

Figure 45: Semantic decoding of Qwen3-4B ($m=3$ vs. $m=5$).

## Appendix F Examples of outputs after training

The following figures illustrate the output behavior of Qwen2.5-7B on the Math domain before and after SFT.

**Observation.** The base model shows an explicit step-by-step process, marking steps (e.g., “\#\#\# Step 1”) and including natural language explanations. In contrast, the SFT model exhibits a drastic stylistic shift toward high-density reasoning, directly outputting formulas and calculations with minimal logical connectors.

**Implication.** This compact generation style explains the “corner case” observed in our quantitative analysis, where cognitive effort ($\Omega$) becomes higher after SFT. Unlike other models, which acquire “blind confidence” on easy textual patterns, Qwen2.5-7B concentrates its generation on difficult logic and arithmetic tokens. By skipping low-entropy “filler” words, the model sustains a high level of cognitive effort while generating the core answer.

**Qwen2.5-7B trained on GSM8K, evaluated on Math Level 1-2**

**PROMPT:** Please solve the following math problem, and provide your process as well as the final answer. You should make sure your final answer in the format. Problem: What is the value of $(26^{2}-24^{2}-10)^{2}-10^{2}$?

**CORRECT ANSWER:** 8000

**Qwen2.5-7B GENERATION:** To solve the given expression $(26^{2}-24^{2}-10)^{2}-10^{2}$, we can break it down into steps for clarity.

\#\#\# Step 1: Simplify inside the parentheses using difference of squares

The expression $26^{2}-24^{2}$ can be simplified using the difference of squares formula: $a^{2}-b^{2}=(a-b)(a+b)$

Here, $a=26$ and $b=24$, so: $26^{2}-24^{2}=(26-24)(26+24)=2\cdot 50=100$

\#\#\# Step 2: Substitute back into the original expression

Now substitute $100$ back into the expression: $(100-10)^{2}-10^{2}$

\#\#\# Step 3: Simplify inside the parentheses

Simplify $100-10$: $100-10=90$

So the expression becomes: $90^{2}-10^{2}$

\#\#\# Step 4: Calculate each term

Calculate $90^{2}$ and $10^{2}$: $90^{2}=8100$, $10^{2}=100$

\#\#\# Step 5: Subtract the results

Subtract $100$ from $8100$: $8100-100=8000$

\#\#\# Final Answer:

Thus, the value of the expression $(26^{2}-24^{2}-10)^{2}-10^{2}$ is $\boxed{8000}$.

**Qwen2.5-7B-SFT GENERATION:** We can use the difference of squares to simplify $26^{2}-24^{2}$: $26^{2}-24^{2}=(26+24)(26-24)=50\cdot 2=100$. Then, we have $(26^{2}-24^{2}-10)^{2}-10^{2}=(100-10)^{2}-10^{2}=90^{2}-10^{2}=8100-100=\boxed{8000}$.

\#\#\#\# 8000

Figure 46: Example of the result of Qwen2.5-7B trained on GSM8K evaluated on dataset Math Level 1-2.

**Qwen2.5-7B trained on GSM8K, evaluated on AIME**

**PROMPT:** Please solve the following math problem, and provide your process as well as the final answer. You should make sure your final answer in the format. Problem: Five men and nine women stand equally spaced around a circle in random order. The probability that every man stands diametrically opposite a woman is $\frac{m}{n}$, where $m$ and $n$ are relatively prime positive integers. Find $m+n$.

**CORRECT ANSWER:** 191

**Qwen2.5-7B GENERATION:** To solve this problem, we need to calculate the probability that each of the five men stands diametrically opposite a woman when ten people (five men and five women) are arranged around a circle.

\#\#\# Step 1: Total Arrangements

First, we calculate the total number of ways to arrange 10 people around a circle. Since rotations of the same arrangement are considered identical, we fix one person in place and arrange the remaining 9 people. This gives us: $(10-1)!=9!$

\#\#\# Step 2: Favorable Arrangements

Next, we count the number of favorable arrangements where each man stands diametrically opposite a woman.

1. **Fixing Men and Women**: We need to pair each man with a woman such that they are diametrically opposite. There are 5 men and 5 women, so we need to find the number of ways to pair them up. The number of ways to pair 5 men with 5 women is given by the number of permutations of 5 items, which is: $5!=120$
2. **Arranging Pairs Around the Circle**: After pairing, we need to arrange these 5 pairs around the circle. Fixing one pair in place (to account for rotational symmetry), we arrange the remaining 4 pairs. This can be done in: $(5-1)!=4!$ ways.
3. **Arranging Within Each Pair**: For each pair, there are 2 ways to arrange the man and the woman (man-woman or woman-man). Since there are 5 pairs, the total number of ways to arrange the individuals within the pairs is: $2^{5}=32$.

Combining these, the total number of favorable arrangements is: $5!\times 4!\times 2^{5}=120\times 24\times 32=92160$

\#\#\# Step 3: Calculating the Probability

The probability that each man stands diametrically opposite a woman is the ratio of the number of favorable arrangements to the total number of arrangements: $\frac{92160}{9!}=\frac{92160}{362880}=\frac{1}{4}$

\#\#\# Step 4: Simplifying the Fraction

The fraction $\frac{1}{4}$ is already in its simplest form, so $m=1$ and $n=4$.

\#\#\# Step 5: Finding $m+n$

Finally, we find $m+n$: $m+n=1+4=5$

Thus, the final answer is: $\boxed{5}$

**Qwen2.5-7B-SFT GENERATION:** The total number of ways for the 14 people to stand around the circle is $13!/2$ (since rotations and reflections are considered equivalent). There are 5 men and 9 women, so there are $\binom{9}{5}=126$ ways to choose which 5 of the 9 women will stand diametrically opposite a man. For each choice of 5 women, there are $5!$ ways to assign the men to the women, and $5!$ ways to arrange the remaining 4 women around the circle. Thus, the probability is $\frac{126\cdot 5!\cdot 5!}{13!/2}=\frac{126\cdot 2\cdot 5!\cdot 5!}{13!}=\frac{126\cdot 2\cdot 120\cdot 120}{6227020800}=\frac{302400}{6227020800}=\frac{7}{24624}$. Since 7 and 24624 are relatively prime, the answer is $7+24624=\boxed{24631}$.

\#\#\#\# 24631

Figure 47: Example of the result of Qwen2.5-7B trained on GSM8K evaluated on dataset AIME.

**Example of the result of Qwen2.5-7B trained on MBPP evaluated on dataset CoNaLa**

**PROMPT:** Write Python code to solve the following task: split a string into a list of words and whitespace

**Qwen2.5-7B GENERATION:** Certainly! To split a string into a list of words and whitespace, you can use Python's `re` (regular expression) module, which provides powerful pattern matching capabilities. Here's a Python function that does this:

```python
import re

def split_string_into_words_and_whitespace(s):
    # Use regular expression to find all sequences of alphanumeric characters (words)
    # and all whitespace characters.
    return re.findall(r'\w+|\s+', s)

# Example usage
input_string = "Hello, world! This is an example."
result = split_string_into_words_and_whitespace(input_string)
print(result)
```

\#\#\# Explanation:

- `\w+`: Matches one or more word characters (alphanumeric characters plus underscore).
- `\s+`: Matches one or more whitespace characters (spaces, tabs, newlines, etc.).
- `|`: Acts as an OR operator in regular expressions.
- `re.findall()`: Returns all non-overlapping matches of the pattern in the string as a list of strings.

\#\#\# Output:

For the input string `"Hello, world! This is an example."`, the output will be:

```python
['Hello', ' ', 'world', '!', ' ', 'This', ' ', 'is', ' ', 'an', ' ', 'example', '.']
```

This output includes both the words and the whitespace as separate elements in the list.

**Qwen2.5-7B-SFT GENERATION:**

```python
import re
def split_list(text):
    return (re.findall(r'\S+|\s+', text))
```

Figure 48: Example of the result of Qwen2.5-7B evaluated on dataset CoNaLa.

**Prompt on APPS**

**PROMPT:** Please solve the following programming problem. Write Python code to solve it. Joisino is about to compete in the final round of a certain programming competition. In this contest, there are N problems, numbered 1 through N. Joisino knows that it takes her $T_{i}$ seconds to solve problem i ($1\leq i\leq N$). Also, there are M kinds of drinks offered to the contestants, numbered 1 through M. If Joisino takes drink i ($1\leq i\leq M$), her brain will be stimulated and the time it takes for her to solve problem $P_{i}$ will become $X_{i}$ seconds. It does not affect the time to solve the other problems. A contestant is allowed to take exactly one of the drinks before the start of the contest. For each drink, Joisino wants to know how many seconds it takes her to solve all the problems if she takes that drink. Here, assume that the time it takes her to solve all the problems is equal to the sum of the time it takes for her to solve individual problems. Your task is to write a program to calculate it instead of her.

-----Constraints-----

- All input values are integers.
- $1\leq N\leq 100$
- $1\leq T_{i}\leq 10^{5}$
- $1\leq M\leq 100$
- $1\leq P_{i}\leq N$
- $1\leq X_{i}\leq 10^{5}$

-----Input-----

The input is given from Standard Input in the following format:

```text
N
T_1 T_2 ... T_N
M
P_1 X_1
P_2 X_2
:
P_M X_M
```

-----Output-----

For each drink, calculate how many seconds it takes Joisino to solve all the problems if she takes that drink, and print the results, one per line.

-----Sample Input-----

```text
3
2 1 4
2
1 1
2 3
```

-----Sample Output-----

```text
6
9
```

If Joisino takes drink 1, the time it takes her to solve each problem will be 1, 1 and 4 seconds, respectively, totaling 6 seconds. If Joisino takes drink 2, the time it takes her to solve each problem will be 2, 3 and 4 seconds, respectively, totaling 9 seconds.

Code:

Figure 49: Example prompt of dataset APPS.

**Qwen2.5-7B trained on MBPP, evaluated on APPS**

**Qwen2.5-7B GENERATION:** Sure, let's break down the problem and then write the Python code to solve it.

\#\#\# Problem Breakdown

1. **Input Parsing**:
   - First, we need to read the number of problems `N` and the time taken to solve each problem.
   - Then, we need to read the number of drinks `M` and the information about each drink (the problem it affects and the new time it takes to solve that problem).
2. **Processing**:
   - For each drink, we need to calculate the total time it would take Joisino to solve all the problems if she were to use that drink.
   - We need to update the time for the specific problem affected by the drink and keep the original times for the rest of the problems.
3. **Output**:
   - Print the total time for each drink.

\#\#\# Python Code Implementation

Let's implement the solution step-by-step:

```python
# Read the number of problems
N = int(input())
# Read the time taken to solve each problem
T = list(map(int, input().split()))
# Read the number of drinks
M = int(input())
# Process each drink
for _ in range(M):
    # Read the drink details: the problem it affects and the new time
    P, X = map(int, input().split())
    # Calculate the new total time
    total_time = sum(T)
    total_time -= T[P-1]  # Subtract the original time for the affected problem
    total_time += X  # Add the new time for the affected problem
    # Print the result
    print(total_time)
```

\#\#\# Explanation

1. **Reading Inputs**:
   - `N` is the number of problems.
   - `T` is a list where `T[i]` represents the time to solve problem `i+1`.
   - `M` is the number of drinks.
   - Each drink is represented by two numbers: `P` (the problem it affects) and `X` (the new time to solve that problem).
2. **Calculating Total Time**:
   - For each drink, we first calculate the total time as the sum of all problem-solving times.
   - We then subtract the original time for the affected problem and add the new time for that problem.
3. **Output**:
   - We print the new total time for each drink.

This code should correctly handle the constraints and produce the expected output.

**Qwen2.5-7B-SFT GENERATION:**

```python
def time(N, T, M, P, X):
    res = []
    for m in range(M):
        tmp = [t for i, t in enumerate(T) if i + 1 != P[m]]
        tmp[P[m] - 1] = X[m]
        res.append(sum(tmp))
    return res
```

Figure 50: Example of the result of Qwen2.5-7B trained on MBPP evaluated on dataset APPS.
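The correct answer of 191 listed for the AIME example in Figure 47 can be checked directly, which also shows that both generations are wrong (the base model answers 5, the SFT model 24631). The sketch below assumes a uniformly random seating, so the set of seats occupied by the five men is uniform over all $\binom{14}{5}$ choices. It brute-forces the count and compares it with the closed form $\binom{7}{5}2^{5}/\binom{14}{5}$: choose 5 of the 7 diametric seat pairs for the men, then one side of each pair. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

N_SEATS = 14  # 5 men + 9 women, equally spaced
N_MEN = 5
HALF = N_SEATS // 2  # seat s is diametrically opposite seat (s + 7) % 14


def prob_every_man_opposite_woman() -> Fraction:
    """Brute force over every possible set of seats occupied by the men."""
    total = favorable = 0
    for men in combinations(range(N_SEATS), N_MEN):
        seats = set(men)
        total += 1
        # A man faces a woman iff the opposite seat is not occupied by a man.
        if all((s + HALF) % N_SEATS not in seats for s in men):
            favorable += 1
    return Fraction(favorable, total)


def prob_closed_form() -> Fraction:
    # Choose 5 of the 7 diametric pairs, then which side of each pair the man takes.
    return Fraction(comb(HALF, N_MEN) * 2**N_MEN, comb(N_SEATS, N_MEN))


p = prob_every_man_opposite_woman()
assert p == prob_closed_form()
print(p, p.numerator + p.denominator)  # 48/143 191

# The ratio in the base model's Step 3 does not reduce to 1/4:
print(Fraction(92160, factorial(9)))  # 16/63
```

The last line also shows an arithmetic slip inside the base model's own (already mis-specified) derivation: $92160/9!$ reduces to $16/63$, not $1/4$.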
## Appendix G Full Results of Blind Confidence of SFT

In this section, we present the full confidence trajectory visualizations supporting the SFT analysis in Section [6](https://arxiv.org/html/2605.05741#S6). As defined in the main text, HyperLens reveals a phenomenon we call “Blind Confidence.” Visually, it appears as the SFT curves (blue) rising significantly earlier and staying higher in the intermediate layers than the base-model curves (red). This indicates that SFT often bypasses the deep cognitive processing (the “struggle” phase) present in the base models, so the model reaches saturated high-confidence states prematurely.
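One way to make “rising significantly earlier and staying higher” concrete when reading the plots below is to compare the first layer at which each trajectory crosses a fixed confidence threshold, and the average gap between the two curves over a band of intermediate layers. This is a minimal sketch, assuming each trajectory is given as one confidence value per layer. The threshold of 0.9, the helper names `saturation_onset` and `mean_gap`, and the toy 12-layer trajectories are all hypothetical; they are not HyperLens outputs or the paper's metric.

```python
from typing import Optional, Sequence


def saturation_onset(confidence: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Index of the first layer whose confidence reaches `threshold`, or None."""
    for layer, c in enumerate(confidence):
        if c >= threshold:
            return layer
    return None


def mean_gap(sft: Sequence[float], base: Sequence[float], start: int, end: int) -> float:
    """Average SFT-minus-base confidence over layers [start, end)."""
    diffs = [s - b for s, b in zip(sft[start:end], base[start:end])]
    return sum(diffs) / len(diffs)


# Toy 12-layer trajectories (illustrative values only).
base = [0.05, 0.06, 0.08, 0.10, 0.15, 0.25, 0.40, 0.60, 0.80, 0.92, 0.96, 0.98]
sft = [0.05, 0.10, 0.30, 0.60, 0.85, 0.93, 0.95, 0.96, 0.97, 0.97, 0.98, 0.98]

print(saturation_onset(base), saturation_onset(sft))  # 9 5
print(f"{mean_gap(sft, base, 3, 9):.3f}")  # positive: SFT sits higher mid-stack
```

An earlier onset together with a positive intermediate-layer gap corresponds to the blue-above-red pattern described above; both quantities depend on the chosen threshold and layer band.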

While this pattern is consistent across most Llama3-8B tasks, Qwen2.5-7B shows sharper confidence spikes on some tasks, particularly in Math and Code. As noted in Section [6](https://arxiv.org/html/2605.05741#S6), this corresponds to the model generating concise, memorized patterns (short core tokens) rather than reasoning through the problem, which sharply decreases the overall specific effort $\hat{\Omega}$.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/math-level12_Meta-Llama-3-8B-Instruct_SFT_plot.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/aime_Meta-Llama-3-8B-Instruct_SFT_plot.png)

Figure 51: Confidence trajectories of Llama3-8B on MATH (left) and AIME (right) for the Blind Confidence of SFT experiment.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/conala_Meta-Llama-3-8B-Instruct_SFT_plot.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/apps_Meta-Llama-3-8B-Instruct_SFT_plot.png)

Figure 52: Confidence trajectories of Llama3-8B on CoNaLa (left) and APPS (right) for the Blind Confidence of SFT experiment.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/ruletaker_Meta-Llama-3-8B-Instruct_SFT_plot.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/logicbench_Meta-Llama-3-8B-Instruct_SFT_plot.png)

Figure 53: Confidence trajectories of Llama3-8B on RuleTaker (left) and ZebraLogic (right) for the Blind Confidence of SFT experiment.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/arc-easy_Meta-Llama-3-8B-Instruct_SFT_plot.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/gpqa_Meta-Llama-3-8B-Instruct_SFT_plot.png)

Figure 54: Confidence trajectories of Llama3-8B on ARC Easy (left) and GPQA (right) for the Blind Confidence of SFT experiment.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/math-level12_Qwen2.5-7B-Instruct_SFT_plot.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/aime_Qwen2.5-7B-Instruct_SFT_plot.png)

Figure 55: Confidence trajectories of Qwen2.5-7B on MATH (left) and AIME (right) for the Blind Confidence of SFT experiment.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/conala_Qwen2.5-7B-Instruct_SFT_plot.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/apps_Qwen2.5-7B-Instruct_SFT_plot.png)

Figure 56: Confidence trajectories of Qwen2.5-7B on CoNaLa (left) and APPS (right) for the Blind Confidence of SFT experiment.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/ruletaker_Qwen2.5-7B-Instruct_SFT_plot.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/logicbench_Qwen2.5-7B-Instruct_SFT_plot.png)

Figure 57: Confidence trajectories of Qwen2.5-7B on RuleTaker (left) and ZebraLogic (right) for the Blind Confidence of SFT experiment.

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/arc-easy_Qwen2.5-7B-Instruct_SFT_plot.png)

![Refer to caption](https://arxiv.org/html/2605.05741v1/src/figs/SFT-figs/gpqa_Qwen2.5-7B-Instruct_SFT_plot.png)

Figure 58: Confidence trajectories of Qwen2.5-7B on ARC Easy (left) and GPQA (right) for the Blind Confidence of SFT experiment.