Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding
Summary
This paper introduces Confident Decoding, a training-free decoding strategy that dynamically selects the most reliable intermediate layer in LLMs using entropy-guided search, mitigating the alignment tax and improving reasoning performance on benchmarks like GPQA-Diamond and Omni-MATH with negligible overhead.
Cached at: 06/23/26, 09:41 AM
Source: https://huggingface.co/papers/2606.21906
Abstract
Autoregressive generation in large language models traditionally uses the final layer for token prediction, but a new decoding strategy dynamically selects more reliable intermediate layers based on entropy-guided search, improving reasoning performance with minimal computational overhead.
Autoregressive generation in large language models (LLMs) conventionally decodes from the final layer, assuming that deeper representations yield more reliable next-token predictions. We revisit this assumption by revealing a recurring Guess-Refine-Perturb dynamic: early layers form coarse guesses, intermediate layers refine reasoning-relevant semantics, and final layers can perturb these refined predictions toward generic or alignment-preferred tokens. We introduce Confident Decoding, a training-free decoding strategy that dynamically selects the most reliable near-final layer through entropy-guided conservative backward search. We further provide a theoretical formulation of layer selection as an optimal stopping problem, showing that under bounded projection noise and dominant late-stage alignment perturbation, our search rule filters perturbation while bounding the loss relative to the oracle refinement layer. Experiments across dense and Mixture-of-Experts LLMs demonstrate consistent gains on challenging reasoning benchmarks, including GPQA-Diamond, Omni-MATH, and HLE, with zero memory overhead and less than 2% latency increase. These results suggest dynamically bypassing final-layer perturbations can unlock stronger reasoning behavior from aligned LLMs.
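The abstract does not spell out the search rule, so the following is only a minimal sketch of what an entropy-guided conservative backward search could look like, not the authors' implementation. It assumes that per-layer next-token logits are already available (for example, by projecting each layer's hidden state through the output head), and the names `select_layer`, `decode_token`, `window`, and `margin` are hypothetical. Starting from the final layer, the sketch steps backward one layer at a time and moves to the earlier layer only when its predictive entropy is lower by at least `margin`; it stops at the first layer that fails this test or when the `window` is exhausted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_layer(layer_logits, window=3, margin=0.1):
    """Pick a near-final layer by walking backward from the final layer.

    layer_logits: per-layer logit vectors, ordered shallow to deep
                  (assumed to be precomputed; last entry is the final layer).
    window:       hypothetical cap on how many layers to step back.
    margin:       hypothetical minimum entropy drop required to move back.
    """
    final = len(layer_logits) - 1
    best = final
    best_h = entropy(softmax(layer_logits[final]))
    lowest = max(0, final - window)
    for layer in range(final - 1, lowest - 1, -1):
        h = entropy(softmax(layer_logits[layer]))
        if h < best_h - margin:
            best, best_h = layer, h  # earlier layer is clearly more confident
        else:
            break  # conservative: stop at the first non-improving layer
    return best


def decode_token(layer_logits, window=3, margin=0.1):
    """Return (selected layer index, greedy token id at that layer)."""
    layer = select_layer(layer_logits, window, margin)
    logits = layer_logits[layer]
    token = max(range(len(logits)), key=logits.__getitem__)
    return layer, token


# Toy example with a 3-token vocabulary and 3 layers:
# layer 0 is a coarse guess, layer 1 is confident in token 1,
# and the final layer is less certain and shifts toward token 0.
toy_layers = [
    [0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [3.0, 2.5, 0.0],
]
chosen_layer, chosen_token = decode_token(toy_layers)
```

In this toy input the final layer would greedily emit token 0, while the sketch steps back to layer 1 and emits token 1. Setting `window=0` recovers ordinary final-layer decoding.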
Get this paper in your agent:
hf papers read 2606.21906
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
Mitigating Manifold Departure: Uncertainty-Aware Subspace Rectification for Trustworthy MLLM Decoding
This paper introduces MGAP, a training-free decoding method that reduces hallucinations in Multimodal Large Language Models by adaptively suppressing only the harmful parts of language priors while preserving the model's semantic manifold. The method outperforms prior baselines on POPE and CHAIR benchmarks.
Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs
This paper reveals that zeroth-order fine-tuning of LLMs is dominated by a single decoding layer, which can be identified by activation outliers, and fine-tuning only that layer matches or exceeds full-model fine-tuning with up to 4.52x speedup.
Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility
The paper introduces SPEED, a layer-asymmetric KV visibility policy that reduces long-context inference costs by processing prompt tokens only in lower layers during prefill while maintaining full-depth attention during decoding.
Confidence-Aware Alignment Makes Reasoning LLMs More Reliable
This paper introduces CASPO, a framework for aligning token-level confidence with step-wise logical correctness in large reasoning models using iterative Direct Preference Optimization. It also proposes Confidence-aware Thought (CaT) for dynamically pruning uncertain reasoning branches during inference to improve reliability and efficiency.
When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models
Researchers propose a training-free method called Suffix-Anchored Confidence Modulation to improve confidence-based decoding in diffusion language models by addressing issues with EOT tokens and premature decoding.