internal-states

#internal-states

From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models

arXiv cs.CL ↗ · 10h ago Cached

This paper presents a factorised study of probe-based uncertainty estimation in LLMs. It shows that raw hidden states and attention features perform well in-domain, whereas structured features are more robust under distribution shift, and it provides pretrained probes as off-the-shelf baselines.

#internal-states

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

Hugging Face Daily Papers ↗ · 2026-05-08 Cached

This paper introduces POISE, a method for stable policy optimization in large reasoning models by estimating baselines using the model's own internal states, reducing computational overhead compared to PPO and GRPO.

#internal-states

Do LLMs Really Know What They Don't Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness

arXiv cs.CL ↗ · 2026-04-20 Cached

This paper challenges the assumption that LLMs can reliably distinguish between hallucinated and factual outputs through internal signals, arguing that internal states primarily reflect knowledge recall rather than truthfulness. The authors propose a taxonomy of hallucinations (associated vs. unassociated) and show that associated hallucinations exhibit hidden-state geometries overlapping with factual outputs, making standard detection methods ineffective.
