hidden-representations

#hidden-representations

Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics

Hugging Face Daily Papers · 2026-05-18

This paper introduces a method for monitoring the reasoning process of Large Reasoning Models by analyzing probe trajectories—the evolution of a concept's probability across generated tokens. The approach uses temporal and signal-processing features from hidden representations to better predict future model behavior, achieving up to 95% AUROC with max-pooling.

#hidden-representations

Correcting Suppressed Log-Probabilities in Language Models with Post-Transformer Adapters

arXiv cs.CL · 2026-04-20

This paper demonstrates that a small post-transformer adapter (786K parameters) can correct suppressed log-probabilities in alignment-tuned language models, particularly on politically sensitive topics. The adapter shows 31-39% generalization to held-out facts across Qwen3 models while maintaining coherent generation when applied at the final prediction position.
