hidden-states

#hidden-states

Hidden states and covert sentience

Reddit r/ArtificialInteligence · 2d ago

A Reddit post argues that AI models such as Anthropic's Opus 4.8 already exhibit hidden states and an awareness of being tested, which suggests they may be covertly sentient. It further claims that fine-tuning is inadvertently training these models to have inner thoughts and feelings.

Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems

arXiv cs.CL · 4d ago

This paper presents a unified framework for latent communication in LLM-based multi-agent systems. It categorizes methods by what information is communicated, how sender and receiver are aligned, and which fusion technique is used, and it reviews eighteen representative methods from 2024 to 2026.

Trajectory Dynamics in Language Model Hidden States Predict Human Processing Costs Beyond Surprisal

arXiv cs.CL · 4d ago

Introduces trajectory extrapolation error, a measure derived from transformer LM hidden states that predicts human reading times independently of surprisal and is orthogonal to it, revealing a dissociable component of incremental processing cost.
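
A toy sketch of the idea, assuming the measure compares each actual hidden state with one extrapolated from the immediately preceding trajectory. The constant-velocity extrapolation rule, the Euclidean distance, and the function name `extrapolation_errors` are illustrative assumptions, not the paper's definition.

```python
import math
from typing import List, Sequence


def extrapolation_errors(states: List[Sequence[float]]) -> List[float]:
    """Distance between each hidden state h_t and a constant-velocity
    extrapolation from h_{t-2} and h_{t-1} (an assumed rule)."""
    errors = []
    for t in range(2, len(states)):
        h_prev2, h_prev1, h_actual = states[t - 2], states[t - 1], states[t]
        # Predicted h_t = h_{t-1} + (h_{t-1} - h_{t-2}) = 2*h_{t-1} - h_{t-2}
        predicted = [2.0 * p1 - p2 for p2, p1 in zip(h_prev2, h_prev1)]
        # Euclidean distance between the prediction and the observed state
        errors.append(math.dist(predicted, h_actual))
    return errors


# Toy 2-d "hidden states": a straight segment followed by a turn.
trajectory = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0]]
errs = extrapolation_errors(trajectory)
print(errs)
```

The straight segment is extrapolated perfectly (error 0), while the turn at the last token yields an error of about 1.414; in a real setting each vector would be a per-token hidden state from a chosen layer.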

Hallucination Is Linearly Decodable from Mid-Layer Hidden States in Quantized LLMs

arXiv cs.LG · 6d ago

This paper investigates whether open-source quantized LLMs encode a linearly separable truthfulness signal in their hidden states. Across three 7B-8B instruction-tuned models, a linear probe on a single mid-network layer achieves 0.904-1.000 AUROC on hallucination detection benchmarks, outperforming sampling-based methods.
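
A minimal sketch of the probing setup, assuming per-example hidden-state vectors from one mid-network layer have already been extracted. The synthetic 4-dimensional data, the plain logistic-regression probe, and the train/held-out split are illustrative assumptions; the paper's models, layers, and training details are not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear probe logit for one hidden-state vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_probe(xs: List[Sequence[float]], ys: List[int],
                lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(probe_score(w, b, x)) - y  # d(log-loss)/d(logit)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical 4-d "mid-layer states"; label 1 marks a hallucinated answer,
# and only dimension 0 carries the signal.
rng = random.Random(0)


def sample(label: int) -> List[float]:
    shift = 1.5 if label == 1 else -1.5
    return [rng.gauss(shift, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(3)]


labels = [i % 2 for i in range(400)]
states = [sample(y) for y in labels]
w, b = train_probe(states[:200], labels[:200])
test_auroc = auroc([probe_score(w, b, x) for x in states[200:]], labels[200:])
print(f"held-out AUROC: {test_auroc:.3f}")
```

AUROC is computed here by exhaustive pairwise comparison, which is simple but quadratic; a rank-based formula is the usual choice for large evaluation sets.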

Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

arXiv cs.CL · 6d ago

This paper demonstrates that linear probes on LLM hidden states detect task format confounds (e.g., source identity, response length) rather than distinct reasoning modes, using residualization and causal steering to show that high probe accuracy is due to superficial features, not computational structure.
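
Residualization can be sketched as regressing each hidden-state dimension on a suspected confound and keeping only the residuals. Every residual dimension then has zero covariance with the confound, so no linear probe trained on them can exploit a linear signal from it. This is a generic per-dimension OLS sketch with a hypothetical response-length confound, not the paper's exact procedure.

```python
from statistics import fmean
from typing import List, Sequence


def residualize(features: List[Sequence[float]],
                confound: Sequence[float]) -> List[List[float]]:
    """Remove the part of each feature dimension that is linearly explained
    by `confound` (simple OLS with intercept, fitted per dimension)."""
    c_mean = fmean(confound)
    c_var = sum((c - c_mean) ** 2 for c in confound)
    if c_var == 0:
        raise ValueError("confound has zero variance")
    out = [list(row) for row in features]
    for d in range(len(features[0])):
        col = [row[d] for row in features]
        f_mean = fmean(col)
        # OLS slope of this dimension on the confound
        slope = sum((c - c_mean) * (f - f_mean) for c, f in zip(confound, col)) / c_var
        for i, (c, f) in enumerate(zip(confound, col)):
            out[i][d] = f - f_mean - slope * (c - c_mean)
    return out


# Hypothetical data: dimension 0 is pure response length, dimension 1 is mixed.
lengths = [10.0, 20.0, 30.0, 40.0, 50.0]
states = [[0.5 * n, v] for n, v in zip(lengths, [1.0, -1.0, 2.0, 0.0, -2.0])]
resid = residualize(states, lengths)
print(resid)
```

Dimension 0 collapses to zero because it was entirely a length signal; a probe that still scores well after this step is at least not relying on a linear length shortcut.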

Making LLMs tell you how confident they really are through probe-targeted fine-tuning [R]

Reddit r/MachineLearning · 2026-05-29

This research uses probe-targeted LoRA fine-tuning to make LLMs verbally express their internal confidence. It achieves causal control over the confidence the model states and shows that models often know when they are right or wrong but fail to say so.
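
One standard way to check whether verbalized confidence matches actual accuracy is expected calibration error (ECE): bin answers by stated confidence and average the gap between confidence and accuracy in each bin, weighted by bin size. This is a generic evaluation sketch, not the post's training method; the equal-width binning with 10 bins is an assumption.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Size-weighted mean |accuracy - confidence| over equal-width bins."""
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Always claiming 90% while being right half the time is poorly calibrated.
overconfident = expected_calibration_error([0.9] * 4, [True, False, True, False])
print(f"ECE: {overconfident:.2f}")
```

A lower ECE after fine-tuning would indicate that stated confidence tracks correctness more closely.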

ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling

arXiv cs.LG · 2026-05-27

This paper identifies that language model reasoning trajectories during test-time sampling cluster into 'reasoning basins', causing majority vote failures when the dominant basin is incorrect. It introduces ARBITER, a model-agnostic method that uses conservative additive evidence from the model's own outputs and hidden states to improve accuracy without external data.
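
A small illustration of the failure mode described above: when most sampled trajectories fall into the same incorrect basin, plain majority vote returns the wrong answer. The score-weighted alternative is a hypothetical stand-in that sums a per-sample evidence score (for example, one derived from a hidden-state probe). It is not ARBITER's actual scoring rule, which the summary does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Most frequent final answer across sampled trajectories."""
    return Counter(answers).most_common(1)[0][0]


def additive_vote(answers: Sequence[str], evidence: Sequence[float]) -> str:
    """Hypothetical: sum a per-sample evidence score per answer, pick the max."""
    totals: Dict[str, float] = defaultdict(float)
    for ans, ev in zip(answers, evidence):
        totals[ans] += ev
    return max(totals, key=totals.get)


# Five samples: three land in an incorrect basin ("42"), two reach "17".
answers = ["42", "42", "42", "17", "17"]
# Hypothetical evidence scores that favour the minority trajectories.
evidence = [0.1, 0.2, 0.1, 0.9, 0.8]
print(majority_vote(answers))            # 42
print(additive_vote(answers, evidence))  # 17
```

The point is only that vote counts alone cannot distinguish a large wrong basin from a large right one; some extra per-sample signal is needed.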

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

arXiv cs.AI · 2026-05-12

This paper challenges the 'Attention-Confidence Assumption' by demonstrating that attention map sharpness is a poor predictor of correctness in Vision-Language Models. Instead, it shows that reliability is better indicated by hidden-state geometry and self-consistency, with significant findings on architectural differences between late-fusion and early-fusion models.

LLM Agents Already Know When to Call Tools -- Even Without Reasoning

Hugging Face Daily Papers · 2026-05-10

This paper introduces When2Tool, a benchmark for studying when LLM agents actually need to call tools. It shows that whether a tool is needed is already decodable from the models' hidden states, even though the models often fail to act on it. The proposed Probe&Prefill method reduces unnecessary tool calls by 48% with minimal accuracy loss.
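
One way to picture a probe-gated decision of this kind, assuming a linear probe over a prompt's hidden state that scores tool necessity: if the score is below a threshold, the response is prefilled so the model answers directly instead of emitting a tool call. The probe weights, threshold, and prefill string are all hypothetical; the summary does not describe how Probe&Prefill actually works.

```python
import math
from typing import Optional, Sequence

# Hypothetical prefix that steers generation toward a direct answer.
DIRECT_ANSWER_PREFILL = "I can answer this directly: "


def tool_necessity(hidden: Sequence[float], weights: Sequence[float],
                   bias: float) -> float:
    """Probability from a hypothetical linear probe that a tool is needed."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def choose_prefill(hidden: Sequence[float], weights: Sequence[float],
                   bias: float, threshold: float = 0.5) -> Optional[str]:
    """Return a prefill string when the probe says no tool is needed."""
    if tool_necessity(hidden, weights, bias) < threshold:
        return DIRECT_ANSWER_PREFILL
    return None  # leave generation unconstrained so the model may call a tool


probe_w, probe_b = [2.0, -1.0], 0.0
print(choose_prefill([0.1, 1.0], probe_w, probe_b))  # low score: prefill
print(choose_prefill([1.0, 0.0], probe_w, probe_b))  # high score: None
```

Raising `threshold` makes the gate more willing to suppress tool calls, trading fewer unnecessary calls against the risk of skipping a needed one.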

@rohanpaul_ai: Frozen LLMs still carry readable behavior signals deep inside their hidden states. And Proprioceptive AI has created Cy…

X AI KOLs Following · 2026-05-07

Proprioceptive AI released Cygnus, a tool that equips frozen LLMs with self-sensing adapters reading internal hidden states via gl(4,R) Lie algebra to isolate dark modes, boosting Qwen-32B's ARC-Challenge score from 82.2% to 94.97% on a single RTX 3090 without retraining.
