This paper challenges the 'Attention-Confidence Assumption' by demonstrating that attention map sharpness is a poor predictor of correctness in Vision-Language Models. Instead, it shows that reliability is better indicated by hidden-state geometry and self-consistency, and it documents notable differences between late-fusion and early-fusion architectures.
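The two signals contrasted above can be illustrated with a toy sketch. This is not the paper's implementation; the entropy-based sharpness measure and the centroid-distance geometry score below are common stand-ins I am assuming for the purpose of illustration:

```python
import numpy as np

def attention_entropy(attn):
    """Shannon entropy of an attention distribution; low entropy = a 'sharp' map.
    Under the Attention-Confidence Assumption, low entropy would mean high confidence."""
    p = attn / attn.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def geometry_score(h, centroids):
    """Hypothetical hidden-state-geometry reliability score: negative distance
    from hidden state h to the nearest class centroid in representation space."""
    dists = np.linalg.norm(centroids - h, axis=1)
    return float(-dists.min())

# Two toy attention maps over 8 tokens: one sharp, one diffuse.
sharp = np.array([0.86] + [0.02] * 7)
diffuse = np.full(8, 1 / 8)

# Sharpness ranks the first map as more "confident"...
print(attention_entropy(sharp) < attention_entropy(diffuse))  # True

# ...but the geometry score depends only on the hidden state, not attention.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(3, 16))
h = centroids[1] + 0.1 * rng.normal(size=16)  # hidden state near centroid 1
print(geometry_score(h, centroids) > geometry_score(rng.normal(size=16), centroids))
```

The point of the contrast: the two scores are computed from entirely different quantities (attention weights vs. hidden-state positions), so they can, and per the paper do, disagree about which answers are reliable.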
This paper introduces When2Tool, a benchmark for studying when LLM agents actually need to call tools. It reveals that models' hidden states already encode tool necessity even when their outputs fail to act on it. The proposed Probe&Prefill method reduces unnecessary tool calls by 48% with minimal accuracy loss.
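The "hidden states already encode tool necessity" finding is the kind of claim typically established with a linear probe. A minimal sketch, assuming synthetic hidden states with a linearly decodable tool-necessity direction (the data, dimensions, and probe here are all illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 200

# Hypothetical hidden states in which one direction encodes "tool needed".
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))          # hidden states, one per query
y = (X @ w_true > 0).astype(float)   # 1 = tool genuinely needed

# Fit a logistic probe by gradient descent (a generic stand-in for the
# probing step implied by the Probe&Prefill name).
w = np.zeros(d)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / n

preds = 1 / (1 + np.exp(-(X @ w))) > 0.5
acc = (preds == y.astype(bool)).mean()
print(acc)
```

If the probe recovers tool necessity with high accuracy while the model's own behavior does not, that supports the paper's "knows but fails to act" framing; acting on the probe's output (e.g., suppressing a call when the probe says no tool is needed) is the natural intervention.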
Proprioceptive AI released Cygnus, a tool that equips frozen LLMs with self-sensing adapters that read internal hidden states via a gl(4,R) Lie algebra to isolate "dark modes", boosting Qwen-32B's ARC-Challenge score from 82.2% to 94.97% on a single RTX 3090 without retraining.