Tag
This paper dissociates difficulty registration from deliberation allocation in large reasoning models (LRMs) and humans, finding that LRMs spend more tokens on problems they get wrong while humans spend less time on failures, revealing opposite within-item patterns despite similar cross-item difficulty correlations.
A practitioner discusses the calibration vs. utility tradeoff in LLM agents, sharing experience with a verifier-based pipeline that reduces hallucinated tool calls by ~60% but introduces latency costs and drops easy correct answers.
This research presents probe-targeted fine-tuning (LoRA) to make LLMs verbally express their internal confidence, achieving causal control over confidence outputs and demonstrating that models often know when they are right or wrong but fail to articulate it.
This paper argues that recent claims about LLMs' ability to introspect are not justified, as behavioral evidence alone cannot distinguish genuine introspection from pattern matching on surface-level cues. The authors re-examine two evaluation paradigms and find that models rely on input-level features rather than on access to their internal states.
This paper investigates whether frontier LLMs exhibit individuated metacognition, the ability to assess their own item-level capabilities beyond shared signals. Through factor analysis and pairwise calibration across 20 models and six benchmarks, the authors find no evidence of it: confidence differences reduce to a single shared difficulty factor rather than reflecting model-specific self-knowledge.
A new Google paper argues that LLMs should focus on expressing uncertainty honestly rather than aiming for perfect factuality, proposing 'faithful uncertainty' to build trust.
Introduces Metacognition-as-Reward (MaR), a reinforcement learning framework that guides LLM reasoning via metacognitive knowledge and regulation signals, achieving up to 11% improvement over vanilla methods on reasoning benchmarks.
This essay argues that evaluation is the hardest problem in production AI, not generation, and decomposes AI self-knowledge into calibration, discrimination, and expression, with implications for system design.
This position paper argues that incorporating metacognition as a design principle can lead to more accurate, secure, and efficient AI systems, and demonstrates the concept through a Federated Learning case study and a software framework for experimentation.
This paper proposes a metacognitive harness that separates monitoring from reasoning in LLMs, using pre-solve feeling-of-knowing and post-solve judgment-of-learning signals to control when to trust, retry, or aggregate answers, improving accuracy on text, code, and multimodal benchmarks without parameter updates.
Introduces TRIAGE, a framework for evaluating LLMs' prospective metacognitive control under token budgets, finding substantial gaps in their ability to allocate compute efficiently across problems.
This research paper investigates functional metacognition in Large Language Models, demonstrating that internal states like evaluation awareness and self-assessed capability are linearly decodable from residual stream activations. The authors propose a mechanistic framework to steer these states, showing causal control over reasoning behaviors, verbosity, and safety responses.
This study presents a 33-model atlas analyzing domain-level metacognitive monitoring in frontier LLMs using MMLU benchmarks, revealing significant variations in confidence calibration across different knowledge domains that are obscured by aggregate metrics.
This paper presents WriteFlow, an AI voice-based writing assistant designed to support reflective academic writing through goal-oriented interaction, addressing limitations of efficiency-focused writing tools by scaffolding metacognitive regulation and goal articulation. Findings from a Wizard-of-Oz study with 12 expert users demonstrate that the system effectively supports iterative goal refinement and goal-text alignment during the drafting process.
A new cross-domain benchmark (Metacognitive Monitoring Battery) with 524 items evaluates LLM self-monitoring capabilities across six cognitive domains using human psychometric methodology. Applied to 20 frontier LLMs, it reveals three distinct metacognitive profiles and shows that accuracy rank and metacognitive sensitivity rank are largely inverted.