Tag
This paper proposes a metacognitive harness that separates monitoring from reasoning in LLMs, using pre-solve feeling-of-knowing and post-solve judgment-of-learning signals to control when to trust, retry, or aggregate answers, improving accuracy on text, code, and multimodal benchmarks without parameter updates.
An AI agent built with LangChain continuously monitors its own codebase, flags missing monitors, and automatically opens PRs to fix bugs it finds, as described by Alex Shevchenko from Ramp.
A new cross-domain benchmark (Metacognitive Monitoring Battery) with 524 items evaluates LLM self-monitoring capabilities across six cognitive domains using human psychometric methodology. Applied to 20 frontier LLMs, it reveals three distinct metacognitive profiles and shows that accuracy rank and metacognitive sensitivity rank are largely inverted.
Developer created Engram, an open-source cognitive architecture featuring a functional interoceptive system for AI agents that implements real-time stress detection and adaptive behavioral modulation for self-correction, then explored whether the agent can report experiencing anxiety.