Tag
The author critiques the idea of agents remembering everything and introduces TrueMemory, a system that converts memories into trait claims with confidence and evidence to better calibrate agent behavior.
A developer building autonomous billing agents discusses the difficulty of reconstructing why an agent made a decision after the fact, and describes building a tool (Attova) that records decisions with evidence, alternatives, and confidence to improve debugging and human review.
This paper investigates whether frontier LLMs exhibit individuated metacognition, meaning the ability to assess their own item-level capabilities beyond shared signals. Using factor analysis and pairwise calibration across 20 models and six benchmarks, the authors find no evidence of such metacognition: confidence differences reduce to a single shared difficulty factor, suggesting that models rely on a common difficulty signal rather than model-specific self-knowledge.
A reflection on how many AI models prioritize sounding confident over being truthful, using Claude as an example of a model that seems more focused on internal consistency and logical honesty.
Proposes CSR, a framework that calibrates LLMs directly in semantic space using a novel semantic calibration reward, reducing ECE by up to 40% and improving AUROC by up to 31% over verbalized-confidence baselines across multiple datasets.
Armin Ronacher (@mitsuhiko) suggests that people be upfront about how well they actually understand a topic when opening pull requests, since AI tools (referred to as 'clanker') make it easy to sound confident without real knowledge.
The article describes a banquet at which second-generation heirs were seated next to Musk and Jensen Huang but barely interacted with them, and contrasts this with the confidence of first-generation entrepreneurs such as Jack Ma and Charles Zhang, sparking discussion of the differences between the two generations of entrepreneurs.
This paper finds that on-policy distillation (OPD) in language models leads to severe overconfidence because of an information mismatch between training and deployment, and proposes CaOPD, a calibration-aware framework that improves both performance and confidence reliability.
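Several of the summaries above turn on calibration, and the CSR entry reports its gains in ECE (expected calibration error). As a rough illustration only, the sketch below computes the commonly used equal-width binned form of ECE: group predictions by stated confidence, then take the weighted average gap between each bin's mean confidence and its accuracy. The function name, the 10-bin default, and the sample values are assumptions made for this example; none of the summarized papers is confirmed to use this exact variant.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1], one per prediction (hypothetical input).
    correct: truthy/falsy flags, one per prediction.
    n_bins: number of equal-width confidence bins (assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to an equal-width bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, bool(is_correct)))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Illustrative values only: four answers stated at 90% confidence, three correct.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

A lower value means stated confidence tracks observed accuracy more closely. In the illustrative call, accuracy is 75% against a stated 90%, so the single occupied bin contributes a gap of about 0.15.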