model-calibration

#model-calibration

LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data

arXiv cs.AI · 4d ago

This paper explores Large Language Models' inability to recognize their knowledge limits on structured clinical data, proposing a cross-model attribution divergence method to detect epistemic blind spots. The approach improves calibration and accuracy without training by combining few-shot examples and SHAP-derived feature evidence.

#model-calibration

When the Next Step Is Not One Step: Distribution-Aware Execution Modeling for Concurrent Go Programs

arXiv cs.LG · 2026-06-17

This paper proposes a distribution-aware training approach for modeling next-event predictions in concurrent Go programs, treating scheduler nondeterminism as a signal. Fine-tuning a 7B model on fewer than a thousand traces achieves 36.2% accuracy on production bugs, outperforming Gemini 3.5 Flash zero-shot.

#model-calibration

When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning

arXiv cs.CL · 2026-04-20

This paper introduces Adaptive Tool Trust Calibration (ATTC), a framework that lets tool-integrated reasoning models decide adaptively when to trust or ignore tool results based on code confidence scores. The approach addresses the "Tool Ignored" problem, in which models incorrectly dismiss correct tool outputs, and achieves 4.1–7.5% performance improvements across multiple models and datasets.
