When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Hugging Face Daily Papers

Summary

This paper introduces Contextual Belief Management (CBM) for LLMs to handle long-term information, proposes the BeliefTrack benchmark for evaluation, and demonstrates that reinforcement learning and representation-level steering significantly reduce belief management failures.

Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve their state, and what to ignore. We study this challenge as Contextual Belief Management (CBM): maintaining a predicted belief state aligned with formal evidence while isolating task-irrelevant noise. To make CBM measurable, we introduce BeliefTrack, a closed-world benchmark spanning Rule Discovery and Circuit Diagnosis, where a finite belief space and symbolic verifiers enable exact turn-level evaluation. BeliefTrack diagnoses three failures: Failed Stay, Failed Update, and Failed Isolation. Across multiple LLMs, vanilla models exhibit severe CBM failures, while explicit belief-tracking prompts provide limited gains. In contrast, reinforcement learning with belief-state rewards reduces failure rates by 70.9% on average. Further probing reveals latent belief-state dynamics behind these failures, and representation-level steering reduces failure rates by 46.1% across two tasks. Code is coming soon at https://github.com/zjunlp/CBM.

Cached at: 05/29/26, 02:59 AM

Source: https://huggingface.co/papers/2605.30219

Abstract

Language models struggle with contextual belief management over long interactions, which involves updating, preserving, and filtering accumulated information; reinforcement learning and representation-level steering can reduce these failures.

Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve their state, and what to ignore. We study this challenge as Contextual Belief Management (CBM): maintaining a predicted belief state aligned with formal evidence while isolating task-irrelevant noise. To make CBM measurable, we introduce BeliefTrack, a closed-world benchmark spanning Rule Discovery and Circuit Diagnosis, where a finite belief space and symbolic verifiers enable exact turn-level evaluation. BeliefTrack diagnoses three failures: Failed Stay, Failed Update, and Failed Isolation. Across multiple LLMs, vanilla models exhibit severe CBM failures, while explicit belief-tracking prompts provide limited gains. In contrast, reinforcement learning with belief-state rewards reduces failure rates by 70.9% on average. Further probing reveals latent belief-state dynamics behind these failures, and representation-level steering reduces failure rates by 46.1% across two tasks. Code is coming soon at https://github.com/zjunlp/CBM.
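
The abstract names three turn-level failures but does not define them here, so the following is only a minimal sketch of one plausible reading, not the BeliefTrack implementation. It assumes that Failed Isolation means the model's belief changed on a noise-only turn, Failed Stay means the belief changed although the evidence left the verified belief unchanged, and Failed Update means the belief did not reach the verified belief when the evidence required a change. The names `Turn`, `Failure`, `classify_turn`, and the `is_noise` flag are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Failure(Enum):
    FAILED_STAY = "Failed Stay"
    FAILED_UPDATE = "Failed Update"
    FAILED_ISOLATION = "Failed Isolation"


@dataclass(frozen=True)
class Turn:
    """One interaction turn (hypothetical schema, not from the paper)."""
    is_noise: bool     # assumed flag: the turn carries only task-irrelevant content
    gold_before: str   # belief a symbolic verifier supports before the turn
    gold_after: str    # belief a symbolic verifier supports after the turn
    pred_before: str   # model's stated belief before the turn
    pred_after: str    # model's stated belief after the turn


def classify_turn(turn: Turn) -> Optional[Failure]:
    """Return the failure type for a turn under the assumed definitions, or None."""
    model_changed = turn.pred_after != turn.pred_before

    if turn.is_noise:
        # Noise should leave the belief untouched.
        return Failure.FAILED_ISOLATION if model_changed else None

    if turn.gold_after == turn.gold_before:
        # Relevant evidence that does not change the verified belief.
        return Failure.FAILED_STAY if model_changed else None

    # Relevant evidence that changes the verified belief.
    if turn.pred_after != turn.gold_after:
        return Failure.FAILED_UPDATE
    return None


if __name__ == "__main__":
    example = Turn(is_noise=True, gold_before="rule_A", gold_after="rule_A",
                   pred_before="rule_A", pred_after="rule_B")
    print(classify_turn(example))
```

Because the belief space is finite and the verified belief is computed symbolically, each turn can be compared by exact equality, which is what makes this kind of per-turn classification possible without a learned judge.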

Similar Articles

Can LLMs Take Retrieved Information with a Grain of Salt?

arXiv cs.CL

This paper investigates how large language models adapt to the certainty of retrieved information, identifying systematic limitations in handling uncertainty. It proposes an interaction strategy that reduces obedience errors by 25% without modifying model weights.