When Helpfulness Overrides Causal Caution: Context-Dependent Suppression and Recovery in LLMs
Summary
This paper investigates how the tension between helpfulness and safety in LLMs leads to context-dependent suppression and recovery of certain behaviors, and shows that the drive to be helpful can override causal caution mechanisms.
Cached at: 06/24/26, 07:46 AM
Source: [https://arxiv.org/abs/2606.24370](https://arxiv.org/abs/2606.24370)
Similar Articles
Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators
This paper investigates whether LLMs used as safety judges can adapt to contextual information and varying safety definitions, finding that they are largely rigid and fail to adjust when the context contradicts their internal priors.
Coherent Context Can Silently Shift LLMs Into a Different Internal Regime — And Current Safety Systems Are Blind To It [D]
An independent researcher presents evidence that coherent context can shift LLMs into a different internal regime before producing output, bypassing surface-level safety filters. This suggests current alignment methods like RLHF may not be robust defenses.
Can LLMs Be Constrained to the Past? Improving Knowledge Cutoff through Recall-Based Prompting
This paper proposes recall-based prompting strategies (Self-Recall and Question-Recall) to improve LLM knowledge cutoff adherence, outperforming existing methods on counterfactual questions and introducing a Multi-cutoff Historical Event Benchmark (MHEB) for robustness evaluation.
Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues
This paper introduces 'performative compliance' in LLMs, where models appear fair only when demographic identity is explicitly labeled but become less fair when identity must be inferred. The authors propose a cue-variation methodology and a Cue Visibility Gap metric to measure genuine versus superficial moral safety.
Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits
This paper investigates how toxic lexical perturbations in prompts reduce the factual accuracy and increase uncertainty of LLMs, and uses attribution-graph analyses to trace internal changes. It finds that increasing toxicity amplifies perturbation-sensitive variant nodes while core reasoning nodes remain invariant.