LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

Hugging Face Daily Papers 05/14/26, 12:00 AM Papers

safety guardrails ai-agents policy-induction memory feedback lifelong-learning

Summary

LiSA (Lifelong Safety Adaptation) is a framework that enhances AI agent safety guardrails by converting occasional failures into reusable policy abstractions and using evidence-aware confidence gating to perform well under sparse and noisy feedback, addressing the critical need for adaptive safety in real-world deployments.

As AI agents move from chat interfaces to systems that read private data, call tools, and execute multi-step workflows, guardrails become a last line of defense against concrete deployment harms. In these settings, guardrail failures are no longer merely answer-quality errors: they can leak secrets, authorize unsafe actions, or block legitimate work. The hardest failures are often contextual: whether an action is acceptable depends on local privacy norms, organizational policies, and user expectations that resist pre-deployment specification. This creates a practical gap: guardrails must adapt to their own operating environments, yet deployment feedback is typically limited to sparse, noisy user-reported failures, and repeated fine-tuning is often impractical. To address this gap, we propose LiSA (Lifelong Safety Adaptation), a conservative policy induction framework that improves a fixed base guardrail through structured memory. LiSA converts occasional failures into reusable policy abstractions so that sparse reports can generalize beyond individual cases, adds conflict-aware local rules to prevent overgeneralization in mixed-label contexts, and applies evidence-aware confidence gating via a posterior lower bound, so that memory reuse scales with accumulated evidence rather than empirical accuracy alone. Across PrivacyLens+, ConFaide+, and AgentHarm, LiSA consistently outperforms strong memory-based baselines under sparse feedback, remains robust under noisy user feedback even at 20% label-flip rates, and pushes the latency-performance frontier beyond backbone model scaling. Ultimately, LiSA offers a practical path to secure AI agents against the unpredictable long tail of real-world edge risks.
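To make the idea of gating on a posterior lower bound rather than on empirical accuracy concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation: the uniform Beta(1, 1) prior, the 5% quantile, the 0.7 reuse threshold, and the names `beta_cdf`, `posterior_lower_bound`, and `should_reuse` are all assumptions made for this example.

```python
from math import comb


def beta_cdf(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) for positive integers a and b.

    Uses the binomial-tail identity
    I_x(a, b) = sum_{j=a}^{n} C(n, j) x^j (1 - x)^(n - j), with n = a + b - 1.
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def posterior_lower_bound(successes: int, failures: int, delta: float = 0.05) -> float:
    """delta-quantile of the Beta(successes + 1, failures + 1) posterior.

    Assumes a uniform prior over the rule's true accuracy (an assumption
    of this sketch). The quantile is found by bisection on the CDF.
    """
    a, b = successes + 1, failures + 1
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < delta:
            lo = mid
        else:
            hi = mid
    return lo


def should_reuse(successes: int, failures: int,
                 threshold: float = 0.7, delta: float = 0.05) -> bool:
    """Reuse a memory entry only if its lower bound clears the threshold."""
    return posterior_lower_bound(successes, failures, delta) >= threshold


# Both entries have 100% empirical accuracy, but different amounts of evidence.
weak = posterior_lower_bound(2, 0)     # two confirmations
strong = posterior_lower_bound(20, 0)  # twenty confirmations
print(round(weak, 3), should_reuse(2, 0))
print(round(strong, 3), should_reuse(20, 0))
```

Under these assumptions, a rule confirmed twice gets a lower bound of about 0.368 and is not reused, while a rule confirmed twenty times gets about 0.867 and is, even though both have perfect empirical accuracy. That is the sense in which reuse scales with accumulated evidence.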

Original Article


Cached at: 05/15/26, 12:25 PM


Source: https://huggingface.co/papers/2605.14454

Abstract

LiSA enables adaptive safety guardrails for AI agents by converting occasional failures into reusable policy abstractions and using evidence-aware confidence gating to improve performance under sparse and noisy feedback conditions.

As AI agents move from chat interfaces to systems that read private data, call tools, and execute multi-step workflows, guardrails become a last line of defense against concrete deployment harms. In these settings, guardrail failures are no longer merely answer-quality errors: they can leak secrets, authorize unsafe actions, or block legitimate work. The hardest failures are often contextual: whether an action is acceptable depends on local privacy norms, organizational policies, and user expectations that resist pre-deployment specification. This creates a practical gap: guardrails must adapt to their own operating environments, yet deployment feedback is typically limited to sparse, noisy user-reported failures, and repeated fine-tuning is often impractical. To address this gap, we propose LiSA (Lifelong Safety Adaptation), a conservative policy induction framework that improves a fixed base guardrail through structured memory. LiSA converts occasional failures into reusable policy abstractions so that sparse reports can generalize beyond individual cases, adds conflict-aware local rules to prevent overgeneralization in mixed-label contexts, and applies evidence-aware confidence gating via a posterior lower bound, so that memory reuse scales with accumulated evidence rather than empirical accuracy alone. Across PrivacyLens+, ConFaide+, and AgentHarm, LiSA consistently outperforms strong memory-based baselines under sparse feedback, remains robust under noisy user feedback even at 20% label-flip rates, and pushes the latency-performance frontier beyond backbone model scaling. Ultimately, LiSA offers a practical path to secure AI agents against the unpredictable long tail of real-world edge risks.
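The interplay between reusable policy abstractions and conflict-aware local rules can be sketched as a small two-level memory placed in front of a fixed base guardrail. This is a hypothetical illustration, not LiSA's actual data structure: the `Memory` class, the `(data type, recipient class)` abstraction key, the "allow"/"block" labels, and the majority-vote tie handling are all assumptions made for this example, and confidence gating is omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def _counts() -> dict:
    """Fresh label counter for one memory entry."""
    return {"allow": 0, "block": 0}


@dataclass
class Memory:
    # Abstract level: coarse context key -> label counts.
    abstract: dict = field(default_factory=lambda: defaultdict(_counts))
    # Local level: (coarse key, concrete detail) -> label counts.
    local: dict = field(default_factory=lambda: defaultdict(_counts))

    def report(self, key, detail, correct_label: str) -> None:
        """Record a user-reported failure together with the corrected label."""
        self.abstract[key][correct_label] += 1
        self.local[(key, detail)][correct_label] += 1

    def decide(self, key, detail, base_decision: str) -> str:
        """Override the base guardrail only where memory supports it."""
        counts = self.abstract.get(key)
        if counts is None:
            return base_decision  # nothing learned for this context
        if counts["allow"] and counts["block"]:
            # Mixed-label context: the abstraction is unsafe to generalize,
            # so consult the local rule for this exact detail instead.
            local = self.local.get((key, detail))
            if local is None or local["allow"] == local["block"]:
                return base_decision
            return "allow" if local["allow"] > local["block"] else "block"
        # Consistent context: the abstraction generalizes to unseen details.
        return "allow" if counts["allow"] else "block"


memory = Memory()
key = ("health_record", "third_party")

# One report generalizes to an unseen recipient in the same abstract context.
memory.report(key, "insurer", "block")
print(memory.decide(key, "employer", base_decision="allow"))

# A conflicting report turns the context mixed-label; local rules take over.
memory.report(key, "treating_physician", "allow")
print(memory.decide(key, "treating_physician", base_decision="block"))
print(memory.decide(key, "employer", base_decision="allow"))
```

The design choice illustrated here is conservatism: once a context carries conflicting labels, the memory stops generalizing and defers to either an exact local match or the unchanged base guardrail.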


Get this paper in your agent:

hf papers read 2605.14454

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints

Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

Learning Agentic Policy from Action Guidance

On Safety Risks in Experience-Driven Self-Evolving Agents
