CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
Summary
This paper introduces CASCADE, a framework for deployment-time learning that allows Large Language Models to adapt continuously through episodic memory and contextual bandit optimization without modifying model parameters.
Source: https://huggingface.co/papers/2605.06702
Abstract
Deployment-time learning enables large language model agents to adapt continuously during operation through episodic memory and contextual bandit optimization, improving performance across diverse tasks.
Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.
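The abstract frames experience reuse as a contextual bandit over stored cases but gives no implementation details on this page. As a rough illustration only, the sketch below pairs a minimal episodic case store with standard LinUCB-style arm selection (predicted reward plus an exploration bonus); the class names, embedding dimension, and placeholder reward signal are assumptions made for the example, not CASCADE's actual interface.

```python
import numpy as np

class EpisodicMemory:
    """Stores past cases as (embedding, text) pairs; an illustrative stand-in for an episodic memory."""
    def __init__(self):
        self.cases = []

    def add(self, embedding, text):
        self.cases.append({"embedding": np.asarray(embedding, dtype=float), "text": text})


class LinUCBCaseSelector:
    """Treats each stored case as a bandit arm and selects one per query with LinUCB
    (Li et al., 2010). CASCADE's actual policy and features may differ; this is a sketch."""
    def __init__(self, dim, alpha=1.0):
        self.dim = dim      # dimensionality of the query embedding (assumed)
        self.alpha = alpha  # exploration strength
        self.A = {}         # per-arm d x d design matrices
        self.b = {}         # per-arm reward-weighted feature sums

    def _init_arm(self, arm):
        self.A[arm] = np.eye(self.dim)
        self.b[arm] = np.zeros(self.dim)

    def select(self, query_emb, n_arms):
        query_emb = np.asarray(query_emb, dtype=float)
        scores = []
        for arm in range(n_arms):
            if arm not in self.A:
                self._init_arm(arm)
            A_inv = np.linalg.inv(self.A[arm])
            theta = A_inv @ self.b[arm]                                   # ridge-regression reward estimate
            bonus = self.alpha * np.sqrt(query_emb @ A_inv @ query_emb)  # UCB exploration bonus
            scores.append(theta @ query_emb + bonus)
        return int(np.argmax(scores))

    def update(self, arm, query_emb, reward):
        query_emb = np.asarray(query_emb, dtype=float)
        self.A[arm] += np.outer(query_emb, query_emb)
        self.b[arm] += reward * query_emb


# Toy deployment loop: pick a stored case, prepend it to the prompt, observe success/failure.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    memory = EpisodicMemory()
    for i in range(5):
        memory.add(rng.normal(size=8), f"worked example #{i}")

    selector = LinUCBCaseSelector(dim=8, alpha=0.5)
    for step in range(20):
        query_emb = rng.normal(size=8)
        arm = selector.select(query_emb, n_arms=len(memory.cases))
        prompt = memory.cases[arm]["text"] + "\n---\n<new task here>"
        reward = float(rng.random() < 0.5)   # placeholder for a real task-success signal
        selector.update(arm, query_emb, reward)
```

In a real deployment loop the reward would come from task feedback (e.g., whether the agent's answer was accepted), which is what allows the selector's regret to shrink over long-term interactions.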
Get this paper in your agent:
hf papers read 2605.06702
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Datasets citing this paper: 1
guosy/DTLBench • Updated about 1 hour ago • 32.7k • 103
Similar Articles
Attribution-Guided Continual Learning for Large Language Models
This paper proposes an attribution-guided continual fine-tuning framework for large language models that estimates task-specific parameter importance in Transformer layers and modulates gradients accordingly, mitigating catastrophic forgetting while maintaining performance on new tasks.
ReAD: Reinforcement-Guided Capability Distillation for Large Language Models
This paper introduces ReAD, a reinforcement-guided capability distillation framework that optimizes token budgets by accounting for cross-capability transfer in large language models. It demonstrates improved downstream utility and reduced harmful spillover compared to existing baselines.
Self-Consolidating Language Models: Continual Knowledge Incorporation from Context
The paper introduces Self-Consolidating Language Models (SCoL), a framework that uses meta-reinforcement learning to write current context into model weights for continual knowledge incorporation. It demonstrates improved acquisition and retention over baselines in both QA and long-context consolidation tasks.
Learning, Fast and Slow: Towards LLMs That Adapt Continually
This paper introduces a Fast-Slow Training framework for LLMs that combines parameter updates with optimized context to improve sample efficiency and reduce catastrophic forgetting during continual learning.
JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models
JumpLoRA introduces a novel sparse adapter framework for continual learning in LLMs using JumpReLU gating to dynamically isolate task parameters and prevent catastrophic forgetting. The method enhances LoRA-based approaches and outperforms state-of-the-art continual learning methods like ELLA.