CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

Hugging Face Daily Papers

Summary

This paper introduces CASCADE, a framework for deployment-time learning that allows Large Language Models to adapt continuously through episodic memory and contextual bandit optimization without modifying model parameters.

Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.
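The abstract describes the mechanism only at a high level: an explicit episodic case memory, with case selection cast as a contextual bandit carrying no-regret guarantees. As a rough illustration of that formulation, not the paper's actual implementation, the sketch below pairs a hypothetical `CaseBank` with a standard LinUCB selector; the class names, the feature construction, and the `run_llm_with_case` callback are all assumptions introduced here.

```python
import numpy as np


class CaseBank:
    """Hypothetical episodic memory: stores (feature vector, case text) pairs."""

    def __init__(self):
        self.cases = []

    def add(self, features, case_text):
        self.cases.append((np.asarray(features, dtype=float), case_text))

    def candidates(self):
        return self.cases


class LinUCBSelector:
    """LinUCB-style contextual bandit, used here only to illustrate
    'experience reuse as a contextual bandit'; the paper's algorithm may differ."""

    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)      # ridge-regression Gram matrix
        self.b = np.zeros(dim)    # reward-weighted feature sum
        self.alpha = alpha        # exploration strength

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # mean reward estimate plus an upper-confidence bonus
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def select(self, feature_list):
        return int(np.argmax([self.score(x) for x in feature_list]))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x


def deployment_step(query_features, bank, selector, run_llm_with_case):
    """One deployment-time interaction: pick a stored case with the bandit,
    condition the LLM on it, then feed the observed success back.
    The selector's dimension must equal len(query_features) plus the stored
    case-feature length (a sketch-level simplification)."""
    candidates = bank.candidates()
    if not candidates:
        return run_llm_with_case(None)            # zero-shot fallback
    feats = [np.concatenate([query_features, f]) for f, _ in candidates]
    idx = selector.select(feats)
    answer, success = run_llm_with_case(candidates[idx][1])
    selector.update(feats[idx], reward=float(success))
    if success:
        bank.add(query_features, answer)          # grow/refine the episodic memory
    return answer, success
```

The upper-confidence term in `score` is what supplies the principled exploration-exploitation trade-off the abstract refers to; any contextual bandit with comparable regret guarantees could fill the same role.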

Paper page - CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

Source: https://huggingface.co/papers/2605.06702

Abstract

Deployment-time learning enables large language model agents to adapt continuously during operation through episodic memory and contextual bandit optimization, improving performance across diverse tasks.



Get this paper in your agent:

hf papers read 2605.06702

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Datasets citing this paper: 1

guosy/DTLBench • Viewer • Updated about 1 hour ago • 32.7k • 103


Similar Articles

Attribution-Guided Continual Learning for Large Language Models

arXiv cs.LG

This paper proposes an attribution-guided continual fine-tuning framework for large language models that estimates task-specific parameter importance in Transformer layers and modulates gradients accordingly, mitigating catastrophic forgetting while maintaining performance on new tasks.
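The summary names the mechanism only in general terms: estimate per-parameter importance, then modulate gradients accordingly. A minimal sketch of that generic idea, with a hypothetical `importance` dictionary and no claim about how the cited paper actually computes its attributions, could look like this:

```python
import torch


def modulate_gradients(model, importance, strength=1.0):
    """Scale each parameter's gradient down in proportion to its estimated
    importance for earlier tasks. `importance` maps parameter names to
    per-element scores in [0, 1]; how those scores are attributed is the
    part this sketch leaves open."""
    for name, param in model.named_parameters():
        if param.grad is None or name not in importance:
            continue
        imp = importance[name].clamp(0.0, 1.0)
        param.grad.mul_(1.0 - strength * imp)   # important weights move less


# Typical placement in a fine-tuning step:
#   loss.backward()
#   modulate_gradients(model, importance)
#   optimizer.step()
```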

ReAD: Reinforcement-Guided Capability Distillation for Large Language Models

arXiv cs.CL

This paper introduces ReAD, a reinforcement-guided capability distillation framework that optimizes token budgets by accounting for cross-capability transfer in large language models. It demonstrates improved downstream utility and reduced harmful spillover compared to existing baselines.

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

arXiv cs.CL

JumpLoRA introduces a novel sparse adapter framework for continual learning in LLMs using JumpReLU gating to dynamically isolate task parameters and prevent catastrophic forgetting. The method enhances LoRA-based approaches and outperforms state-of-the-art continual learning methods like ELLA.
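JumpReLU itself is a thresholded ReLU: the activation passes through only where it exceeds a threshold, which is what makes the gate sparse. A minimal sketch of such a gate sitting on a LoRA-style low-rank branch is shown below; the module names and wiring are illustrative assumptions rather than the JumpLoRA architecture itself, and a real implementation would also need a straight-through or similar estimator to train the threshold through the hard step.

```python
import torch
import torch.nn as nn


class JumpReLUGate(nn.Module):
    """JumpReLU gate: pass z through only where it exceeds a learnable threshold.
    The hard step blocks gradients to theta; the estimator that would make theta
    trainable is omitted in this sketch."""

    def __init__(self, dim, init_threshold=0.1):
        super().__init__()
        self.theta = nn.Parameter(torch.full((dim,), init_threshold))

    def forward(self, z):
        return z * (z > self.theta).to(z.dtype)


class GatedLoRABranch(nn.Module):
    """Illustrative LoRA-style branch whose low-rank update is sparsified by the gate."""

    def __init__(self, in_dim, out_dim, rank=8):
        super().__init__()
        self.down = nn.Linear(in_dim, rank, bias=False)
        self.up = nn.Linear(rank, out_dim, bias=False)
        self.gate = JumpReLUGate(rank)

    def forward(self, x, base_output):
        # base_output is the frozen base layer's output; only the gated
        # low-rank delta is task-specific.
        return base_output + self.up(self.gate(self.down(x)))
```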