Tag
The paper introduces MemQ, a method that integrates Q-learning into self-evolving memory agents, using eligibility traces over provenance DAGs to address the credit-assignment problem in episodic memory retrieval.
This research demonstrates that continuously updating LLM agent memories through distillation and consolidation loops causes performance regression, even when the updates are derived from ground-truth solutions. The study finds that episodic-only retention outperforms text-based consolidation, exposing significant flaws in current self-improvement paradigms.
This paper introduces CASCADE, a deployment-time learning framework that lets large language models adapt continuously through episodic memory and contextual bandit optimization without modifying model parameters.