Tag
This paper identifies the retention-forgetting dilemma in verbal reinforcement learning for LLM agents operating in non-stationary environments, and proposes a three-layer architecture with a feedback-driven curation loop to govern insight extraction and application.