Tag
CLaaS is a system for continual learning of LLM agents in deployment, using experience replay for sample-efficient online adaptation.
This paper investigates whether shallow neural network agents can master the card game Schnapsen through reinforcement learning. The RL agents outperform a supervised imitation baseline and achieve competitive results against a strong search-based opponent.
FreshPER introduces a freshness-aware prioritized experience replay (PER) method for LLM/VLM reinforcement learning. It addresses the 'priority staleness' problem, in which stored priorities become outdated as the policy changes, by applying exponential age decay to those priorities, which enables off-policy reuse of trajectories. Evaluated on eight agentic, reasoning, and math tasks, FreshPER significantly outperforms on-policy baselines, with gains of up to +367% on Sokoban.
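The age-decay idea can be illustrated with a minimal sketch. It is not FreshPER's actual implementation, and it rests on several assumptions: age is counted in hypothetical training steps, the effective priority is taken to be `priority * exp(-decay_rate * age)`, and sampling follows the standard PER rule P(i) ∝ p_i^α. The class name `FreshnessPrioritizedReplay` and the `decay_rate` and `alpha` values are illustrative.

```python
import math
import random
from typing import Any, List, Tuple


class FreshnessPrioritizedReplay:
    """Hypothetical sketch of prioritized replay with exponential age decay.

    Assumptions (not taken from FreshPER): age is measured in training steps,
    and the effective priority is priority * exp(-decay_rate * age).
    """

    def __init__(self, decay_rate: float = 0.01, alpha: float = 1.0, seed: int = 0) -> None:
        self.decay_rate = decay_rate
        self.alpha = alpha  # PER exponent: 0 = uniform sampling, 1 = fully prioritized
        self.items: List[Tuple[Any, float, int]] = []  # (trajectory, priority, step added)
        self.step = 0
        self.rng = random.Random(seed)

    def add(self, trajectory: Any, priority: float) -> None:
        self.items.append((trajectory, priority, self.step))

    def advance(self, n: int = 1) -> None:
        """Move the training clock forward; older entries decay further."""
        self.step += n

    def effective_priority(self, index: int) -> float:
        _, priority, added = self.items[index]
        age = self.step - added
        return priority * math.exp(-self.decay_rate * age)

    def probabilities(self) -> List[float]:
        weights = [self.effective_priority(i) ** self.alpha for i in range(len(self.items))]
        total = sum(weights)
        if total == 0:
            return [1.0 / len(weights)] * len(weights)  # fall back to uniform sampling
        return [w / total for w in weights]

    def sample(self, k: int) -> List[Any]:
        """Draw k trajectories (with replacement) according to decayed priorities."""
        probs = self.probabilities()
        picks = self.rng.choices(range(len(self.items)), weights=probs, k=k)
        return [self.items[i][0] for i in picks]


buffer = FreshnessPrioritizedReplay(decay_rate=0.01)
buffer.add("old trajectory", priority=2.0)
buffer.advance(100)  # 100 training steps pass
buffer.add("new trajectory", priority=1.0)
# The newer, lower-priority trajectory is now the more likely one to be sampled.
print(buffer.probabilities())
```

With `decay_rate=0.01`, the priority-2.0 trajectory that is 100 steps old has an effective priority of 2e^(-1) ≈ 0.74, so the fresh priority-1.0 trajectory is sampled more often. Setting `decay_rate` to 0 recovers ordinary PER.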
OpenAI presents Hindsight Experience Replay (HER), a technique enabling sample-efficient reinforcement learning from sparse binary rewards without complex reward engineering. It is demonstrated on robotic arm manipulation tasks including pushing, sliding, and pick-and-place, and validated on physical robots.
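The core relabeling idea behind HER can be shown with a minimal sketch of its simplest "final" strategy: each transition is stored a second time with the goal replaced by the goal actually achieved at the end of the episode, and the sparse reward is recomputed, so even a failed episode yields some successful examples. The `Transition` layout, the 0/-1 reward convention, and the 0.05 success tolerance are assumptions for illustration; HER also describes other goal-sampling strategies that are not shown here.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    state: Vec
    action: Vec
    achieved_goal: Vec  # goal actually reached after taking the action
    goal: Vec           # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Binary reward (assumed convention): 0 if within tol of the goal, else -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_final_relabel(episode: List[Transition]) -> List[Transition]:
    """Return the original transitions plus copies relabeled with the final achieved goal."""
    final_goal = episode[-1].achieved_goal
    relabeled = [
        replace(t, goal=final_goal, reward=sparse_reward(t.achieved_goal, final_goal))
        for t in episode
    ]
    return list(episode) + relabeled


# A failed push: the target was (1.0, 1.0) but the object stopped at (0.3, 0.0).
target = (1.0, 1.0)
episode = [
    Transition((0.0, 0.0), (0.1,), (0.1, 0.0), target, sparse_reward((0.1, 0.0), target)),
    Transition((0.1, 0.0), (0.1,), (0.3, 0.0), target, sparse_reward((0.3, 0.0), target)),
]
for t in her_final_relabel(episode):
    print(t.goal, t.reward)
```

Both original transitions earn -1 because the object never reaches (1.0, 1.0), but the relabeled copy of the last transition earns 0, giving the learner a success signal from an otherwise failed episode.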