Tag
This paper proposes a performance-driven state abstraction method for reinforcement learning that directly optimizes decision quality, using a multi-timescale framework to jointly adapt the policy and a tree-structured abstraction. The algorithm refines or aggregates regions of the state space based on Q-value discrepancies, achieving better sample efficiency and faster replanning than the baselines.
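As a rough illustration of what refining or aggregating on Q-value discrepancies could look like, the sketch below scores an abstract cell by how much the Q-values of its member states disagree. It is not the paper's algorithm: the discrepancy measure, the function names (`q_discrepancy`, `should_refine`, `should_aggregate`), and the thresholds are all assumptions made for this example.

```python
import numpy as np


def q_discrepancy(q_values):
    """Largest per-action spread of Q-values among the states in one cell.

    q_values: array-like of shape (num_states_in_cell, num_actions).
    This spread-based measure is an assumption, not the paper's definition.
    """
    q = np.asarray(q_values, dtype=float)
    # For each action, take max minus min over the member states,
    # then report the worst action.
    return float(np.max(q.max(axis=0) - q.min(axis=0)))


def should_refine(q_values, split_threshold):
    """Split a cell when its member states disagree too much on Q-values."""
    return q_discrepancy(q_values) > split_threshold


def should_aggregate(q_left, q_right, merge_threshold):
    """Merge two sibling cells when the pooled cell would still be consistent."""
    pooled = np.vstack([np.asarray(q_left, dtype=float),
                        np.asarray(q_right, dtype=float)])
    return q_discrepancy(pooled) < merge_threshold


# Hypothetical Q-values for two cells, each with two states and two actions.
consistent_cell = [[1.0, 0.0], [1.1, 0.1]]
mixed_cell = [[1.0, 0.0], [3.0, 0.2]]

refine_consistent = should_refine(consistent_cell, split_threshold=0.5)
refine_mixed = should_refine(mixed_cell, split_threshold=0.5)
merge_siblings = should_aggregate(consistent_cell, consistent_cell,
                                  merge_threshold=0.5)
```

In a tree-structured abstraction, a check like `should_refine` would be applied to leaves and one like `should_aggregate` to sibling leaves; how often these checks run relative to policy updates is the kind of choice a multi-timescale scheme would govern.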
This paper proposes ARS, a memory-augmented agentic recommender system that treats recommendation as a partially observable problem with a hierarchical belief-state memory structure. It achieves state-of-the-art performance on four benchmarks with significant improvements over baselines.