sample-efficiency

#sample-efficiency

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

Hugging Face Daily Papers · 3d ago

StraTA proposes strategic trajectory abstraction for long-horizon LLM agents. It combines hierarchical GRPO-style rollouts with diverse strategy sampling and critical self-judgment, improving sample efficiency and final performance over frontier models and prior RL baselines.



Freshness-Aware Prioritized Experience Replay for LLM/VLM Reinforcement Learning

arXiv cs.CL · 2026-04-21

FreshPER introduces a freshness-aware prioritized experience replay method for LLM/VLM reinforcement learning. It addresses the "priority staleness" problem by applying exponential age decay to stored priorities, which enables off-policy reuse of trajectories. Evaluated on eight agentic, reasoning, and math tasks, FreshPER significantly outperforms on-policy baselines, with gains of up to 367% on Sokoban.
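As a rough illustration of exponential age decay on stored priorities, here is a minimal Python sketch. The summary does not give FreshPER's exact formula, so the form `priority * exp(-decay_rate * age)`, the `decay_rate` value, and the function names `fresh_priority` and `sample_index` are all assumptions made for this example, not the paper's implementation.

```python
import math
import random


def fresh_priority(priority, age, decay_rate=0.1):
    """Down-weight a stored priority by exp(-decay_rate * age).

    `age` is the number of policy updates since the priority was computed.
    The decay form and the default rate are assumptions for illustration.
    """
    return priority * math.exp(-decay_rate * age)


def sample_index(priorities, ages, decay_rate=0.1, rng=random):
    """Sample one buffer index with probability proportional to decayed priority."""
    weights = [fresh_priority(p, a, decay_rate) for p, a in zip(priorities, ages)]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

Under this sketch, two trajectories with the same stored priority are no longer equally likely to be replayed: the older one is sampled less often, which is the intuition behind correcting stale priorities.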



Procgen and MineRL Competitions

OpenAI Blog · 2020-06-20

OpenAI co-organizes the MineRL 2020 Competition to advance sample-efficient reinforcement learning algorithms that leverage human demonstrations. Participants compete to obtain a diamond in Minecraft using only 8 million simulator samples and 4 days of single-GPU training, with access to a 60+ million frame human demonstration dataset.



Learning from human preferences

OpenAI Blog · 2017-06-13

OpenAI presents a method for training AI agents from human preference feedback: the agent learns a reward function from human comparisons of behavior trajectories, then uses reinforcement learning to optimize for the inferred goal. The approach demonstrates strong sample efficiency, requiring fewer than 1,000 bits of human feedback to train an agent to perform a backflip.
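To make "learning a reward function from comparisons" concrete, the sketch below scores a pair of trajectory segments with a Bradley-Terry style model, one common way to turn pairwise preferences into a training loss. The per-step reward lists stand in for the outputs of a learned reward model, and the function names are hypothetical; this is an illustration under those assumptions rather than the post's code.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Uses a logistic function of the difference in summed predicted rewards,
    written in a numerically stable form.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, human_prefers_a):
    """Cross-entropy loss for a single human comparison label."""
    p_a = preference_probability(rewards_a, rewards_b)
    p = p_a if human_prefers_a else 1.0 - p_a
    return -math.log(p)
```

Minimizing this loss over many labeled pairs pushes the predicted rewards to agree with the human's choices; the policy is then trained against those predicted rewards.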
