sparse-rewards


Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Hugging Face Daily Papers · 2026-06-15

This paper proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC) for fine-tuning Vision-Language-Action (VLA) policies using online reinforcement learning with sparse binary episode outcomes. HABC separates viability and efficiency objectives via adaptive critic heads and intervention-aware credit assignment, significantly improving success rates on contact-rich bimanual manipulation tasks.


When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

arXiv cs.LG · 2026-05-29

This paper frames LLM-generated reward shaping for sparse structured RL as a debugging problem and identifies failure modes such as reward flooding and semantic misunderstanding. The authors propose diagnostic-driven iterative refinement, which achieves large success-rate improvements over one-shot generation (e.g., DoorKey-8×8 from 2.3% to 97.6%).


Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

arXiv cs.LG · 2026-05-14

The paper introduces Reflection-Enhanced Self-Distillation (Resd), a framework that transforms failure feedback into corrective supervision for LLMs, enabling efficient learning from rare successes. It outperforms standard self-distillation baselines and achieves faster early improvement than GRPO with fewer samples.


@blc_16: If you want to understand why RL struggles with long-horizon agent tasks, this is a good explanation. The core issue is…

X AI KOLs Timeline · 2026-05-10

The post explains why reinforcement learning struggles with long-horizon tasks because of sparse rewards, and highlights GEPA, a method that uses trajectory-level textual reflection to preserve richer feedback signals for optimization.


Ingredients for robotics research

OpenAI Blog · 2018-02-26

OpenAI presents Hindsight Experience Replay (HER), a reinforcement learning technique that enables robots to learn from failed attempts by retroactively treating achieved alternative outcomes as successful goals, allowing learning even with sparse reward signals.


Hindsight Experience Replay

OpenAI Blog · 2017-07-05

OpenAI presents Hindsight Experience Replay (HER), a technique enabling sample-efficient reinforcement learning from sparse binary rewards without complex reward engineering. It is demonstrated on robotic arm manipulation tasks including pushing, sliding, and pick-and-place, and validated on physical robots.
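The relabeling idea behind HER can be sketched in a few lines. This is a minimal illustration, not the OpenAI implementation: the `Transition` type, the `sparse_reward` helper, the 1/0 reward convention, and the choice to relabel with the episode's final outcome are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Goal = Tuple[int, ...]


@dataclass(frozen=True)
class Transition:
    """Hypothetical goal-conditioned transition (names are illustrative)."""
    state: Tuple[int, ...]
    action: int
    achieved: Goal  # outcome actually reached after taking the action
    goal: Goal      # goal the policy was pursuing
    reward: float


def sparse_reward(achieved: Goal, goal: Goal) -> float:
    # Assumed binary convention: 1.0 only when the goal is met exactly.
    return 1.0 if achieved == goal else 0.0


def relabel_with_final_outcome(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, pretending its final outcome was the goal all along."""
    if not episode:
        return []
    hindsight_goal = episode[-1].achieved
    return [
        replace(t, goal=hindsight_goal,
                reward=sparse_reward(t.achieved, hindsight_goal))
        for t in episode
    ]


# A failed episode: the agent aimed for (3,) but only reached (2,).
episode = [
    Transition(state=(0,), action=1, achieved=(1,), goal=(3,), reward=0.0),
    Transition(state=(1,), action=1, achieved=(2,), goal=(3,), reward=0.0),
]
relabeled = relabel_with_final_outcome(episode)

# Store both versions so an off-policy learner sees some non-zero reward.
replay_buffer = episode + relabeled
```

The original episode carries no reward signal at all, while the relabeled copy ends in a success for the substituted goal, which is what gives a goal-conditioned learner something to train on.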


Stochastic Neural Networks for hierarchical reinforcement learning

OpenAI Blog · 2017-04-10

OpenAI researchers propose a framework using stochastic neural networks for hierarchical reinforcement learning that pre-trains useful skills guided by a proxy reward, then leverages these skills for faster learning in downstream tasks with sparse rewards or long horizons.
