This paper introduces Path-Coupled Bellman Flows (PCBF), a continuous-time distributional reinforcement learning method that uses flow matching to model return distributions without heuristic projections. It addresses boundary mismatch and high-variance issues in previous flow-based approaches by coupling current and successor return flows through shared base noise.
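A minimal sketch of the shared-base-noise coupling idea, assuming a 1-D return distribution, a straight-line probability path, and an Euler ODE solver; `VelocityNet`, `sample_return`, and `coupled_fm_loss` are illustrative names, not the paper's actual code:

```python
# Sketch of flow matching for return distributions where the Bellman target
# is generated from the SAME base noise that anchors the flow-matching path.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """v_theta(x, t, s): velocity field conditioned on state features."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, t, s):
        return self.net(torch.cat([x, t, s], dim=-1))

def sample_return(v_net, s, z, steps: int = 20):
    """Integrate the flow ODE from base noise z to a return sample (Euler)."""
    x = z.clone()
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full_like(x, i * dt)
        x = x + dt * v_net(x, t, s)
    return x

def coupled_fm_loss(v_net, v_tgt, s, s_next, r, gamma: float = 0.99):
    """Flow-matching loss whose Bellman target shares the base noise z."""
    z = torch.randn_like(r)                       # one base sample per transition
    with torch.no_grad():
        g_next = sample_return(v_tgt, s_next, z)  # successor return from same z
        x1 = r + gamma * g_next                   # Bellman-backed-up target
    t = torch.rand_like(r)
    x_t = (1 - t) * z + t * x1                    # straight-line path sample
    v_star = x1 - z                               # conditional target velocity
    return ((v_net(x_t, t, s) - v_star) ** 2).mean()
```

Generating the successor return from the same base sample `z` that anchors the current flow-matching path is what couples the two flows; per the summary above, this coupling is the paper's remedy for boundary mismatch and high-variance targets.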
This paper introduces POISE, a method for stable policy optimization in large reasoning models that estimates policy-gradient baselines from the model's own internal states, reducing computational overhead relative to PPO and GRPO.
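A loose sketch of that idea, assuming a scalar reward per response and a linear value head over the policy model's last hidden state; `HiddenStateBaseline` and `internal_baseline_step` are hypothetical names, not POISE's published design:

```python
# Sketch: a baseline read off the model's own internal states instead of a
# separate critic (as in PPO) or group sampling (as in GRPO).
import torch
import torch.nn as nn

class HiddenStateBaseline(nn.Module):
    """Lightweight value head over the policy model's last hidden state."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:  # h: (B, hidden_dim)
        return self.head(h).squeeze(-1)

def internal_baseline_step(logprobs, hidden, rewards, baseline, pol_opt, base_opt):
    """One REINFORCE-style update using the internal-state baseline."""
    b = baseline(hidden.detach())            # baseline from internal states only
    adv = rewards - b.detach()               # variance-reduced advantage
    pol_loss = -(adv * logprobs).mean()      # policy-gradient surrogate
    base_loss = ((b - rewards) ** 2).mean()  # regress baseline onto returns
    pol_opt.zero_grad(); pol_loss.backward(); pol_opt.step()
    base_opt.zero_grad(); base_loss.backward(); base_opt.step()
```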
OpenAI researchers derive a bias-free action-dependent baseline for variance reduction in policy gradient methods, demonstrating improved learning efficiency on high-dimensional control tasks and in multi-agent and partially observed environments.
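A toy sketch of such a baseline for a factorized policy, where each per-dimension baseline b_i(s, a_{-i}) may see every action dimension except its own, which is what keeps the estimator unbiased; `factorized_pg_loss` and its inputs are illustrative assumptions, not the paper's exact parameterization:

```python
# Toy example of an action-dependent, per-dimension baseline for a
# factorized policy; q_hat and baselines stand in for learned estimators.
import torch

def factorized_pg_loss(logprob_per_dim: torch.Tensor,
                       q_hat: torch.Tensor,
                       baselines: torch.Tensor) -> torch.Tensor:
    """
    logprob_per_dim: (B, D) log pi_i(a_i | s) for each action dimension i
    q_hat:           (B,)   Q/return estimate for the sampled (s, a)
    baselines:       (B, D) b_i(s, a_{-i}); each column may depend on the
                     state and the OTHER action dimensions but not on a_i
                     itself, so E_{a_i}[grad log pi_i(a_i|s) * b_i] = 0
                     and subtracting it introduces no bias.
    """
    adv = q_hat.unsqueeze(-1) - baselines            # per-dimension advantage
    return -(logprob_per_dim * adv.detach()).mean()  # surrogate whose gradient
                                                     # is the policy gradient
```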