This paper introduces NM-PPG, a non-myopic active feature acquisition method that uses pathwise policy gradients to optimize sequential feature selection in settings where acquiring features is costly.
OpenAI demonstrates a method for training a reinforcement learning agent to play Montezuma's Revenge from a single human demonstration, addressing the challenge of sparse rewards through curriculum learning and careful hyperparameter tuning. The approach achieves strong performance on this notoriously difficult Atari game, though it generalizes poorly to other titles.
OpenAI introduces Evolved Policy Gradients (EPG), a meta-learning approach that learns loss functions through evolution rather than learning policies directly, enabling RL agents to generalize better across tasks by leveraging prior experience similar to how humans transfer skills.
OpenAI researchers demonstrate a precise mathematical equivalence between soft (entropy-regularized) Q-learning and policy gradient methods in reinforcement learning, providing theoretical insight into why Q-learning works despite inaccurate value estimates. They validate this equivalence empirically on the Atari benchmark, showing that a Q-learning method closely matches A3C's learning dynamics.