This paper introduces NM-PPG, a non-myopic active feature acquisition method that uses pathwise policy gradients to optimize sequential feature selection in settings where acquiring features is costly.
OpenAI demonstrates a method for training a reinforcement learning agent to play Montezuma's Revenge from a single human demonstration, addressing the challenge of sparse rewards through curriculum learning and careful hyperparameter tuning. The approach achieves strong performance on this notoriously difficult Atari game, though it generalizes poorly to other titles.
OpenAI introduces Evolved Policy Gradients (EPG), a meta-learning approach that learns loss functions through evolution rather than learning policies directly, enabling RL agents to generalize better across tasks by leveraging prior experience similar to how humans transfer skills.
OpenAI researchers demonstrate a precise mathematical equivalence between soft (entropy-regularized) Q-learning and policy gradient methods in reinforcement learning, providing theoretical insight into why Q-learning works despite inaccurate value estimates. They validate this equivalence empirically on the Atari benchmark, showing that a Q-learning method closely matches A3C's learning dynamics.