inverse-reinforcement-learning

Quantifying Potential Observation Missingness in Inverse Reinforcement Learning

arXiv cs.LG · 2026-05-14

This paper identifies how missing observations in inverse reinforcement learning (IRL) can make expert actions appear suboptimal. It develops a practical algorithm that quantifies the minimal perturbations needed for those actions to appear optimal, and validates it on synthetic tasks, a cancer treatment simulation, and ICU data.
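The underlying quantity (how far an expert action is from looking optimal) can be illustrated in a much simpler setting than the paper's. The following is a hypothetical proxy, not the authors' algorithm: it assumes a known tabular MDP and reward, runs Q-value iteration, and reports for each expert (state, action) pair the gap `max_a Q(s, a) - Q(s, a_E)`, i.e. the smallest increase to `Q(s, a_E)` that would make the expert action greedy. The names `q_value_iteration` and `optimality_gaps` and the two-state MDP are illustrative assumptions.

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.9, iters=500):
    """Tabular Q-value iteration.

    P: transition probabilities, shape (S, A, S).
    R: rewards, shape (S, A).
    """
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy state values
        Q = R + gamma * (P @ V)    # Bellman optimality backup, shape (S, A)
    return Q

def optimality_gaps(Q, demos):
    """For each expert (state, action) pair, how much Q(s, a_E) would have to
    rise for a_E to become a greedy action (0 if it already is)."""
    return np.array([Q[s].max() - Q[s, a] for s, a in demos])

# Hypothetical 2-state, 2-action MDP with deterministic transitions.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1
P[1, 1, 0] = 1.0  # state 1, action 1: move back to state 0
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])  # only staying in state 1 is rewarded

Q = q_value_iteration(P, R)
gaps = optimality_gaps(Q, demos=[(0, 1), (1, 1)])
print(gaps)  # approximately [0.0, 1.9]
```

In this toy example the expert's choice in state 1 looks suboptimal by about 1.9 under the assumed reward; an unobserved variable that the reward does not see is one possible explanation for such a gap.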

Trust Region Inverse Reinforcement Learning: Explicit Dual Ascent using Local Policy Updates

arXiv cs.LG · 2026-05-13

This paper introduces Trust Region Inverse Reinforcement Learning (TRIRL), a method that combines monotonic dual improvement with efficient local policy updates and reports outperforming state-of-the-art imitation learning methods. It uses trust-region constraints to address the trade-off between stability and computational cost in IRL.
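To make "dual ascent with local policy updates" concrete, here is a toy sketch, not TRIRL itself. It assumes a single-state problem (a bandit) with a linear reward `r = phi @ w` and a maximum-entropy IRL dual, whose gradient is the expert feature expectation minus the policy's. Instead of re-solving for the policy exactly after each reward update, the policy takes one KL-penalized step toward the soft-optimal policy, a penalty form of a trust region. The function name, `eta`, `lr`, and the example data are illustrative assumptions.

```python
import numpy as np

def log_softmax(x):
    z = x - x.max()
    return z - np.log(np.exp(z).sum())

def dual_ascent_local_updates(phi, mu_expert, eta=1.0, lr=0.5, iters=2000):
    """Toy max-ent IRL on a single-state problem.

    phi: action features, shape (A, d).
    mu_expert: expert feature expectation, shape (d,).
    eta: weight of the KL penalty to the previous policy (larger = smaller steps).
    """
    A, d = phi.shape
    w = np.zeros(d)                 # reward weights = dual variables
    logp = np.full(A, -np.log(A))   # log-policy, start uniform
    for _ in range(iters):
        r = phi @ w
        # Local step: argmax_pi E_pi[r] + H(pi) - eta * KL(pi || pi_old)
        # has the closed form log pi = (r + eta * log pi_old) / (1 + eta) + const.
        logp = log_softmax((r + eta * logp) / (1.0 + eta))
        pi = np.exp(logp)
        # Dual ascent on w: gradient = expert features - policy features.
        w += lr * (mu_expert - phi.T @ pi)
    return w, pi

phi = np.eye(3)                        # one-hot features: one reward per action
mu_expert = np.array([0.6, 0.3, 0.1])  # hypothetical expert action frequencies
w, pi = dual_ascent_local_updates(phi, mu_expert)
print(np.round(pi, 3))  # close to [0.6, 0.3, 0.1]
```

The KL penalty keeps each policy change small, so the reward and the policy evolve together rather than requiring a full inner policy optimization per dual step; larger `eta` means more conservative updates and slower tracking.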

Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization

arXiv cs.LG · 2026-05-12

This paper introduces Interactive Inverse Reinforcement Learning (IIRL), a framework where a learner actively interacts with an expert to infer reward functions, formulated as a stochastic bi-level optimization problem. The authors propose the BISIRL algorithm, providing convergence guarantees and experimental validation for this interactive learning paradigm.

Multi-Objective Constraint Inference using Inverse reinforcement learning

arXiv cs.AI · 2026-05-11

This paper introduces MOCI, a framework for inferring shared constraints and individual preferences from heterogeneous expert demonstrations in reinforcement learning. It outperforms existing baselines in predictive performance and computational efficiency.

SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models

arXiv cs.CL · 2026-04-21

Researchers propose SPS (Steering Probability Squeezing), a training paradigm that combines reinforcement learning with inverse reinforcement learning to address probability squeezing in LLM reasoning training. Probability squeezing occurs when probability mass concentrates too narrowly on high-reward trajectories, which limits exploration and multi-sample performance (Pass@k). Experiments on five reasoning benchmarks show improved exploration and Pass@k.
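Pass@k is the probability that at least one of k sampled attempts solves a problem. A widely used unbiased estimator, introduced with OpenAI's Codex evaluation, draws n ≥ k samples per problem, counts the c correct ones, and computes 1 − C(n−c, k)/C(n, k); whether SPS evaluates with exactly this estimator is an assumption here. The second function is a hypothetical illustration of why squeezing can hurt Pass@k: with independent samples, a problem with per-sample success probability p has Pass@k = 1 − (1 − p)^k, so a policy that always produces the same trajectory gains nothing from extra samples. The success probabilities below are made up for illustration.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of them correct (requires k <= n)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def expected_pass_at_k(success_probs, k: int) -> float:
    """Mean pass@k over problems, assuming k i.i.d. samples per problem."""
    return sum(1.0 - (1.0 - p) ** k for p in success_probs) / len(success_probs)

# Hypothetical per-problem success probabilities for two policies.
squeezed = [1.0, 1.0, 0.0, 0.0]  # all mass on one trajectory: always right or always wrong
diverse = [0.4, 0.4, 0.4, 0.4]   # mass spread over several trajectories

print(pass_at_k(10, 3, 1))  # approximately 0.3
print(expected_pass_at_k(squeezed, 1), expected_pass_at_k(squeezed, 8))
print(expected_pass_at_k(diverse, 1), expected_pass_at_k(diverse, 8))
```

In this toy case the squeezed policy wins at k = 1 (0.5 vs. 0.4) but stays at 0.5 for k = 8, while the diverse policy rises to about 0.98, the pattern the summary describes as limited multi-sample performance.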

A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models

OpenAI Blog · 2016-11-11

This paper establishes mathematical equivalences between generative adversarial networks (GANs), inverse reinforcement learning (IRL), and energy-based models (EBMs), showing that certain IRL methods are equivalent to GANs whose generator density can be evaluated. The work connects three research communities so that techniques can transfer between them, with the aim of developing more stable and scalable algorithms.
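The central identity in this line of work writes the discriminator in terms of a learned cost c, its partition function Z, and the generator density q: D(x) = (exp(−c(x))/Z) / (exp(−c(x))/Z + q(x)). A minimal sketch over a finite sample space (an assumption made for illustration; the paper works with trajectories) evaluates this discriminator through its logit and checks that when exp(−c)/Z equals the data distribution p, it reduces to the optimal GAN discriminator p/(p + q). The function name `energy_discriminator` and the distributions are hypothetical.

```python
import numpy as np

def energy_discriminator(cost, log_Z, log_q):
    """D(x) = [exp(-c(x))/Z] / [exp(-c(x))/Z + q(x)], evaluated via its logit.

    cost:  learned cost (energy) c(x) for each sample.
    log_Z: log of the partition function estimate.
    log_q: generator log-density, which must be evaluable.
    """
    logit = -cost - log_Z - log_q        # log of (exp(-c)/Z) / q
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid(logit)

# Hypothetical finite sample space with data distribution p and generator q.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

# If the cost is the true energy (c = -log p, so Z = 1), D becomes p / (p + q).
D = energy_discriminator(cost=-np.log(p), log_Z=0.0, log_q=np.log(q))
print(D)  # approximately [0.714, 0.5, 0.286]
```

The generator term log q enters the logit directly, which is why the equivalence requires a generator whose density can be evaluated.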
