counterfactual-learning

An Introduction to Causal Reinforcement Learning

arXiv cs.AI ↗ · 18h ago Cached

This paper introduces causal reinforcement learning (CRL), unifying causal inference and reinforcement learning under a structural causal model framework, and explores novel learning settings such as generalized policy learning and counterfactual learning.

@Ankur_Samanta_: New work on credit assignment in multi-step reasoning RL post-training Introducing Self-Reset Policy Optimization (SRPO…

X AI KOLs Timeline ↗ · 2d ago Cached

Self-Reset Policy Optimization (SRPO) addresses credit assignment in multi-step reasoning RL post-training by localizing the first wrong reasoning step and learning from counterfactual continuations without external supervision.
