This paper presents the first implementation of an infra-Bayesian reinforcement learning agent. It demonstrates that the agent outperforms classical RL in worst-case regret and handles Newcomb's problem optimally, a step toward robustness under model misspecification.