Tag
This paper studies off-policy evaluation (OPE) when decision subjects (agents) strategically modify their covariates in response to a policy. It proposes a method that uses local disclosure via post-hoc explanations to reveal agents' pre-strategic covariates and construct a doubly robust estimator for policy value.
This paper proposes Adwm, an autoregressive diffusion world model for off-policy evaluation of LLM agents, enabling reliable value estimates from pre-collected trajectories without online interaction.
This paper introduces a Human-in-the-Loop Gated Bandit (HITL-GB) for dynamic pricing of short-term rentals. It shows that historical pricing data collected under a prior policy is structurally equivalent to on-policy warm-up data, which reduces the cold-start period from ~150 to ~30 episodes.
This paper demonstrates the robustness of refugee-matching impact evaluations using off-policy methods such as inverse probability weighting (IPW) and augmented IPW (AIPW), confirming previous findings on algorithmic refugee assignment.
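
For readers unfamiliar with the IPW and AIPW estimators named in the last summary, the sketch below shows their standard textbook form in a one-step (contextual bandit) setting. It is not taken from any of the papers listed here. All function and argument names are hypothetical, and the outcome-model predictions (`q_logged`, `v_target`) are assumed to come from some separately fitted regression.

```python
import numpy as np


def ipw_value(rewards, target_probs, logging_probs):
    """Inverse probability weighting estimate of a target policy's value.

    rewards[i]       : observed outcome for logged decision i
    target_probs[i]  : probability the target policy assigns to the logged action
    logging_probs[i] : probability the logging policy assigned to that action
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.mean(weights * rewards))


def aipw_value(rewards, target_probs, logging_probs, q_logged, v_target):
    """Augmented IPW (doubly robust) estimate of a target policy's value.

    q_logged[i] : outcome-model prediction for the logged (context, action) pair
    v_target[i] : outcome-model prediction averaged over the target policy's
                  action distribution in context i
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    residual = rewards - np.asarray(q_logged, dtype=float)
    # Model-based value plus an importance-weighted correction on the residual.
    return float(np.mean(np.asarray(v_target, dtype=float) + weights * residual))


# Toy logged data (made up purely for illustration).
rewards = [1.0, 0.0, 1.0, 1.0]
logging_probs = [0.5, 0.5, 0.5, 0.5]
target_probs = [0.8, 0.2, 0.8, 0.2]
q_logged = [0.9, 0.1, 0.8, 0.7]
v_target = [0.8, 0.3, 0.7, 0.6]

print("IPW :", ipw_value(rewards, target_probs, logging_probs))
print("AIPW:", aipw_value(rewards, target_probs, logging_probs, q_logged, v_target))
```

The AIPW estimate is consistent if either the logging probabilities or the outcome model is correct, which is why it is called doubly robust. With an all-zero outcome model it reduces to plain IPW.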