off-policy-evaluation

#off-policy-evaluation

Off-Policy Evaluation with Strategic Agents via Local Disclosure

arXiv cs.AI · 2026-06-08

This paper studies off-policy evaluation (OPE) when decision subjects (agents) strategically modify their covariates in response to a policy. It proposes a method that uses local disclosure via post-hoc explanations to reveal agents' pre-strategic covariates and construct a doubly robust estimator for policy value.

Autoregressive Diffusion World Models for Off-Policy Evaluation of LLM Agents

arXiv cs.LG · 2026-06-05

Proposes Adwm, an autoregressive diffusion world model for off-policy evaluation of LLM agents, enabling reliable value estimates from pre-collected trajectories without online interaction.

Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning

arXiv cs.LG · 2026-06-03

The paper introduces the Human-in-the-Loop Gated Bandit (HITL-GB) for short-term rental dynamic pricing, showing that historical pricing data collected under a prior policy is structurally equivalent to on-policy warm-up data, which reduces the cold-start period from ~150 to ~30 episodes.

Robustness of Refugee-Matching Gains to Off-Policy Evaluation Choices

arXiv cs.LG · 2026-05-11

This paper shows that impact evaluations of refugee matching are robust across off-policy methods such as inverse probability weighting (IPW) and augmented IPW (AIPW), confirming previous findings on algorithmic refugee assignment.
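
For readers unfamiliar with the two estimators named in this summary, here is a minimal sketch of the textbook IPW and AIPW value estimates for a discrete action set. It is not taken from the paper: the function names, argument layout, and toy data are all assumptions made for illustration, and the logging probabilities are assumed to be known and strictly positive.

```python
def ipw_value(actions, rewards, behavior_probs, target_probs):
    """Textbook IPW estimate of a target policy's value from logged data.

    actions[i]        -- index of the action that was actually logged
    rewards[i]        -- observed reward for that action
    behavior_probs[i] -- probability the logging policy chose actions[i]
    target_probs[i]   -- list of target-policy probabilities, one per action
    (All names are hypothetical; this is an illustrative sketch.)
    """
    total = 0.0
    for a, r, mu, pi in zip(actions, rewards, behavior_probs, target_probs):
        total += pi[a] / mu * r  # reweight the logged reward
    return total / len(rewards)


def aipw_value(actions, rewards, behavior_probs, target_probs, q_hat):
    """Textbook AIPW (doubly robust) estimate.

    q_hat[i] -- list of outcome-model reward predictions, one per action
    """
    total = 0.0
    for a, r, mu, pi, q in zip(actions, rewards, behavior_probs,
                               target_probs, q_hat):
        # Direct-method term: model-predicted value under the target policy.
        direct = sum(p * q_a for p, q_a in zip(pi, q))
        # Importance-weighted correction using the model's residual.
        correction = pi[a] / mu * (r - q[a])
        total += direct + correction
    return total / len(rewards)


# Toy logged data (made up): two actions, uniform logging policy,
# and a target policy that always picks action 0.
actions = [0, 1, 0, 1]
rewards = [1.0, 0.0, 1.0, 0.0]
behavior_probs = [0.5, 0.5, 0.5, 0.5]
target_probs = [[1.0, 0.0]] * 4
q_zero = [[0.0, 0.0]] * 4      # an uninformative outcome model
q_exact = [[1.0, 0.0]] * 4     # an outcome model that matches the toy data

ipw = ipw_value(actions, rewards, behavior_probs, target_probs)
aipw_zero = aipw_value(actions, rewards, behavior_probs, target_probs, q_zero)
aipw_exact = aipw_value(actions, rewards, behavior_probs, target_probs, q_exact)
```

With an all-zero outcome model the AIPW estimate reduces to the IPW estimate, and with an accurate outcome model the correction term vanishes. This is the sense in which comparing the two serves as a robustness check.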
