Tag
WEAVER is a multi-view world model for robotic manipulation that uses a flow-matching loss to achieve high fidelity, consistency, and efficiency. It demonstrates superior performance in policy evaluation, policy improvement, and test-time planning, with significant real-world improvements.
StressDream enhances video world models by steering diffusion-based imaginations toward high-impact yet plausible outcomes through optimized noise initialization with semantic and plausibility objectives, enabling robust policy evaluation and improvement.
This paper shows that impact evaluations of refugee matching remain robust under off-policy methods such as inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW), confirming previous findings on algorithmic refugee assignment.
RoboLab is a high-fidelity simulation benchmarking framework for evaluating task-generalist robotic policies, introducing the RoboLab-120 benchmark with 120 tasks across visual, procedural, and relational competency axes. It enables scalable, realistic task generation and systematic analysis of policy behavior under controlled perturbations to assess true generalization capabilities.
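
For readers unfamiliar with the off-policy estimators named in the refugee-matching entry above, the following is a minimal sketch of the textbook IPW and AIPW (doubly robust) value estimators. It is not taken from that paper: the record layout, the function names `ipw_value` and `aipw_value`, the toy logs, and the outcome model are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical record layout: (context, action taken, observed reward,
# probability the logging policy assigned to that action).
Record = Tuple[int, int, float, float]


def ipw_value(logs: Sequence[Record],
              target_prob: Callable[[int, int], float]) -> float:
    """IPW estimate: reweight each logged reward by target / logging probability."""
    total = 0.0
    for x, a, r, p in logs:
        total += target_prob(x, a) / p * r
    return total / len(logs)


def aipw_value(logs: Sequence[Record],
               target_prob: Callable[[int, int], float],
               outcome_model: Callable[[int, int], float],
               actions: Sequence[int]) -> float:
    """AIPW estimate: outcome-model prediction plus an IPW-weighted residual."""
    total = 0.0
    for x, a, r, p in logs:
        # Model-based value of the target policy in this context.
        direct = sum(target_prob(x, b) * outcome_model(x, b) for b in actions)
        # Correction for outcome-model error on the action actually logged.
        correction = target_prob(x, a) / p * (r - outcome_model(x, a))
        total += direct + correction
    return total / len(logs)


# Toy data (assumed): uniform logging over two actions; action 1 always pays 1.0.
toy_logs = [(0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5), (0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5)]


def always_one(x: int, a: int) -> float:
    """Target policy that deterministically picks action 1."""
    return 1.0 if a == 1 else 0.0


def zero_model(x: int, a: int) -> float:
    """Deliberately uninformative outcome model."""
    return 0.0


ipw_estimate = ipw_value(toy_logs, always_one)
aipw_estimate = aipw_value(toy_logs, always_one, zero_model, actions=[0, 1])
```

With an outcome model that always predicts zero, the AIPW estimate reduces to the IPW estimate. This illustrates the sense in which AIPW is doubly robust: it stays consistent if either the propensities or the outcome model are correct.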