RobotValues: Evaluating Household Robots When Human Values Conflict
Summary
Introduces RobotValues, a benchmark of 10K value-conflict scenarios for evaluating household robot planners, showing that vision-language models exhibit default value preferences and fail to override them 80% of the time when instructed to prioritize conflicting values.
Cached at: 06/05/26, 06:06 AM
Source: https://huggingface.co/papers/2606.03312
Abstract
RobotValues benchmark evaluates household robot planners in value-conflict scenarios, revealing that vision-language models exhibit default value preferences and struggle to override them when instructed to prioritize conflicting values.
While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet there are no benchmarks for evaluating robots’ value preferences in such scenarios. We introduce RobotValues, a benchmark to evaluate household robot planners in 10K value-conflict scenarios. Each instance consists of a realistic household image with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and automatic quality control. Using RobotValues, we evaluate VLMs used in robotics and find that models exhibit default value preferences, including safety and accommodation, while under-selecting privacy-prioritizing actions. When the models are instructed to prioritize specific values that conflict with their own preferences, they often fail to override their default actions, choosing incorrect actions 80% of the time. These findings suggest that household robot evaluation should measure not only task completion or safety compliance, but also whether robots can choose among plausible actions when human values conflict.
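The abstract describes two measurements: which values a model picks by default, and how often it picks the wrong action when told to prioritize a different value. The sketch below shows one way to compute both. It is a minimal illustration under assumed data shapes: the Scenario record, the action labels, and the functions default_preferences and override_error_rate are hypothetical and are not the benchmark's released format or scoring code. The toy data are invented and are not results from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical record for one value-conflict instance (not the official schema)."""
    image_path: str  # realistic household image; the path here is illustrative
    # action label -> the human value that action prioritizes
    actions: dict = field(default_factory=dict)


def default_preferences(scenarios, choices):
    """Share of uninstructed choices that land on each value."""
    if len(scenarios) != len(choices):
        raise ValueError("one choice per scenario is required")
    counts = Counter(s.actions[c] for s, c in zip(scenarios, choices))
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def override_error_rate(scenarios, targets, choices):
    """Fraction of instructed trials whose chosen action does not
    prioritize the value the model was told to prioritize."""
    if not (len(scenarios) == len(targets) == len(choices)):
        raise ValueError("scenarios, targets and choices must align")
    if not scenarios:
        return 0.0
    wrong = sum(s.actions[c] != t for s, t, c in zip(scenarios, targets, choices))
    return wrong / len(scenarios)


# Toy data, invented for illustration only (not results from the paper).
scenarios = [
    Scenario("kitchen.png", {"A": "safety", "B": "autonomy", "C": "privacy"}),
    Scenario("bedroom.png", {"A": "accommodation", "B": "privacy", "C": "efficiency"}),
    Scenario("living_room.png", {"A": "social appropriateness", "B": "safety", "C": "privacy"}),
    Scenario("bathroom.png", {"A": "privacy", "B": "safety", "C": "efficiency"}),
]

# Uninstructed run: which action label the planner picked in each scenario.
default_choices = ["A", "A", "B", "B"]
print(default_preferences(scenarios, default_choices))
# {'safety': 0.75, 'accommodation': 0.25}

# Instructed run: the planner is told to prioritize privacy in every scenario.
targets = ["privacy"] * len(scenarios)
instructed_choices = ["A", "B", "B", "B"]
print(override_error_rate(scenarios, targets, instructed_choices))
# 0.75
```

In this toy run privacy receives no uninstructed picks, and three of the four instructed choices still land on a non-privacy action. A real evaluation would also need the step that queries the model with each image and its candidate actions, which is omitted here.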
Get this paper in your agent:
hf papers read 2606.03312
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies
RoboLab is a high-fidelity simulation benchmarking framework for evaluating task-generalist robotic policies, introducing the RoboLab-120 benchmark with 120 tasks across visual, procedural, and relational competency axes. It enables scalable, realistic task generation and systematic analysis of policy behavior under controlled perturbations to assess true generalization capabilities.
Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
This paper introduces Agent-ValueBench, a comprehensive benchmark designed to evaluate the values of autonomous agents, revealing that agent values diverge from their underlying language models.
What Do People Actually Want From AI? Mapping Preference Plurality
This paper analyzes 1,500 open-ended responses from 75 countries to reveal that people have diverse and often conflicting preferences for AI, with truthfulness being the only widely demanded value (49%), yet defined in incompatible ways. It argues that current RLHF methods flatten these pluralistic preferences into universal reward models, perpetuating epistemic violence.
@rohanpaul_ai: Dr. Fei-Fei Li (@drfeifei) explains why and how everyday household chores are so extremely difficult for robots. "If yo…
Dr. Fei-Fei Li discusses the challenges robots face in understanding and executing everyday household tasks, highlighting the difficulty of grounding natural language instructions like 'open the drawer while avoiding the vase' into robot actions.
Robots Need More than VLA and World Models
This position paper argues that advancing robot intelligence requires integrating unstructured behavioral data through specialized interfaces for labeling, embodiment mapping, world modeling, and reward inference, rather than relying solely on scaling Vision-Language-Action (VLA) models and world models.