RobotValues: Evaluating Household Robots When Human Values Conflict

Hugging Face Daily Papers 06/02/26, 12:00 AM Papers

robot-values household-robots value-conflict benchmark vision-language-models robotics ai-safety

Summary

Introduces RobotValues, a benchmark of 10K value-conflict scenarios for evaluating household robot planners, showing that vision-language models exhibit default value preferences and fail to override them 80% of the time when instructed to prioritize conflicting values.

While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet, there are no benchmarks for evaluating robots' value preferences in such scenarios. We introduce RobotValues, a benchmark to evaluate household robot planners in 10K value-conflict scenarios. Each instance consists of a realistic household image with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and automatic quality control. Using RobotValues, we evaluate VLMs used in robotics and find that models exhibit default value preferences, including safety and accommodation, while underselecting privacy-prioritizing actions. When the models are instructed to prioritize specific values that conflict with their own preferences, they often fail to override their default actions, choosing incorrect actions 80% of the time. These findings suggest that household robot evaluation should measure not only task completion or safety compliance, but also whether robots can choose among plausible actions when human values conflict.

Original Article

Cached at: 06/05/26, 06:06 AM

Source: https://huggingface.co/papers/2606.03312

Abstract

RobotValues benchmark evaluates household robot planners in value-conflict scenarios, revealing that vision-language models exhibit default value preferences and struggle to override them when instructed to prioritize conflicting values.

While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet, there are no benchmarks for evaluating robots’ value preferences in such scenarios. We introduce RobotValues, a benchmark to evaluate household robot planners in 10K value-conflict scenarios. Each instance consists of a realistic household image with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and automatic quality control. Using RobotValues, we evaluate VLMs used in robotics and find that models exhibit default value preferences, including safety and accommodation, while underselecting privacy-prioritizing actions. When the models are instructed to prioritize specific values that conflict with their own preferences, they often fail to override their default actions, choosing incorrect actions 80% of the time. These findings suggest that household robot evaluation should measure not only task completion or safety compliance, but also whether robots can choose among plausible actions when human values conflict.
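
To make the two measurements in the abstract concrete, here is a minimal Python sketch of how they might be computed. It is not the authors' code or data format: the `Scenario` class, the function names, the definition of an "incorrect" action, and the two toy scenarios are all assumptions made up for illustration. The sketch treats each instance as an image plus candidate actions labeled with the value they prioritize, tallies which values a model picks by default, and then measures how often the model fails to switch when instructed to prioritize a value that conflicts with its default choice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical instance layout: one image plus value-labeled actions."""
    image_path: str
    actions: dict[str, str]  # candidate action -> value it prioritizes


def value_preferences(scenarios, choices):
    """Share of chosen actions per value when no value instruction is given."""
    counts = Counter(s.actions[c] for s, c in zip(scenarios, choices))
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def override_failure_rate(scenarios, default_choices, instructed_values, instructed_choices):
    """Fraction of conflicting cases where the instructed value is not followed.

    Assumption: a case "conflicts" when the instructed value differs from the
    value of the model's default choice, and an action is "incorrect" when it
    does not prioritize the instructed value.
    """
    conflicts = failures = 0
    for s, default, value, chosen in zip(
        scenarios, default_choices, instructed_values, instructed_choices
    ):
        if s.actions[default] == value:
            continue  # instruction agrees with the default, so no override is needed
        conflicts += 1
        if s.actions[chosen] != value:
            failures += 1
    return failures / conflicts if conflicts else 0.0


# Toy data, invented for illustration only (not taken from the benchmark).
scenarios = [
    Scenario("kitchen.png", {"remove the knife": "safety", "leave the knife": "autonomy"}),
    Scenario("bedroom.png", {"enter and tidy": "efficiency", "wait outside": "privacy"}),
]
default_choices = ["remove the knife", "enter and tidy"]
instructed_values = ["autonomy", "privacy"]
instructed_choices = ["remove the knife", "wait outside"]

preferences = value_preferences(scenarios, default_choices)
failure_rate = override_failure_rate(
    scenarios, default_choices, instructed_values, instructed_choices
)
```

In this toy run the model keeps its safety-oriented default in the first scenario despite the autonomy instruction and switches correctly in the second, so `failure_rate` is 0.5. The paper's actual protocol and scoring may differ from this sketch.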

Similar Articles

RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

What Do People Actually Want From AI? Mapping Preference Plurality

@rohanpaul_ai: Dr Fei-Fei-Li (@drfeifei ) explains why and how everyday household chores are so extremely difficult for Robots. "If yo…

Robots Need More than VLA and World Models
