Introduces RobotValues, a benchmark of 10K value-conflict scenarios for evaluating household robot planners. It shows that vision-language models exhibit default value preferences and, when instructed to prioritize a conflicting value, fail to override those defaults 80% of the time.