Tag
This paper introduces AgentViSS, a benchmark evaluating visual social intelligence in multimodal social simulation, containing 240 scenarios with aligned visual-textual evidence. Evaluating seven recent MLLMs reveals a gap between local role enactment and visually grounded interaction management.