Tag
Introduces SynCred-Bench, a benchmark of 600 AI-generated misinformation images spanning six credible-form categories. It shows that existing detectors (MLLMs, open-source AIGC detectors, and commercial APIs) perform poorly, and that human annotators also struggle.
Introduces ESI-BENCH, a comprehensive benchmark for embodied spatial intelligence built on OmniGibson, covering 10 task categories and 29 subcategories. Experiments show that active exploration substantially outperforms passive approaches and that failures stem mainly from action blindness rather than perception errors, revealing a metacognitive gap between models and humans.