Introduces ConsumerSimBench, a benchmark for evaluating LLMs' ability to reconstruct crowd-level consumer reactions from real Chinese social media topics. Tests show frontier models cover only 47.8% of real reaction criteria, highlighting a gap between technical benchmark performance and social intuition.