Tag
This paper introduces Mouse, a specialized benchmark for evaluating LLMs on Chinese Chouxiang Language tasks across six NLP domains. It finds that current state-of-the-art models have significant limitations with this subcultural internet language, although they perform well on tasks that depend on contextual understanding.