#pisa

mmPISA-bench: Do LLMs Reason Equally Well Across 43 Languages?

arXiv cs.CL · 2026-06-08

Introduces mmPISA-bench, a compact multilingual reasoning benchmark derived from PISA, and evaluates proprietary LLMs across 43 languages. The evaluation finds that the models reason effectively, with some variation in performance, and that machine-translated questions do not degrade accuracy.
