Tag
Introduces mmPISA-bench, a compact multilingual reasoning benchmark derived from PISA, and evaluates proprietary LLMs across 43 languages. The models reason effectively, with some variation in performance, and machine-translated questions do not degrade accuracy.