multilingual-benchmark

ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety

arXiv cs.CL · 2026-05-15

Introduces ROK-FORTRESS, a bilingual benchmark for measuring how language and geopolitical context jointly affect LLM safety behavior, using English-Korean and US-ROK axes as a case study. Findings show language and context interact in ways that translation-only evaluations miss.

CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks

arXiv cs.CL · 2026-04-22

CulturALL introduces a 2,610-sample benchmark spanning 14 languages and 51 regions that evaluates LLMs on real-world, culturally grounded tasks. The top model scores only 44.48%, leaving substantial room for improvement.

MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

arXiv cs.CL · 2026-04-22

Researchers introduce MORPHOGEN, a multilingual benchmark testing LLMs’ ability to rewrite first-person sentences in the opposite gender while preserving meaning across French, Arabic, and Hindi.
