Tag
XLGoBench introduces a synthetic benchmark of algorithmic tasks for detecting cross-lingual skill gaps in LLMs and shows that these gaps persist across multiple state-of-the-art models.