multilingual-llms

#multilingual-llms

Discovering Lexical Gaps Using Embeddings from Multilingual LLMs

arXiv cs.CL · 2026-05-26

This paper proposes a data-driven framework that uses embeddings from multilingual LLMs to detect lexical gaps between languages, and reports high accuracy on Korean-English pairs.

#multilingual-llms

Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs

arXiv cs.CL · 2026-04-22

Google Research introduces LocQA, a 12-language dataset revealing that multilingual LLMs exhibit strong US-centric and population-based locale biases when answering ambiguous locale-dependent questions.

#multilingual-llms

Large Language Models for Math Education in Low-Resource Languages: A Study in Sinhala and Tamil

arXiv cs.CL · 2026-04-20

This paper evaluates the mathematical reasoning capabilities of large language models in Sinhala and Tamil, two low-resource South Asian languages, using a parallel dataset of independently authored problems. The study demonstrates that while basic arithmetic transfers well across languages, complex reasoning tasks show significant performance degradation in non-English languages, with implications for deploying AI tutoring tools in multilingual educational contexts.

#multilingual-llms

Optimizing Korean-Centric LLMs via Token Pruning

arXiv cs.CL · 2026-04-20

This paper presents a systematic benchmark of token pruning, a compression technique that removes the tokens and embeddings of irrelevant languages, applied to Korean-centric LLM tasks. The study evaluates popular multilingual models (Qwen3, Gemma-3, Llama-3, Aya) across different vocabulary configurations and finds that token pruning significantly improves generation stability and reduces memory footprint for domain-specific deployments.
