This paper presents a systematic benchmark of token pruning, a compression technique that removes vocabulary tokens and their embedding rows for languages irrelevant to the target deployment, applied to Korean-centric LLM tasks. The study evaluates popular multilingual models (Qwen3, Gemma-3, Llama-3, Aya) across different vocabulary configurations and finds that token pruning significantly improves generation stability and reduces memory footprint for domain-specific deployments.
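To make the core idea concrete, the following is a minimal sketch of the pruning step, not the paper's actual procedure: given an embedding matrix and the set of token ids observed in the target-language corpus (plus special tokens), keep only those rows and build an old-to-new id mapping for the tokenizer. The function name `prune_embeddings` and the toy dimensions are illustrative assumptions.

```python
# Hypothetical sketch of vocabulary/token pruning (not the paper's exact code).
import torch

def prune_embeddings(embeddings: torch.Tensor,
                     corpus_token_ids: set[int],
                     always_keep: set[int]) -> tuple[torch.Tensor, dict[int, int]]:
    """Return a smaller embedding matrix and an old->new token-id mapping."""
    keep = sorted(corpus_token_ids | always_keep)        # e.g. special tokens + Korean/English tokens
    id_map = {old: new for new, old in enumerate(keep)}  # remap kept ids onto a dense range
    pruned = embeddings[keep].clone()                    # slice out only the kept rows
    return pruned, id_map

# Toy usage: a 10-token vocabulary reduced to the 4 tokens seen in the corpus.
emb = torch.randn(10, 8)  # (vocab_size, hidden_dim)
pruned_emb, id_map = prune_embeddings(emb, corpus_token_ids={3, 7, 9}, always_keep={0})
print(pruned_emb.shape)   # torch.Size([4, 8])
print(id_map)             # {0: 0, 3: 1, 7: 2, 9: 3}
```

In a real model the same remapping would also be applied to the tokenizer and to the output (unembedding) layer; the memory saving comes from shrinking these vocabulary-sized matrices while leaving the transformer blocks untouched.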