Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
Summary
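Researchers introduce the CSR-L and CS-MTEB benchmarks, showing that code-switched queries degrade retrieval and embedding performance by up to 27% on CS-MTEB. The degradation is traced to embedding-space divergence between pure and code-switched text, which standard multilingual techniques such as vocabulary expansion do not fully resolve.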
Cached at: 04/22/26, 06:17 AM
Source: https://huggingface.co/papers/2604.17632
Abstract
Code-switching poses significant challenges for information retrieval systems, revealing performance bottlenecks and embedding space divergences that current multilingual approaches cannot fully address.
Code-switching is a pervasive linguistic phenomenon in global communication, yet modern information retrieval systems remain predominantly designed for, and evaluated within, monolingual contexts. To bridge this critical disconnect, we present a holistic study dedicated to code-switching IR. We introduce CSR-L (Code-Switching Retrieval benchmark-Lite), constructing a dataset via human annotation to capture the authentic naturalness of mixed-language queries. Our evaluation across statistical, dense, and late-interaction paradigms reveals that code-switching acts as a fundamental performance bottleneck, degrading the effectiveness of even robust multilingual models. We demonstrate that this failure stems from substantial divergence in the embedding space between pure and code-switched text. Scaling this investigation, we propose CS-MTEB, a comprehensive benchmark covering 11 diverse tasks, where we observe performance declines of up to 27%. Finally, we show that standard multilingual techniques like vocabulary expansion are insufficient to resolve these deficits completely. These findings underscore the fragility of current systems and establish code-switching as a crucial frontier for future IR optimization.
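The abstract attributes the failure to divergence in the embedding space between pure and code-switched text, but this page does not say how that divergence is measured. As a rough illustration only, the sketch below assumes paired embeddings (one for a monolingual query, one for its code-switched counterpart) and reports one minus their mean cosine similarity. The function names and the toy vectors are hypothetical; in practice the vectors would come from whichever retriever is being studied.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_paired_divergence(pure: Sequence[Vector], mixed: Sequence[Vector]) -> float:
    """Return 1 - mean cosine similarity over aligned (pure, code-switched) pairs.

    Assumption: pure[i] and mixed[i] embed the same query, once in a single
    language and once in its code-switched form. This is an illustrative
    measure, not necessarily the one used in the paper.
    """
    if len(pure) != len(mixed) or not pure:
        raise ValueError("expected two non-empty lists of equal length")
    sims = [cosine_similarity(p, m) for p, m in zip(pure, mixed)]
    return 1.0 - sum(sims) / len(sims)


# Toy 2-D vectors standing in for real model embeddings (hypothetical data).
pure_embeddings = [[1.0, 0.0], [0.0, 1.0]]
mixed_embeddings = [[1.0, 0.0], [1.0, 0.0]]  # second pair has drifted

divergence = mean_paired_divergence(pure_embeddings, mixed_embeddings)
print(f"mean paired divergence: {divergence:.3f}")
```

A value near 0 would mean the code-switched queries land close to their monolingual counterparts; larger values indicate that the two forms of the same query are being embedded in different regions, which is the kind of gap the abstract describes.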
Similar Articles
Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German
This paper presents a benchmark evaluating five commercial ASR systems on code-switching speech across Arabic-English, Persian-English, and German-English pairs, using a two-stage pipeline to select 300 samples per pair and assessing performance with WER and BERTScore. ElevenLabs Scribe v2 achieves the lowest overall WER (13.2%) and the highest BERTScore (0.936), and the dataset is publicly available.
Beyond Retrieval: A Multitask Benchmark and Model for Code Search
This paper introduces CoREB, a contamination-limited multitask benchmark for code search that evaluates text-to-code, code-to-text, and code-to-code retrieval with fine-tuned reranking capabilities.
Think Multilingual, Not Harder: A Data-Efficient Framework for Teaching Reasoning Models to Code-Switch
This paper introduces a data-efficient fine-tuning framework for teaching reasoning models to code-switch (mix languages) effectively, demonstrating that strategic code-switching can improve reasoning capabilities for lower-resource languages. The work analyzes code-switching behaviors in large language models across diverse languages, tasks, and domains, then develops interventions to promote beneficial code-switching patterns.
MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks
Introduces MTR-Suite, a unified framework for evaluating and synthesizing conversational retrieval benchmarks, featuring an LLM-based auditor, a multi-agent pipeline for cost-effective dialogue generation, and a benchmark with high discriminative power.
Real-time multilingual ASR using rolling buffers and monolingual models [P]
A routing-based approach for real-time multilingual ASR that uses smaller monolingual models with a rollback mechanism to handle language switches, achieving ~13% WER on inter-utterance code-switching and open-sourcing the system.