code-switching


BOUTEF: A Multilingual Corpus for FakeNews in North Africa -- Language as a Weapon

arXiv cs.CL · 2d ago

This paper introduces BOUTEF, a large-scale multilingual corpus for studying fake news in Algeria and Tunisia, covering Arabic dialects, Arabizi, French, English, and code-switching. It includes an empirical analysis of linguistic strategies and engagement dynamics.


Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs

arXiv cs.CL · 2026-05-26

This paper applies Direct Preference Optimization (DPO) to align Audio LLMs for transcribing English-Mandarin code-switching speech, reducing MER by up to 89.6% in-distribution and by 20% out-of-distribution. It identifies three failure modes (language omission, translation instead of transcription, and hallucination) and shows that preference-based alignment effectively elicits correct code-switching behavior from multilingual Audio LLMs.


Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

arXiv cs.CL · 2026-05-20

This paper presents a benchmark evaluating five commercial ASR systems on code-switching speech across Arabic-English, Persian-English, and German-English pairs. A two-stage pipeline selects 300 samples per pair, and performance is assessed with WER and BERTScore. ElevenLabs Scribe v2 achieves the lowest overall WER (13.2%) and the highest BERTScore (0.936). The dataset is publicly available.


MUSCAT: MUltilingual, SCientific ConversATion Benchmark

arXiv cs.CL · 2026-04-20

MUSCAT is a new multilingual scientific conversation benchmark for evaluating ASR systems on challenging multilingual scenarios, including code-switching, domain-specific vocabulary, and mixed-language input. The dataset consists of bilingual discussions of scientific papers between speakers using different languages, and the results show that current state-of-the-art systems struggle with these multilingual challenges.


Think Multilingual, Not Harder: A Data-Efficient Framework for Teaching Reasoning Models to Code-Switch

arXiv cs.CL · 2026-04-20

This paper introduces a data-efficient fine-tuning framework for teaching reasoning models to code-switch (mix languages) effectively, demonstrating that strategic code-switching can improve reasoning capabilities for lower-resource languages. The work analyzes code-switching behaviors in large language models across diverse languages, tasks, and domains, then develops interventions to promote beneficial code-switching patterns.


Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers

Hugging Face Daily Papers · 2026-04-19

Researchers introduce the CSR-L and CS-MTEB benchmarks, which show that code-switching queries degrade IR system performance by up to 27% and reveal an embedding-space divergence that current multilingual techniques cannot fix.
