audio-llms

Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs

arXiv cs.CL · 2026-05-26

This paper applies Direct Preference Optimization (DPO) to align Audio LLMs for transcribing English-Mandarin code-switching speech, reducing the mixed error rate (MER) by up to 89.6% in-distribution and 20% out-of-distribution. It identifies three failure modes (language omission, translation instead of transcription, and hallucination) and shows that preference-based alignment effectively elicits correct code-switching behavior from multilingual Audio LLMs.
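
For readers unfamiliar with DPO, here is a minimal sketch of the standard per-pair DPO loss computed on sequence-level log-probabilities. It is not the paper's training code. The framing of a faithful code-switched transcript as the "chosen" output and a failure-mode output (such as a translation) as the "rejected" one is an illustrative assumption, as are the value of `beta` and all the log-probability numbers.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a full transcript
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen transcript over the rejected one.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) computed stably as softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical pair: "chosen" keeps the English words in a code-switched
# transcript, "rejected" translates them into Mandarin (a failure mode).
loss_before = dpo_loss(-42.0, -40.0, -42.0, -40.0)  # policy == reference
loss_after = dpo_loss(-38.0, -47.0, -42.0, -40.0)   # policy now prefers chosen
print(round(loss_before, 4), round(loss_after, 4))
```

The reference log-probabilities enter only through differences. As a result, the loss equals log 2 when the policy matches the reference, and it decreases as the policy shifts probability toward the chosen transcript.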

EchoDistill: Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs

arXiv cs.CL · 2026-05-26

EchoDistill is an alignment-based noisy-to-clean self-distillation framework that makes Audio Large Language Models (ALLMs) more robust to real-world noise. A frozen clean-audio teacher guides the student model via Group Relative Policy Optimization (GRPO). Experiments show significant improvements in semantic reliability and task performance under strong noise, with no additional inference cost.
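
As a rough illustration of the GRPO step, the sketch below computes group-relative advantages for several student responses sampled from the same noisy input. Each response is scored against the frozen teacher's output on the clean version of that input. The token-overlap F1 reward, the example strings, and the helper names are hypothetical stand-ins, because the summary does not say how EchoDistill scores alignment. The clipped policy-ratio and KL terms of the full GRPO objective are also omitted.

```python
import statistics
from collections import Counter


def overlap_f1(candidate: str, reference: str) -> float:
    """Hypothetical reward: token-overlap F1 between a student and teacher output."""
    cand, ref = candidate.split(), reference.split()
    common = sum((Counter(cand) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward is normalized by its own group's statistics."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical teacher output on clean audio, and a group of student samples
# generated from the noisy version of the same clip.
teacher_output = "the speaker asks to reschedule the meeting to friday"
student_samples = [
    "the speaker asks to reschedule the meeting to friday",
    "the speaker wants to move the meeting",
    "music is playing in the background",
]
rewards = [overlap_f1(s, teacher_output) for s in student_samples]
advantages = group_relative_advantages(rewards)
for sample, adv in zip(student_samples, advantages):
    print(f"{adv:+.2f}  {sample}")
```

Samples that agree more closely with the teacher receive positive advantages and the rest receive negative ones. Because the group itself supplies the baseline, no separate value model is needed. That is the central design choice of GRPO.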
