Tag
This paper investigates the distributional gap between synthetic and real speech in LLM-based ASR systems, identifies where the LLM separates the two, and proposes layer selection and room impulse response (RIR) augmentation to match real-data baselines with less real data.
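
As a rough illustration of what RIR augmentation usually means, the sketch below convolves a clean (for example, synthetic) waveform with a room impulse response so that it sounds as if it were recorded in a reverberant room. This is not the paper's pipeline, which the summary does not describe. The function names `synthetic_rir` and `apply_rir`, the 16 kHz sample rate, the 0.3 s decay time, and the toy decaying-noise RIR are all assumptions made for the example; in practice a measured or simulated RIR would take the toy one's place.

```python
import numpy as np


def synthetic_rir(sample_rate=16000, rt60=0.3, seed=0):
    """Toy RIR (assumption): white noise with an exponential decay.

    The envelope falls by 60 dB over `rt60` seconds. This is a stand-in
    for a measured or simulated room impulse response.
    """
    rng = np.random.default_rng(seed)
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60)  # -60 dB in amplitude at t = rt60
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path component
    return rir / np.max(np.abs(rir))


def apply_rir(speech, rir):
    """Convolve `speech` with `rir`, keeping length and peak level."""
    # Full convolution adds a reverberant tail; trim it to the input length.
    wet = np.convolve(speech, rir, mode="full")[: len(speech)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        # Rescale so the augmented signal has the same peak as the input.
        wet = wet * (np.max(np.abs(speech)) / peak)
    return wet


# Usage: a 1-second 220 Hz tone stands in for a synthetic utterance.
sr = 16000
clean = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
reverberant = apply_rir(clean, synthetic_rir(sample_rate=sr))
```

Trimming the tail and matching the peak are convenience choices for this sketch, so that augmented and clean utterances stay aligned with the same transcripts and share a comparable level.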