This study evaluates bilingual fine-tuning with language identification (LID) tokens for improving automatic speech recognition (ASR) in low-resource languages across nine diverse language pairs. It finds that high LID accuracy is beneficial, and that providing the LID token at inference can boost performance when LID accuracy is low.
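The mechanism can be illustrated with a minimal sketch. It assumes a decoder that is trained to emit an LID token before each transcript. At inference, that prefix comes either from the model's own prediction or from a known language label. The `<|xx|>` token format, the helper names (`lid_token`, `build_bilingual_targets`, `decode_with_lid`), and the toy model are hypothetical illustrations, not the study's implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple


def lid_token(lang: str) -> str:
    # Hypothetical LID token format, e.g. "<|sw|>" for Swahili.
    return f"<|{lang}|>"


def build_bilingual_targets(pairs: List[Tuple[str, str]]) -> List[str]:
    # Each (lang, transcript) pair becomes a training target whose first token
    # names the language, so the decoder learns to emit LID before the text.
    return [f"{lid_token(lang)} {text}" for lang, text in pairs]


def decode_with_lid(
    predict_lang: Callable[[str], str],
    transcribe: Callable[[str, str], str],
    audio: str,
    given_lang: Optional[str] = None,
) -> str:
    # If the language is known at inference, supply it as the LID prefix;
    # otherwise fall back to the model's own (possibly wrong) LID prediction.
    lang = given_lang if given_lang is not None else predict_lang(audio)
    return f"{lid_token(lang)} {transcribe(audio, lang)}"


if __name__ == "__main__":
    print(build_bilingual_targets([("en", "hello"), ("sw", "habari")]))

    # Toy stand-ins for a fine-tuned model: LID is unreliable (always guesses
    # "en"), and transcription is only correct when conditioned on the right language.
    outputs: Dict[Tuple[str, str], str] = {("clip1", "sw"): "habari"}

    def predict_lang(audio: str) -> str:
        return "en"

    def transcribe(audio: str, lang: str) -> str:
        return outputs.get((audio, lang), "<wrong-language output>")

    print(decode_with_lid(predict_lang, transcribe, "clip1"))        # model's own LID
    print(decode_with_lid(predict_lang, transcribe, "clip1", "sw"))  # LID token provided
```

In this toy setup, the unreliable predicted LID yields `<|en|> <wrong-language output>`. Supplying `sw` yields `<|sw|> habari`. This mirrors, in simplified form, why providing the LID token can help when LID accuracy is low.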
GitHub announces the GitHub Multilingual Repositories Dataset, an open metadata dataset containing more than 80 million classification rows across 40 million repositories, intended to help researchers and developers build multilingual AI tools.