mlx-audio v0.4.3 is released with six new TTS models, including Higgs Audio v2 and OmniVoice (646+ languages), plus server improvements such as concurrent request handling and continuous batching, ~3x faster Voxtral Realtime at 4-bit, and slimmer dependencies for Apple Silicon.
This paper introduces MultiLinguahah, an unsupervised multilingual method for acoustic laughter segmentation using Isolation Forests on BYOL-A encoder representations. The authors demonstrate that their approach outperforms state-of-the-art supervised methods in non-English settings by treating laughter detection as an anomaly detection task.
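The paper's exact pipeline isn't reproduced here, but the core idea of treating laughter as an anomaly in frame embeddings can be sketched with scikit-learn's Isolation Forest. The random vectors below stand in for real BYOL-A features; dimensions, contamination rate, and the clustered/outlier split are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of anomaly-style laughter detection on audio-frame
# embeddings, in the spirit of the summary above (BYOL-A features +
# Isolation Forest). Random embeddings stand in for real encoder output.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Pretend speech frames cluster tightly while "laughter" frames drift away.
speech_frames = rng.normal(0.0, 1.0, size=(500, 64))   # typical frames
laughter_frames = rng.normal(4.0, 1.0, size=(20, 64))  # anomalous frames
frames = np.vstack([speech_frames, laughter_frames])

# Fit unsupervised on all frames; predict() returns -1 for anomalies.
detector = IsolationForest(contamination=0.05, random_state=0).fit(frames)
is_laughter = detector.predict(frames) == -1

print(f"flagged {is_laughter.sum()} of {len(frames)} frames as laughter-like")
```

Because the detector never sees labels, the same recipe transfers across languages, which is the appeal of the unsupervised framing the summary describes.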
This paper details the YEZE system for SemEval-2026 Task 9, which detects online polarization in 22 languages using a heterogeneous ensemble of XLM-RoBERTa and mDeBERTa models.
This paper introduces a bilingual dataset of Tatarstan toponyms and a hybrid RAG system for geospatial QA, achieving high recall and F1 with multilingual models.
XL-SafetyBench is a benchmark of 5,500 test cases across 10 country-language pairs to evaluate LLM safety and cultural sensitivity, distinguishing jailbreak robustness from cultural awareness.
Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model, trained on 100,000 hours of data, supporting bilingual synthesis in Chinese and English and zero-shot voice cloning, and allowing commercial use.
UL-XCoT introduces a unified logic space to prune low-quality multilingual reasoning paths, cutting >50% token cost while improving accuracy and robustness on low-resource languages.
KoALa-Bench introduces a Korean-focused benchmark suite for evaluating large audio language models on six tasks, including novel measures of speech faithfulness and Korea-specific cultural content.
MIT researchers release the first multilingual negation benchmark covering seven languages and show that VLMs like CLIP struggle with non-Latin scripts, while MultiCLIP and SpaceVLM offer uneven improvements across languages.
Researchers probe language model representations to predict human reading times across five languages, finding early layers outperform surprisal for early-pass measures while surprisal remains superior for late-pass measures.
Researchers introduce BIASEDTALES-ML, a large-scale multilingual dataset of ~350,000 LLM-generated children's stories across eight languages, designed to analyze narrative attribute distributions and cross-lingual bias patterns in language model outputs. The work reveals significant cross-lingual variability, highlighting limitations of English-centric bias evaluations.
Researchers from Bangladesh University of Engineering and Technology present CBRS, a multi-platform framework that filters and parses blood donation requests from social media using a dual-layer architecture and a novel 11K bilingual dataset in Bengali and English. Their LoRA-fine-tuned Llama-3.2-3B model achieves 99% filtering accuracy and 92% zero-shot parsing accuracy, outperforming GPT-4o-mini and other LLMs with 35× lower token usage.
Researchers from University of Utah and CMU propose FragMend, an interpretability-based approach for vocabulary expansion in LLMs that addresses token over-fragmentation in non-Latin script languages. Their method outperforms frequency-based vocabulary selection and baseline embedding initialization by ~20 points for several underrepresented languages.
dots.ocr is a new lightweight 1.7B parameter multilingual vision-language model that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) at document parsing and OCR tasks.
SarvamAI releases open-source evaluation frameworks and a guide tailored for 22 Indian languages, addressing limitations of standard WER/CER metrics.
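The WER/CER limitations mentioned above are easiest to see from the metric's definition. Below is a minimal character-error-rate computation in plain Python (the standard edit-distance definition, not SarvamAI's framework); the Hindi example shows why raw code-point CER can over-penalize Indic scripts, where a single visual syllable spans several Unicode code points.

```python
# Minimal CER: Levenshtein distance over characters / reference length.
# Illustrates a limitation for Indic scripts: one visual syllable can be
# several code points, so a single-mark slip costs disproportionate CER.

def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    return levenshtein(ref, hyp) / max(len(ref), 1)

# "नमस्ते" is six code points; dropping only the virama (्) already
# yields CER = 1/6, even though the change is a single diacritic.
print(round(cer("नमस्ते", "नमसते"), 3))
```

Metrics that operate on grapheme clusters or normalized orthography, as the release advocates, avoid this mismatch between visual units and code points.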
Alibaba Tongyi Lab releases Fun-ASR 1.5: a single model covering 30 languages, seven Chinese dialect groups, and 20+ local accents; character-error rate in key dialect scenarios falls 56.2%, with five dialects exceeding 90% accuracy.
BlasBench introduces an open evaluation benchmark for Irish speech recognition with Irish-aware text normalization that preserves linguistic features like fadas, lenition, and eclipsis. The paper benchmarks 12 ASR systems across four architecture families, revealing significant generalization gaps and showing that existing multilingual systems struggle with Irish due to inadequate normalization.
CLewR introduces a curriculum learning strategy with restarts for improving machine translation performance in LLMs through preference optimization. The method addresses catastrophic forgetting by iterating the easy-to-hard curriculum multiple times, showing consistent gains across Gemma2, Qwen2.5, and Llama3.1 models.
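The scheduling idea in the summary above can be sketched in a few lines: order examples easy-to-hard, then repeat the whole pass several times so early (easy) material is revisited rather than forgotten. The difficulty proxy and restart count below are illustrative placeholders, not CLewR's actual configuration.

```python
# Sketch of a curriculum-with-restarts schedule: sort training examples
# easy-to-hard, then repeat that full curriculum several times ("restarts")
# so easy material is revisited and not catastrophically forgotten.

def curriculum_with_restarts(examples, difficulty, restarts=3):
    """Yield examples easy-to-hard, repeating the curriculum `restarts` times."""
    ordered = sorted(examples, key=difficulty)
    for _ in range(restarts):
        yield from ordered

# Toy "translation sources" scored by length as a crude difficulty proxy.
pairs = ["hi", "good morning", "the cat sat on the mat"]
schedule = list(curriculum_with_restarts(pairs, difficulty=len, restarts=2))
print(schedule)
```

In the preference-optimization setting the summary describes, each restart would re-expose the model to easy pairs between harder ones, which is the mechanism credited with the consistent gains across model families.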
This paper introduces the first parallel Arabic cultural QA benchmark spanning Modern Standard Arabic and multiple dialects, converting multiple-choice questions to open-ended formats and evaluating LLMs with chain-of-thought reasoning to address gaps in culturally grounded and dialect-specific knowledge.
This paper introduces STELA, a linguistics-aware watermarking framework for LLMs that leverages syntactic predictability via POS n-grams to balance text quality and detection robustness. The method enables publicly verifiable watermark detection without requiring access to model logits, demonstrating superior performance across typologically diverse languages (English, Chinese, Korean).