Tag
Violin is an open-source end-to-end video translation and video Q&A tool, integrating ASR, LLM translation, and TTS. It supports style adjustment and content re-creation, and can answer questions about video content.
This paper presents a calculus-based framework that uses first and second derivative tests to estimate the optimal vocabulary size hyperparameter for end-to-end ASR systems, improving performance on the LibriSpeech corpus.
Violin is an open-source video translation tool that integrates speech recognition, large language model translation, and text-to-speech. It supports over 30 languages and offers three usage modes: CLI, web app, and Claude Code.
Introduces Vividh-ASR, a complexity-tiered benchmark for Hindi and Malayalam ASR, identifies studio-bias in fine-tuning, and proposes R-MFT to improve spontaneous speech performance efficiently.
Dolphin-CN-Dialect is a streaming-capable ASR model that improves dialect recognition through temperature-based sampling and redesigned tokenization, achieving competitive performance with a smaller model size.
Hugging Face announces the addition of private, high-quality datasets from Appen and DataoceanAI to the Open ASR Leaderboard to prevent benchmaxxing and test-set contamination, while maintaining public data for the default average WER calculation.
Researchers introduce Voice of India, a 536-hour closed benchmark of unscripted telephonic conversations across 15 Indian languages and 139 regional clusters, exposing geographic and demographic ASR performance disparities.
Alibaba Tongyi Lab releases Fun-ASR 1.5: a single model covering 30 languages, seven Chinese dialect groups, and 20+ local accents. Character error rate in key dialect scenarios falls by 56.2%, with five dialects exceeding 90% accuracy.
BlasBench introduces an open evaluation benchmark for Irish speech recognition with Irish-aware text normalization that preserves linguistic features like fadas, lenition, and eclipsis. The paper benchmarks 12 ASR systems across four architecture families, revealing significant generalization gaps and showing that existing multilingual systems struggle with Irish due to inadequate normalization.
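To illustrate why normalization matters for Irish, here is a minimal sketch contrasting an accent-folding normalizer with one that keeps fadas. Both functions are hypothetical and written for illustration only; they are not BlasBench's actual normalizer, and they do not handle lenition or eclipsis.

```python
import unicodedata


def _clean(text: str) -> str:
    # Replace anything that is not a letter or digit with a space,
    # then collapse runs of whitespace.
    return " ".join("".join(ch if ch.isalnum() else " " for ch in text).split())


def normalize_generic(text: str) -> str:
    """Hypothetical generic normalizer: lowercase, fold accents, drop punctuation."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return _clean(folded)


def normalize_keep_fadas(text: str) -> str:
    """Hypothetical Irish-aware variant: same steps, but accents are preserved."""
    return _clean(unicodedata.normalize("NFC", text.lower()))


# "fear" (man) and "féar" (grass) are different Irish words.
generic_collides = normalize_generic("Fear,") == normalize_generic("féar")
aware_collides = normalize_keep_fadas("Fear,") == normalize_keep_fadas("féar")
```

Under the generic normalizer the two words collapse to the same string, so a system that outputs the wrong one would not be penalized in a word error rate computed on the normalized text; the fada-preserving variant keeps them distinct.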