#speech-recognition

Cards List

Dolphin-CN-Dialect: Where Chinese Dialects Matter

arXiv cs.CL · yesterday

Dolphin-CN-Dialect is a streaming-capable ASR model that improves dialect recognition through temperature-based sampling and redesigned tokenization, achieving competitive performance with a smaller model size.
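
The "temperature-based sampling" named in the summary is, in multilingual and dialect ASR, usually a recipe for rebalancing skewed training corpora: each dialect is sampled with probability proportional to its data size raised to 1/T, so larger temperatures upweight rare dialects. A minimal sketch of that common recipe (the dialect names and hour counts are invented, not from the paper):

```python
# Temperature-based sampling for rebalancing dialect training data.
# Illustrative only -- not Dolphin-CN-Dialect's exact implementation;
# dialect names and hour counts below are invented.

def sampling_probs(hours, temperature=1.0):
    """p_i proportional to n_i ** (1/T): T=1 keeps the raw data
    proportions; larger T flattens toward uniform, upweighting
    low-resource dialects."""
    scaled = {d: n ** (1.0 / temperature) for d, n in hours.items()}
    total = sum(scaled.values())
    return {d: s / total for d, s in scaled.items()}

corpus = {"Mandarin": 10000, "Cantonese": 800, "Sichuanese": 200}
raw = sampling_probs(corpus, temperature=1.0)   # mirrors the data skew
flat = sampling_probs(corpus, temperature=2.0)  # rare dialects upweighted
```

With T=2 in this toy corpus, Sichuanese's share of sampled batches rises from roughly 1.8% to roughly 10%.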


@SeongsikKi5837: 2. (Real time fact checking) - The Interaction Models hear you speak and fact-checks you in real time — like having a t…

X AI KOLs Following · yesterday

The article highlights 'Interaction Models' capable of real-time speech fact-checking during conversations, acting like an attentive teammate.


Beyond Single Ground Truth: Reference Monism as Epistemic Injustice in ASR Evaluation

arXiv cs.CL · 2d ago

This paper critiques the use of single-reference ground truth in ASR evaluation, arguing it causes epistemic injustice for speakers with aphasia. It proposes a new metric, Epistemic Injustice Distance, and advocates for WER-Range to account for diverse transcription conventions.
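
The WER-Range idea can be made concrete: score the hypothesis against every acceptable reference transcript and report the best and worst word error rates, so legitimate variation (e.g. "i am" vs "i'm") is not counted as error. A rough sketch, not the paper's exact metric definition:

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance over word tokens,
    divided by the reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

def wer_range(references, hyp):
    """Best- and worst-case WER over a set of acceptable references."""
    scores = [wer(ref, hyp) for ref in references]
    return min(scores), max(scores)

# Both transcription conventions are valid for the same utterance:
refs = ["i am going to the shop", "i'm going to the shop"]
lo, hi = wer_range(refs, "i'm going to the shop")  # lo = 0.0
```

A single-reference evaluation picks one convention and penalizes the other; reporting the range makes that convention-dependence of the score explicit.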


@seclink: OpenAI Launches GPT-Realtime-2, Its Most Intelligent Voice Model to Date. The model features GPT-5-level reasoning, a 128,000 token context window, and supports adjusting 'effort level' for more natural conversation. It can pair with GPT-R…

X AI KOLs Following · 5d ago

OpenAI released the GPT-Realtime-2 voice model, featuring GPT-5-level reasoning capabilities and a 128,000 token context window. It supports real-time translation from over 70 input languages to 13 output languages, achieving 96.6% accuracy on the Big Bench Audio Intelligence benchmark. Greg Brockman called it a milestone in voice translation.


Advancing voice intelligence with new models in the API

OpenAI Blog · 6d ago

OpenAI has announced three new voice models in its API: GPT-Realtime-2 with advanced reasoning, GPT-Realtime-Translate for live multilingual translation, and GPT-Realtime-Whisper for streaming transcription, aiming to enable more natural and action-oriented voice applications.


Adding Benchmaxxer Repellant to the Open ASR Leaderboard

Hugging Face Blog · 2026-05-06

Hugging Face announces the addition of private, high-quality datasets from Appen and DataoceanAI to the Open ASR Leaderboard to prevent benchmaxxing and test-set contamination, while maintaining public data for the default average WER calculation.
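
The "default average WER" here is the leaderboard's headline number, an average over the per-dataset WERs rather than a pool of all errors into one count; the two aggregations can disagree noticeably when test sets differ in size. A sketch of that difference (dataset names and error counts are invented):

```python
# Macro-average (mean of per-dataset WERs) weights every dataset
# equally; pooling (total errors / total reference words) weights by
# dataset size. All numbers below are invented for illustration.

datasets = {
    # name: (word_errors, reference_words)
    "clean-read-speech": (500, 50_000),
    "noisy-meetings":    (3_000, 20_000),
}

per_dataset = {name: errs / words for name, (errs, words) in datasets.items()}
macro = sum(per_dataset.values()) / len(per_dataset)      # 0.08
pooled = (sum(e for e, _ in datasets.values())
          / sum(w for _, w in datasets.values()))         # 0.05
```

Macro-averaging keeps a small, hard test set from being drowned out by a large, easy one, which is why leaderboards typically report it.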


Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

arXiv cs.CL · 2026-04-22

Researchers introduce Voice of India, a 536-hour closed benchmark of unscripted telephonic conversations across 15 Indian languages and 139 regional clusters, exposing geographic and demographic ASR performance disparities.


@aigclink: Alibaba Tongyi Lab just dropped Fun-ASR 1.5—one industrial-grade model handles 30 languages, all 7 major Chinese dialect families + 20+ regional accents, even classical-poetry recitation. Dialect CER down 56.2 % vs last gen; 5 dialects top 90 % accuracy…

X AI KOLs Timeline · 2026-04-20

Alibaba Tongyi Lab releases Fun-ASR 1.5: a single model covering 30 languages, seven Chinese dialect groups, and 20+ local accents; character error rate in key dialect scenarios falls 56.2%, with five dialects exceeding 90% accuracy.


BlasBench: An Open Benchmark for Irish Speech Recognition

arXiv cs.CL · 2026-04-20

BlasBench introduces an open evaluation benchmark for Irish speech recognition with Irish-aware text normalization that preserves linguistic features like fadas, lenition, and eclipsis. The paper benchmarks 12 ASR systems across four architecture families, revealing significant generalization gaps and showing that existing multilingual systems struggle with Irish due to inadequate normalization.
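
The fada point is easy to demonstrate: a generic normalizer that strips diacritics merges distinct Irish words, while an Irish-aware pass lowercases without touching accents. A minimal sketch (the example words are ours, not from the paper):

```python
import unicodedata

def strip_diacritics(text):
    """Generic normalizer: decompose to NFD and drop combining marks.
    This erases Irish fadas and merges distinct words."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def irish_aware(text):
    """Irish-aware pass: lowercase only, leaving fadas intact."""
    return unicodedata.normalize("NFC", text.lower())

# "sean" (old) and "Seán" (a personal name) are different words:
assert strip_diacritics("Sean") == strip_diacritics("Seán")  # collision
assert irish_aware("Sean") != irish_aware("Seán")            # distinct
```

Scoring against fada-stripped references would silently forgive systems that cannot produce accented characters at all, which is the normalization gap the benchmark targets.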


MUSCAT: MUltilingual, SCientific ConversATion Benchmark

arXiv cs.CL · 2026-04-20

MUSCAT is a new multilingual, scientific conversation benchmark dataset for evaluating ASR systems on challenging multilingual scenarios including code-switching, domain-specific vocabulary, and mixed language input. The dataset consists of bilingual discussions on scientific papers between speakers using different languages, with results showing current state-of-the-art systems struggle with these multilingual challenges.


A New Framework for Evaluating Voice Agents (EVA)

Hugging Face Blog · 2026-03-24

ServiceNow introduces EVA, a new end-to-end evaluation framework for conversational voice agents that jointly scores task accuracy and conversational experience.


Speak is personalizing language learning with AI

OpenAI Blog · 2025-04-22

Speak, an AI-powered language learning app, is personalizing education through advanced speech recognition and natural AI tutoring capabilities. CEO Connor Zwick discusses how deep learning breakthroughs and OpenAI's real-time API are enabling more sophisticated accent detection and multimodal understanding for fluency training.


Introducing Whisper

OpenAI Blog · 2022-09-21

OpenAI introduces Whisper, an end-to-end encoder-decoder Transformer model trained on large-scale diverse audio data for robust multilingual speech recognition, language identification, and speech-to-English translation. Whisper achieves 50% fewer errors than specialized models on diverse datasets and outperforms supervised benchmarks on speech translation despite not being fine-tuned to specific datasets.
