wav2vec2

#wav2vec2

A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition

arXiv cs.AI ↗ · 4d ago

This paper presents a systematic empirical study of fine-tuning pretrained Transformer models (Wav2Vec2.0, HuBERT, XLS-R) for Quranic Automatic Speech Recognition (ASR). It reports a WER of 0.08 on the EveryAyah subset and a reduction in training time from 140 to 40 hours, with Wav2Vec2-XLSR-53 providing the best representation.

#wav2vec2

Perceptual compensation for tonal context in self-supervised speech models

arXiv cs.CL ↗ · 2026-06-17

This paper investigates whether the wav2vec2.0 architecture exhibits perceptual compensation for tonal context in Mandarin Chinese, finding limited evidence in the self-supervised model compared to human listeners and suggesting that supervised fine-tuning may be necessary for such phonological abstraction.

#wav2vec2

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

arXiv cs.CL ↗ · 2026-05-29

This paper evaluates nine ASR models (Whisper, Parakeet, Wav2Vec2) on two Dutch child speech datasets, JASMIN and DART, finding that fine-tuned Whisper-medium achieves the best performance (WER of 5.54% on JASMIN and 70.37% on DART). It also proposes a selection method that automatically identifies correctly pronounced utterances with high precision, reducing the need for manual verification.

#wav2vec2

easyaligner: Forced alignment with GPU acceleration and flexible text normalization (compatible with all w2v2 models on HF Hub) [P]

Reddit r/MachineLearning ↗ · 2026-04-18

easyaligner is an open-source forced alignment library with GPU acceleration and flexible text normalization that works with all wav2vec2 models on Hugging Face Hub. It targets practical workflows such as handling partial transcripts, skipping irrelevant speech segments, and aligning long audio without chunking, while preserving the original text formatting.
