Tadabur: A Large-Scale Quran Audio Dataset

Hugging Face Daily Papers 04/21/26, 12:00 AM Papers

Summary

Tadabur is a 1,400+ hour Quran audio dataset from 600+ reciters designed to advance Quranic speech research and benchmarking.

Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1,400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and the variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.
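The scale and diversity figures quoted above (total hours, number of reciters) are the kind of statistics one might recompute from a per-clip metadata manifest. The following is a minimal sketch under stated assumptions: the `Clip` record, its `reciter_id` and `duration_s` fields, and the `summarize` helper are hypothetical names chosen for illustration and are not taken from Tadabur's actual schema or tooling.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Clip(NamedTuple):
    """One recitation clip. Hypothetical schema, not Tadabur's real one."""
    reciter_id: str
    duration_s: float  # clip length in seconds


def summarize(clips: Iterable[Clip]) -> dict:
    """Aggregate total hours, reciter count, and hours per reciter."""
    seconds_per_reciter = defaultdict(float)
    for clip in clips:
        seconds_per_reciter[clip.reciter_id] += clip.duration_s
    # Convert accumulated seconds to hours once per reciter.
    hours_per_reciter = {r: s / 3600.0 for r, s in seconds_per_reciter.items()}
    return {
        "total_hours": sum(hours_per_reciter.values()),
        "num_reciters": len(hours_per_reciter),
        "hours_per_reciter": hours_per_reciter,
    }


# Toy records for illustration only; these are not real dataset entries.
toy_clips = [
    Clip("reciter_a", 1800.0),
    Clip("reciter_b", 5400.0),
    Clip("reciter_a", 3600.0),
]
stats = summarize(toy_clips)
print(stats["total_hours"], stats["num_reciters"])
```

Keeping the per-reciter breakdown alongside the totals makes it easy to check how evenly the audio is spread across reciters, which matters when a dataset is described as diverse.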

Original Article

Cached at: 04/23/26, 11:54 AM

Paper page - Tadabur: A Large-Scale Quran Audio Dataset

Source: https://huggingface.co/papers/2604.18932

Abstract

Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1,400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and the variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.

Get this paper in your agent:

hf papers read 2604.18932

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0

Datasets citing this paper: 1

#### FaisaI/tadabur • Viewer • Updated about 22 hours ago • 409k • 3.84k • 13

Spaces citing this paper: 0

Collections including this paper: 0

Similar Articles

A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition

MUSCAT: MUltilingual, SCientific ConversATion Benchmark

The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

TTS Benchmark Comparison (all known TTS up until May 2026)

Text-to-Speech (TTS) Benchmark Revamped with Objective Standards and Blind Voting (46 models and counting)
