Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models

arXiv cs.CL 06/11/26, 04:00 AM Papers

bias automatic-speech-recognition phoneme ipa whisperipa zipa evaluation

Summary

This paper evaluates demographic and accent biases in phoneme-based ASR systems, specifically WhisperIPA and ZIPA, using phoneme error rate and a new Soft PER metric, revealing persistent disparities across languages and groups.

arXiv:2606.11639v1 Announce Type: new

Abstract: The popularization of automatic speech recognition (ASR) systems has increased exploration of the demographic biases related to race, age, gender, and accent, often formed from imbalanced training data. Most of these studies focused on standard grapheme-based ASR systems with comparatively little emphasis on phoneme-based systems, such as models that produce International Phonetic Alphabet (IPA) representations. As ASR systems shift toward multilingual support and low-resource language modeling, IPA-based layers serve as a critical, language-agnostic foundation. In this study, we evaluate the performance of two state-of-the-art open-source ASR systems, WhisperIPA and ZIPA, that generate IPA transcriptions across diverse accents and language sources. Our evaluation includes existing multilingual speech corpora and demographically annotated English-language corpora. We measure model performance by comparing model-generated IPA transcriptions against grapheme-to-phoneme (G2P) systems using both standard phoneme error rate (PER) and a proposed Soft PER metric that tolerates linguistically similar phoneme substitutions. Our analysis examines how performance varies across languages and demographic groups such as gender, accent, ethnicity, and age, revealing persistent disparities even after accounting for acceptable phonemic variation. These findings provide insight into potential sources of bias and inform the development of more inclusive and linguistically robust phoneme-based ASR systems. Our code and data will be made publicly available to the community.
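The abstract names two metrics, standard PER and a Soft PER that tolerates linguistically similar substitutions, but does not define Soft PER. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes PER is the Levenshtein distance between a G2P reference and a model hypothesis, divided by the reference length, and that Soft PER charges zero cost for substitutions within a set of similar phoneme pairs. The names `SIMILAR_PAIRS`, `hard_cost`, `soft_cost`, and `per`, and the pairs themselves, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

# Hypothetical set of "linguistically similar" phoneme pairs (an assumption,
# not taken from the paper). A real metric might derive these from
# articulatory features instead of a hand-written list.
SIMILAR_PAIRS = {
    frozenset(("ɔ", "ɑ")),
    frozenset(("t", "ɾ")),
    frozenset(("ɪ", "i")),
}


def hard_cost(a: str, b: str) -> float:
    """Standard substitution cost: 0 for a match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def soft_cost(a: str, b: str) -> float:
    """Assumed soft cost: substitutions within SIMILAR_PAIRS are free."""
    if a == b or frozenset((a, b)) in SIMILAR_PAIRS:
        return 0.0
    return 1.0


def per(
    ref: Sequence[str],
    hyp: Sequence[str],
    sub_cost: Callable[[str, str], float] = hard_cost,
) -> float:
    """Edit distance between phoneme sequences, normalized by len(ref)."""
    if not ref:
        raise ValueError("reference must contain at least one phoneme")
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [float(i)]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1.0,             # deletion of r
                    curr[j - 1] + 1.0,         # insertion of h
                    prev[j - 1] + sub_cost(r, h),  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sequences for the word "water" (made up for this sketch).
reference = ["w", "ɔ", "t", "ɚ"]       # stand-in for a G2P reference
hypothesis = ["w", "ɑ", "ɾ", "ɚ"]      # stand-in for a model output

hard_per = per(reference, hypothesis)             # two substitutions over 4
soft_per = per(reference, hypothesis, soft_cost)  # both substitutions tolerated
```

Under these assumptions the flapped, unrounded variant counts as two errors under PER and none under the soft variant, which is the kind of acceptable variation the abstract says Soft PER is meant to absorb. Disparities that remain under the soft metric would then point to errors beyond such substitutions.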

Cached at: 06/11/26, 01:40 PM

# Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models
Source: [https://arxiv.org/abs/2606.11639](https://arxiv.org/abs/2606.11639)
[View PDF](https://arxiv.org/pdf/2606.11639)

## Submission history

From: Catherine Bao Bao

**[v1]** Wed, 10 Jun 2026 04:00:44 UTC (209 KB)

Similar Articles

Your Multimodal Speech Model Says I Have a Face for Radio

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition

Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

Phonetic Modeling of Dialectal Variation in Vietnamese Speech
