How I implemented ASR bias for voice transcription models [Open Source]

Reddit r/LocalLLaMA 06/11/26, 10:56 AM Tools

asr-bias voice-transcription open-source whisper speech-recognition dictation

Summary

The article explains how to implement ASR biasing for voice transcription models, using examples from Groq and local models, and introduces the open-source Freestyle project that incorporates this feature.

I've spent the last couple of weeks building a Wispr Flow clone as an open source project. For context, Wispr Flow is a voice dictation app that lets you write faster by speaking instead of typing. I spent the first week building the basic STT (speech-to-text) capabilities.

One of the coolest features Wispr Flow has is ASR biasing, which it calls its dictionary. I figured out how to implement that for my project and wanted to share how it was done.

**What is ASR biasing?**

ASR biasing is a transcription technique that guides the model with hints about how words are spelled or which phrases are common. In my example in the video, I told the model that I wanted to talk about the “Knicks” and “OG Anunoby”. With biasing set up, the words you've listed are more likely to show up when you say something that sounds similar.

**How it's implemented in code**

Implementing ASR biasing is actually very easy. Each model provider handles it differently and calls it something different. For example, OpenAI and Groq use a prompt as the bias mechanism, similar to an LLM system prompt. Local models like whisper.cpp and local Mac models from MLX use the same prompt system. Other providers, like Deepgram and ElevenLabs, call them key terms and configure them through query parameters.

This is what it looks like to implement in Groq. It's as simple as injecting the dictionary words into the model's “system prompt”.

```
import fs from "fs";
import Groq from "groq-sdk";

// The client reads GROQ_API_KEY from the environment
const groq = new Groq();

const transcription = await groq.audio.transcriptions.create({
  file: fs.createReadStream("YOUR_AUDIO.wav"),
  model: "whisper-large-v3-turbo",
  prompt: "vocabulary: Knicks, OG Anunoby", // Optional: the bias hint
  response_format: "verbose_json",
  timestamp_granularities: ["word", "segment"],
  language: "en",
  temperature: 0.0,
});
```

In Freestyle, we've implemented ASR biasing and call it our “Vocabulary” feature. When you create a vocabulary, it is saved locally within Freestyle. Every time you run inference, your saved vocabulary is freshly injected into the model's system prompt or key terms.

**Freestyle OSS project**

All of the work that we've done around ASR biasing is open source and available in our GitHub repo. If this project sounds interesting to you, consider giving it a star! We're also looking to build a community of people interested in working on open source voice dictation.

https://github.com/freestyle-voice/freestyle
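
For anyone who wants to try the per-request injection described above, here is a minimal sketch of turning a saved word list into either a prompt string or key-term query parameters. This is not Freestyle's actual code: the function names are hypothetical, and the `keyterm` parameter name is an assumption, so check your provider's docs for the exact field.

```typescript
// Hypothetical helpers for illustration only, not taken from the Freestyle repo.

// Trim entries, drop blanks, and remove case-insensitive duplicates while keeping order.
function normalizeVocabulary(terms: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const raw of terms) {
    const term = raw.trim();
    const key = term.toLowerCase();
    if (term.length === 0 || seen.has(key)) continue;
    seen.add(key);
    out.push(term);
  }
  return out;
}

// Prompt-style providers: build the same "vocabulary: ..." string used in the Groq example.
function toPrompt(terms: string[]): string {
  const vocab = normalizeVocabulary(terms);
  return vocab.length > 0 ? `vocabulary: ${vocab.join(", ")}` : "";
}

// Key-term-style providers: one repeated query parameter per term.
// The parameter name "keyterm" is an assumption.
function toKeytermQuery(terms: string[]): string {
  return normalizeVocabulary(terms)
    .map((term) => `keyterm=${encodeURIComponent(term)}`)
    .join("&");
}

const saved = ["Knicks", " OG Anunoby ", "knicks", ""];
const prompt = toPrompt(saved);      // "vocabulary: Knicks, OG Anunoby"
const query = toKeytermQuery(saved); // "keyterm=Knicks&keyterm=OG%20Anunoby"
```

Rebuilding the string on every request, rather than caching it, means edits to the saved vocabulary take effect on the next transcription. Whisper-style prompts are also length-limited, so a long vocabulary may need trimming before it is sent.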
