How I implemented ASR bias for voice transcription models [Open Source]

Reddit r/LocalLLaMA Tools

Summary

The article explains how to implement ASR biasing for voice transcription models, using examples from Groq and local models, and introduces the open-source Freestyle project that incorporates this feature.

I've been spending the last couple of weeks building a Wispr Flow clone as an open source project. For context, Wispr Flow is a voice dictation app that lets you type faster by speaking instead of typing. I spent the first week building the basic speech-to-text (STT) capabilities. One of the coolest features Wispr Flow has is ASR biasing, which Wispr Flow calls its dictionary. I figured out how to implement it for my project and wanted to share how it was done.

**What is ASR biasing?**

ASR biasing is a transcription technique that guides the model with hints about how words are spelled or which phrases are common. In the example in my video, I told the model that I wanted to talk about the “Knicks” and “OG Anunoby”. With biasing set up, the words you've configured are more likely to show up in the transcript when you say something that sounds similar.

**How it's implemented in code**

Implementing ASR biasing is actually incredibly easy. Each model provider handles it differently and calls it something different. For example, OpenAI and Groq use a text prompt as their biasing mechanism, similar to an LLM system prompt. Local models, like whisper.cpp and the MLX models for Mac, use the same prompt mechanism. Other providers, like Deepgram and ElevenLabs, call them key terms and configure them through URL search parameters.

This is what it looks like in Groq. It's as simple as injecting the dictionary words into the model's “system prompt” (the `prompt` field):

```javascript
import fs from "fs";
import Groq from "groq-sdk";

// The client reads GROQ_API_KEY from the environment.
// Run as an ES module (e.g. a .mjs file) so top-level await works.
const groq = new Groq();

const transcription = await groq.audio.transcriptions.create({
  file: fs.createReadStream("YOUR_AUDIO.wav"),
  model: "whisper-large-v3-turbo",
  prompt: "vocabulary: Knicks, OG Anunoby", // Optional: the biasing hint
  response_format: "verbose_json",
  timestamp_granularities: ["word", "segment"],
  language: "en",
  temperature: 0.0,
});

console.log(transcription.text);
```

In Freestyle, we've implemented ASR biasing and call it our “Vocabulary” feature. When you create a vocabulary, it is saved locally within Freestyle.
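Because each provider takes the vocabulary in a different shape, one option is to keep a single list and convert it per provider at request time. The following is a minimal, hypothetical sketch, not Freestyle's actual code. The helper names `normalizeTerms`, `buildBiasPrompt`, `buildKeytermParams`, and `buildDeepgramUrl` are illustrative. The sketch reuses the `vocabulary: ...` prompt format from the Groq example for prompt-style models. For key-term providers, it assumes Deepgram's Nova-3 `keyterm` query parameter, repeated once per term. Check each provider's documentation before relying on these parameter names.

```javascript
// Minimal sketch (hypothetical helpers, not Freestyle's actual code):
// convert one saved vocabulary list into the biasing input each kind of
// provider expects.

// Trim each term, drop empty entries, and remove duplicates (order kept).
function normalizeTerms(vocabulary) {
  const trimmed = vocabulary.map((term) => term.trim()).filter(Boolean);
  return [...new Set(trimmed)];
}

// Prompt-style biasing (OpenAI, Groq, whisper.cpp, MLX): one plain-text
// hint passed as the `prompt` field. Returns "" when there are no terms.
function buildBiasPrompt(vocabulary) {
  const terms = normalizeTerms(vocabulary);
  return terms.length > 0 ? `vocabulary: ${terms.join(", ")}` : "";
}

// Key-term biasing sent as URL search parameters, one entry per term.
// ASSUMPTION: "keyterm" matches Deepgram's Nova-3 keyterm prompting;
// other providers may use a different name or transport.
function buildKeytermParams(vocabulary, paramName = "keyterm") {
  const params = new URLSearchParams();
  for (const term of normalizeTerms(vocabulary)) {
    params.append(paramName, term);
  }
  return params;
}

// ASSUMPTION: Deepgram's pre-recorded endpoint and the nova-3 model name.
function buildDeepgramUrl(vocabulary) {
  const url = new URL("https://api.deepgram.com/v1/listen");
  url.searchParams.set("model", "nova-3");
  for (const [key, value] of buildKeytermParams(vocabulary)) {
    url.searchParams.append(key, value);
  }
  return url.toString();
}

const vocabulary = ["Knicks", "OG Anunoby", " Knicks "];
console.log(buildBiasPrompt(vocabulary));
// vocabulary: Knicks, OG Anunoby
console.log(buildDeepgramUrl(vocabulary));
// https://api.deepgram.com/v1/listen?model=nova-3&keyterm=Knicks&keyterm=OG+Anunoby
```

Trimming and deduplicating the terms keeps the hint short. This matters for Whisper-style prompts, because OpenAI's documentation notes that Whisper only considers the final 224 tokens of the prompt. The helpers are pure functions, so they can be called on every request, and newly saved terms take effect on the next transcription.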
Every time you run inference, your saved vocabulary is freshly injected into the model's prompt or key terms.

**Freestyle OSS project**

All of the work we've done around ASR biasing is open source and available in our GitHub repo. If this project sounds interesting to you, consider giving it a star! We're also looking to build a community of people interested in working on open source voice dictation.

https://github.com/freestyle-voice/freestyle