Gemma 4 audio with MLX

Simon Willison's Blog Tools

Summary

A practical guide to audio transcription on macOS using the Gemma 4 E2B model with MLX and mlx-vlm, including a `uv run` recipe and a demonstration of the workflow.



# Gemma 4 audio with MLX

Source: https://simonwillison.net/2026/Apr/12/mlx-audio/

12th April 2026

Thanks to a tip from Rahim Nathwani (https://twitter.com/RahimNathwani/status/2039961945613209852), here's a `uv run` recipe for transcribing an audio file on macOS using the 10.28 GB Gemma 4 E2B model (https://huggingface.co/google/gemma-4-E2B) with MLX and mlx-vlm (https://github.com/Blaizzy/mlx-vlm):

```
uv run --python 3.13 --with mlx_vlm --with torchvision --with gradio \
  mlx_vlm.generate \
  --model google/gemma-4-e2b-it \
  --audio file.wav \
  --prompt "Transcribe this audio" \
  --max-tokens 500 \
  --temperature 1.0
```

I tried it on this 14 second `.wav` file (https://static.simonwillison.net/static/2026/demo-audio-for-gemma.wav) and it output the following:

> This front here is a quick voice memo. I want to try it out with MLX VLM. Just going to see if it can be transcribed by Gemma and how that works.

(That was supposed to be "This right here..." and "... how well that works", but I can hear why it misinterpreted that as "front" and "how that works".)
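Before passing a recording to a recipe like the one above, it can help to confirm the `.wav` file is in a format speech models commonly expect (16 kHz, mono, 16-bit; check the model card for the exact requirements). Here is a minimal sketch using only Python's stdlib `wave` module; the `file.wav` name matches the command above, and the test-tone writer is just a stand-in for a real recording:

```python
import math
import struct
import wave

def write_test_tone(path, seconds=1, rate=16000):
    """Write a short 16 kHz mono 440 Hz tone, standing in for a real recording."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * t / rate)))
            for t in range(rate * seconds)
        )
        w.writeframes(frames)

def audio_info(path):
    """Report channel count, sample rate, and duration of a .wav file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }

write_test_tone("file.wav")
print(audio_info("file.wav"))
```

If `audio_info` reports a different sample rate or stereo channels, a tool like `ffmpeg -i input.m4a -ar 16000 -ac 1 file.wav` can resample before transcription.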

Similar Articles

New Gemma 4 MTP on MLX?

Reddit r/LocalLLaMA

Google released Multi Token Prediction drafters for Gemma 4 to accelerate inference via speculative decoding, but support for MLX is currently unconfirmed or unavailable.

Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2

Hugging Face Models Trending

SuperGemma4-26B-Uncensored-MLX-4bit-v2 is a fine-tuned and quantized variant of Google's Gemma 4 26B optimized for Apple Silicon, offering improved performance on code, reasoning, and tool-use tasks while maintaining faster inference speeds compared to the stock baseline.

Trials and tribulations fine-tuning & deploying Gemma-4 [P]

Reddit r/MachineLearning

An ML team documents practical challenges encountered while fine-tuning and deploying Gemma-4, including incompatibilities with PEFT, SFTTrainer, DeepSpeed ZeRO-3, and lack of runtime LoRA serving support, along with workarounds for each issue.

google/gemma-4-31B-it-assistant

Hugging Face Models Trending

Google DeepMind releases Gemma 4, a family of open-weights multimodal models featuring Multi-Token Prediction (MTP) for up to 2x decoding speedups, supporting text, image, video, and audio with enhanced reasoning and coding capabilities.