@Freerunnering: This actually makes Gemma 4 26B-4A usable for a coding agent @ 72tk/s on my MacBook Pro M1 Max. This video is realtime,…

X AI KOLs Timeline Models

Summary

Unsloth AI announces that Gemma 4 runs 2x faster with MTP GGUFs, making it feasible for local coding agents on hardware like a MacBook Pro M1 Max at 72 tokens/s.

This actually makes Gemma 4 26B-4A usable for a coding agent @ 72tk/s on my MacBook Pro M1 Max. This video is realtime, running completely locally. https://t.co/DYAFpnseBA
Original Article
View Cached Full Text

Cached at: 06/12/26, 10:56 AM

This actually makes Gemma 4 26B-4A usable for a coding agent @ 72tk/s on my MacBook Pro M1 Max.

This video is realtime, running completely locally. https://t.co/DYAFpnseBA

Unsloth AI (@UnslothAI): Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️

MTP enables Google Gemma 4 run ~1.4–2.2× faster with no accuracy loss.

Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s.

GGUFs + Guide:

Similar Articles

Gemma4 26b MoE running in MLX with turboquant (and custom kernel)

Reddit r/LocalLLaMA

A developer successfully ran Gemma4 26b MoE on Apple MacBook Air M5 using MLX with turboquant and a custom kernel, achieving faster prompt processing and generation speeds than llama.cpp with lower memory usage. The implementation includes instructions for local deployment.

Gemma 4 26B Hits 600 Tok/s on One RTX 5090

Reddit r/LocalLLaMA

A benchmark shows that using vLLM with DFlash speculative decoding boosts Gemma 4 26B inference to ~578 tokens per second on a single RTX 5090, achieving a 2.56x speedup over baseline.