@Freerunnering: This actually makes Gemma 4 26B-4A usable for a coding agent @ 72tk/s on my MacBook Pro M1 Max. This video is realtime,…

X AI KOLs Timeline 06/12/26, 03:30 AM Models

gemma-4 coding-agent local-inference macbook mtp gguf unsloth

Summary

Unsloth AI announces that Gemma 4 runs up to 2x faster with MTP (Multi-Token Prediction) GGUFs. @Freerunnering reports that this makes Gemma 4 26B-4A usable as a local coding agent, running at 72 tokens/s on a MacBook Pro M1 Max.

Original Article

Cached at: 06/12/26, 10:56 AM

This actually makes Gemma 4 26B-4A usable for a coding agent @ 72tk/s on my MacBook Pro M1 Max.

This video is realtime, running completely locally. https://t.co/DYAFpnseBA

Unsloth AI (@UnslothAI): Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️

MTP enables Google Gemma 4 to run ~1.4–2.2× faster with no accuracy loss.

Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s.

GGUFs + Guide:

Similar Articles

@rohanpaul_ai: atomic[.]chat just made Gemma 4 26B faster inside LLaMA.cpp. making token generation about 40% faster in its MacBook Pr…

X AI KOLs Following

atomic.chat has optimized Gemma 4 26B inference in llama.cpp, achieving ~40% faster token generation on a MacBook Pro M5 Max using Multi-Token Prediction (MTP) speculative decoding. This is a notable win for local AI users running desktop apps, coding agents, and private on-device assistants.

@leopardracer: GEMMA 4 26B ON AN RTX 4060 WITH A 248K TOKEN CONTEXT WINDOW 20 tokens per second and a context window so large you can …

X AI KOLs Timeline

Gemma 4 26B runs on an RTX 4060 with 248K token context at 20 tokens per second using llama.cpp and Q4_K_XL quantization, enabling local processing of entire codebases on consumer hardware.

Gemma4 26b MoE running in MLX with turboquant (and custom kernel)

Reddit r/LocalLLaMA

A developer ran Gemma4 26b MoE on an Apple MacBook Air M5 using MLX with turboquant and a custom kernel, achieving faster prompt processing and generation than llama.cpp with lower memory usage. The post includes instructions for local deployment.

@analogalok: Run Gemma 4 26B MoE on 8GB VRAM with 250k context at 20+ tokens/sec If you own any 8GB VRAM graphics card, stop what yo…

X AI KOLs Timeline

Alok demonstrates running Gemma 4 26B MoE on 8GB VRAM using Unsloth's QAT quant and the -cmoe flag in llama.cpp, achieving 20 tokens/sec with 250k context, marking a major milestone for budget local AI.

Gemma 4 26B Hits 600 Tok/s on One RTX 5090