Gemma 12b less than 10 watts 6.5pp 1.3tg

Reddit r/LocalLLaMA 06/14/26, 11:50 PM News

gemma llamacpp mobile-ai local-llm power-efficiency android on-device

Summary

Running the Gemma 4 12B model (Q3_K_XL quantization) on a Google Pixel 10 Pro with llama.cpp reaches 6.5 tokens per second for prompt processing and 1.3 tokens per second for generation at a prompt depth of about 10,000 tokens, while drawing under 10 watts.

Device: Google Pixel 10 Pro, Termux
llama.cpp version: 9639 (ef8268fee)

$ ./llama.cpp/build_vulkan/bin/llama-cli \
    -m storage/downloads/gemma-4-12b-it-UD-Q3_K_XL.gguf \
    --model-draft storage/downloads/mtp-gemma-4-12b-it.gguf \
    --temp 1.0 --top-p 0.95 --top-k 64 \
    --spec-type draft-mtp --spec-draft-n-max 1 \
    -c 32000 --mlock -b 512 -ctk q8_0 -ctv q8_0

~10,000 prompt depth
[ Prompt: 6.5 t/s | Generation: 1.3 t/s ]
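To put the reported rates in perspective, here is a rough back-of-envelope sketch in Python. It assumes the two rates hold constant over the whole run and treats 10 W as an upper bound on power draw, since the post only says "less than 10 watts". The 200-token reply length and the helper names are made up for illustration and do not come from the post.

```python
# Back-of-envelope figures from the numbers reported in the post.
# Assumptions (not from the post): rates are constant, 10 W is a ceiling,
# and the 200-token reply length is purely hypothetical.

PROMPT_TOKENS = 10_000   # "~10,000 prompt depth"
PROMPT_RATE = 6.5        # tokens/s, prompt processing
GEN_RATE = 1.3           # tokens/s, generation
POWER_CAP_W = 10.0       # "less than 10 watts", used as an upper bound
REPLY_TOKENS = 200       # hypothetical reply length


def seconds_for(tokens: int, rate: float) -> float:
    """Time in seconds to handle `tokens` at `rate` tokens per second."""
    return tokens / rate


def max_joules_per_token(power_w: float, rate: float) -> float:
    """Upper bound on energy per token: watts divided by tokens per second."""
    return power_w / rate


prompt_seconds = seconds_for(PROMPT_TOKENS, PROMPT_RATE)
reply_seconds = seconds_for(REPLY_TOKENS, GEN_RATE)

print(f"Prompt processing: {prompt_seconds / 60:.1f} min")
print(f"{REPLY_TOKENS}-token reply: {reply_seconds / 60:.1f} min")
print(f"Energy per generated token: <= {max_joules_per_token(POWER_CAP_W, GEN_RATE):.2f} J")
print(f"Energy per prompt token:    <= {max_joules_per_token(POWER_CAP_W, PROMPT_RATE):.2f} J")
```

Under these assumptions, a 10,000-token prompt takes roughly 25 minutes to process, and each generated token costs at most about 7.7 joules. Actual figures depend on how the rates vary with context depth, which the post does not report.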

Similar Articles

@analogalok: i just ran Google's brand new Unsloth Gemma4 12B dense GGUF on my RTX 4060 using llama.cpp + CUDA 13.2 21 tokens per se…

X AI KOLs Timeline

Google's new Gemma 4 12B is a single decoder-only transformer with encoder-free multimodal input, achieving strong benchmarks while being small enough to run locally on a budget GPU. It is released under Apache 2.0 license.

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

You don't need a GPU to run gemma-4-26B-A4B

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

120 tok/s on 12GB VRAM with Gemma 4 12B QAT MTP
