Gemma 4 12B first coding agent test on a 4080 Super

Reddit r/LocalLLaMA Tools

Summary

A user tested Gemma 4 12B as a coding agent in VSCodium using the Pi Agent extension. The task was to create a Python script that reads logs and writes per-module error counts to a JSON file. The model handled the tool use autonomously, and the run finished with zero bugs.

Just threw the new Gemma 4 12B into VSCodium with the Pi Agent extension to see how it handles tools, and it nailed the test on the first try.

I gave it a prompt to write a Python script that reads logs line-by-line, grabs the error modules, and dumps the counts to a JSON file. I also told it to make its own mock log data and run a live terminal test to verify the results.

Instead of just spitting out a block of code for me to copy and paste, the agent actually went to work. It created the script, populated a dummy app.log file with a mix of random logs, opened up a terminal shell to run the code, and verified the output with zero bugs or path errors.

* **Model:** Gemma 4 12B (Unsloth UD-Q4_K_XL)
* **Context:** 32K (`--ctx-size 32768`)
* **KV Cache:** 8-bit (`--cache-type-k q8_0 --cache-type-v q8_0`)
* **Layers:** -1 (Full offload to GPU)
* **Samplers:** Flash Attention ON, `--temp 1.0`, `--top-p 0.95`, `--top-k 64`, `--min-p 0.05`, `--repeat-penalty 1.15`
* `llama.cpp + cuda`
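For anyone who wants to run the same test, here is a rough sketch of the kind of script the prompt asks for. It is not the code the agent produced, which the post does not include. The log line shape (`<timestamp> ERROR [module] message`), the function names, and the `error_counts.json` output name are all assumptions made for illustration. Only `app.log` comes from the post.

```python
import json
import re
from collections import Counter
from pathlib import Path

# Assumed line shape (not from the post): "<timestamp> <LEVEL> [<module>] <message>"
ERROR_LINE = re.compile(r"\bERROR\s+\[(?P<module>[^\]]+)\]")


def count_error_modules(log_path):
    """Read the log one line at a time and count ERROR entries per module."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as handle:
        for line in handle:  # iterating the file object streams it line by line
            match = ERROR_LINE.search(line)
            if match:
                counts[match.group("module")] += 1
    return dict(counts)


def write_counts(counts, json_path):
    """Dump the per-module counts to a JSON file."""
    text = json.dumps(counts, indent=2, sort_keys=True)
    Path(json_path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical mock data, standing in for the dummy app.log the agent generated.
    mock_lines = [
        "2025-01-01 10:00:00 INFO [auth] user logged in",
        "2025-01-01 10:00:05 ERROR [auth] invalid token",
        "2025-01-01 10:00:09 ERROR [db] connection refused",
        "2025-01-01 10:00:12 WARNING [cache] slow response",
        "2025-01-01 10:00:15 ERROR [auth] session expired",
    ]
    Path("app.log").write_text("\n".join(mock_lines) + "\n", encoding="utf-8")

    counts = count_error_modules("app.log")
    write_counts(counts, "error_counts.json")
    print(counts)
```

Running the file directly writes the mock `app.log`, counts the errors, and saves the result, which roughly mirrors the create, populate, run, and verify loop the agent went through in the terminal.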
