A benchmark shows that using vLLM with DFlash speculative decoding boosts Gemma 4 26B inference to ~578 tokens per second on a single RTX 5090, achieving a 2.56x speedup over baseline.
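A setup like the one benchmarked above can be sketched as a `vllm serve` invocation with a speculative-decoding config. This is a minimal sketch, not the benchmark's actual command: the model ID and drafter repo name are placeholders, and the speculative-decoding knobs vary between vLLM versions, so check `vllm serve --help` for your install.

```shell
# Hypothetical invocation -- model and drafter names are placeholders.
# Recent vLLM versions take speculative-decoding settings as a JSON
# --speculative-config; older versions used separate flags instead.
vllm serve google/gemma-4-26b-it \
  --speculative-config '{"model": "zlab/dflash-gemma4-drafter", "num_speculative_tokens": 5}' \
  --gpu-memory-utilization 0.90
```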
Google released a roundup of major AI updates from April 2026, including the Gemma 4 model, Gemini Enterprise Agent Platform, and eighth-generation TPUs announced at Cloud Next '26.
Z-lab released DFlash, a speculative decoding drafter model for Gemma-4-31B-it that uses lightweight block diffusion to draft multiple tokens in parallel, achieving up to 5.8x speedup over autoregressive baseline.
Google DeepMind released Gemma 4 MTP drafters for the Gemma 4 family, enabling significant decoding speedups via speculative decoding while maintaining exact generation quality for low-latency applications.
Google DeepMind releases Gemma 4, a family of open-weights multimodal models featuring Multi-Token Prediction (MTP) for up to 2x decoding speedups, supporting text, image, video, and audio with enhanced reasoning and coding capabilities.
Google DeepMind’s Model Garden now hosts 200+ leading models, including the newly released Gemini 3.1 Pro, Gemini 3.1 Flash Image, Lyria 3, and the open Gemma 4.
NVIDIA and Hugging Face publish a hands-on demo showing Gemma 4 running as a vision-language-action model entirely on the Jetson Orin Nano Super, using local STT/TTS and webcam input.
Personal benchmark shows Qwen3.5-27B Dense and Gemma4-31B Dense fix 100% of 37 test failures, outperforming Gemma4-26B MoE even at 8-bit quantization, while using fewer tokens and less wall-clock time.
Qwen 3.6 35B achieves near-perfect 283/285 line recall on a 108k-token JS file, outperforming Gemma 4 27B (6/16 passes) and fixing long-context weaknesses of earlier Qwen versions.
A user documents how closed models (GPT-4o→5.3, Gemini) degraded and censored Chinese novel translations, while local Gemma 4 31B now outperforms them with natural, uncensored output.
Developer Ivan Fioravanti demonstrates running Andrej Karpathy's autoresearch project locally with a 6-bit quantized Gemma-4-26B model on Apple Silicon, suggesting the Gemma 4 E2B IT variant can be trained successfully.
A user reports that the 3.6 GB Gemma 4 e4b model extracted from Google AI Edge Gallery on Android outperforms larger 3.7 GB Unsloth versions and community ports, raising questions about hidden optimizations.
Gemma 4’s vision performance is bottlenecked by low default token budgets; raising --image-max-tokens to 2240 in llama.cpp unlocks state-of-the-art OCR and detail recognition at the cost of ~14 GB extra VRAM.
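The workaround above amounts to one extra flag on the server launch. A sketch of the invocation, with placeholder model/projector filenames; `--image-max-tokens` is the flag named in the post, but llama.cpp flags change between releases, so verify it against `llama-server --help` on your build:

```shell
# Placeholder GGUF paths -- substitute your own downloads.
# Raising the image token budget trades ~14 GB extra VRAM (per the
# report above) for better OCR and fine-detail recognition.
llama-server \
  -m gemma-4-27b-it-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4.gguf \
  --image-max-tokens 2240 \
  -ngl 99
```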
A user reports that Google's Gemma-4-E2B local/offline model has overly aggressive safety filters, refusing basic survival information such as first aid, water purification, and emergency repairs, which makes it unsuitable for emergency-preparedness scenarios where internet access is unavailable.
An ML team documents practical challenges encountered while fine-tuning and deploying Gemma-4, including incompatibilities with PEFT, SFTTrainer, and DeepSpeed ZeRO-3 and missing runtime LoRA serving support, along with workarounds for each issue.
A practical guide for audio transcription on macOS using Gemma 4 E2B model with MLX and mlx-vlm, including a uv run recipe and demonstration of the workflow.
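A recipe in the spirit of the guide above might look like the following. Treat every detail as an assumption: the model repo, the `--audio` flag, and other options are illustrative, so check `python -m mlx_vlm.generate --help` in your mlx-vlm version before relying on them.

```shell
# Hypothetical uv run recipe -- model repo and flags are assumptions,
# not confirmed from the guide. uv resolves mlx-vlm (and MLX) into an
# ephemeral environment, so no prior install is needed.
uv run --with mlx-vlm \
  python -m mlx_vlm.generate \
  --model mlx-community/gemma-4-e2b-it-4bit \
  --prompt "Transcribe this recording." \
  --audio meeting.wav \
  --max-tokens 2048
```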
NVIDIA and Google collaborate to optimize Gemma 4 models for local deployment across RTX GPUs, DGX Spark, and Jetson devices, enabling efficient on-device agentic AI with support for reasoning, coding, multimodal capabilities, and 35+ languages.
Google DeepMind releases Gemma 4, a frontier multimodal model family available on Hugging Face with Apache 2 licensing, optimized for on-device deployment and supported by various inference libraries.
Google DeepMind releases Gemma 4, a family of open-weight multimodal models ranging from 2.3B to 31B parameters with support for text, image, video, and audio inputs. The models feature 256K context windows, MoE and dense architectures, enhanced reasoning capabilities, and are optimized for deployment across devices from mobile to servers.