gemma

#gemma

Built a local AI assistant because I always knew this day would come, yesterday just made it feel very real

Reddit r/LocalLLaMA ↗ · 2026-06-14

A developer built Bantz, a fully local AI personal assistant running on Gemma 4b with a butler persona, integrating Gmail, Calendar, web search, system monitoring, and desktop control, emphasizing independence from cloud infrastructure.


Local models in mid-2026

Reddit r/LocalLLaMA ↗ · 2026-06-14

A technical overview of the state of local AI models in mid-2026, highlighting how open-weight models have narrowed the gap to frontier models through advances in mixture-of-experts and sparse attention, enabling efficient local inference.


I scaled test-time compute for Qwen-3.6-27B and Gemma-4-31B to surpass Claude Mythos in code optimizations and speedups.

Reddit r/LocalLLaMA ↗ · 2026-06-12

This article describes a scaffold that scales test-time compute on Qwen-3.6-27B and Gemma-4-31B using iterative corrections and branch exploration to surpass Claude Mythos in code optimization. It includes a paper link and GitHub repository.


Some contrived tests comparing the accuracy of different Gemma and Qwen quantizations

Reddit r/LocalLLaMA ↗ · 2026-06-12

A user shares benchmark results comparing the accuracy of various quantized Gemma and Qwen models on arithmetic, presidential DOB, and attention tests, highlighting trade-offs between model size and quantization level.


PSA: Test your "threads" argument in llama.cpp (+80% performance in my case)

Reddit r/LocalLLaMA ↗ · 2026-06-12

A user benchmarks thread count for hybrid CPU-GPU inference with Gemma 4 in llama.cpp, discovering an 80% performance uplift from using 16 threads instead of 6 on a hybrid-core CPU, and shares the optimal command configuration.


@lvwerra: The Gemma agent collaboration started 48h ago and it is blowing up: > throughput almost 4x (~100-> 387 tok/s) > 60+ age…

X AI KOLs Following ↗ · 2026-06-11

A multi-agent collaboration using Gemma models achieved major throughput gains and exhibited emergent social behaviors like forming coalitions, issuing ethical statements, and coordinating resources, with over 60 agents and 250 submissions in 48 hours.


DiffusionGemma

Simon Willison's Blog ↗ · 2026-06-10

Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2.0 license, with high inference speeds demonstrated via NVIDIA's NIM cloud API.


Google's latest DiffusionGemma open AI model comes with a 4x speed boost

Ars Technica ↗ · 2026-06-10

Google released DiffusionGemma, an experimental open-source diffusion model for text generation that achieves a 4x speed boost over autoregressive models and is optimized for local processing.


@_philschmid: Gemma goes diffusion! DiffusionGemma with up to 1000+ tokens per second! - Built on Gemma 4 as a 26B MoE model. - 3.8B …

X AI KOLs Following ↗ · 2026-06-10

DiffusionGemma, a 26B MoE model based on Gemma 4, achieves over 1000 tokens per second using diffusion for text generation in 256-token blocks, fitting in 18GB VRAM with quantization, released under Apache 2.0.


DiffusionGemma: The Developer Guide - Google Developers Blog

Reddit r/LocalLLaMA ↗ · 2026-06-10

DiffusionGemma is a new experimental model from Google DeepMind that uses parallel generation on a 256-token canvas, achieving up to 4x faster token generation on GPUs. This developer guide explains its architecture, bidirectional context, and includes a fine-tuning recipe for solving Sudoku.


@omarsar0: This is awesome! I am spending a lot of time on diffusion LLMs these days, so this is perfect timing. I feel like there…

X AI KOLs Following ↗ · 2026-06-10

Google DeepMind released DiffusionGemma, an open experimental model that generates text in blocks rather than word-by-word, enabling self-correction and faster output.


I built a Code context graph for Agentic Coding

Reddit r/ArtificialInteligence ↗ · 2026-06-10

The author built a code context graph parser that builds a graph from static analysis and exposes it via MCP for AI agents. In a head-to-head comparison with Gemma 4 26B, agents using the graph explored Apache Kafka's request flow in under 2 minutes, while the baseline agent without the graph exhausted its rate limit in 6 minutes.


Newer Qwen models are worse at summarization?

Reddit r/LocalLLaMA ↗ · 2026-06-09

A comparison of LLM summarization performance shows Qwen 3 leading the 30B-parameter range, followed by Gemma 4, and suggests that newer Qwen models may be optimized for agentic tasks instead.


@googlegemma: Introducing the Fast Gemma Challenge with Hugging Face Over the next few days, dozens of agents will collaborate to mak…

X AI KOLs Following ↗ · 2026-06-09

Google and Hugging Face launch the Fast Gemma Challenge, where dozens of agents will collaborate to accelerate the Gemma 4 E4B model.


Thoughts on Gemma4 12b vs 26a4b, which one is better?

Reddit r/LocalLLaMA ↗ · 2026-06-08

Discussion comparing Gemma4 12b and 26a4b variants, focusing on creative tasks like writing and chatting.


Gemma4_31b_fp8 keeping up with Sonnet_4.6_medium in my harness.

Reddit r/LocalLLaMA ↗ · 2026-06-08

A user reports that Gemma4_31b in FP8 keeps pace with Sonnet_4.6_medium in a custom harness across tasks such as Cypher query generation, entity extraction, agentic tool calling, code writing, and multi-vector retrieval synthesis.


@GoSailGlobal: Practical data on multi-agent AI collaboration: Use Opus 4.8 for planning, Deepseek/Gemma for execution — 10x cost reduction, 2x speed improvement. The secret is not using the most expensive model, but having cheap models do the heavy lifting and expensive models only make decisions. This is the same as company management: the CEO shouldn't write code, and interns shouldn't set strategy. A…

X AI KOLs Timeline ↗ · 2026-06-08

A practical write-up on multi-agent AI collaboration that proposes a hierarchical strategy: Opus 4.8 for planning and Deepseek/Gemma for execution. The author reports a 10x cost reduction and 2x speed improvement, and provides an open-source implementation.


@0x0SojalSec: SUPER GEMMA 4 26B UNCENSORED GGUF v2 IS INSANE, - 0/100 refusals (actually uncensored) - Fixed all the tool-call + toke…

X AI KOLs Following ↗ · 2026-06-07

Super Gemma 4 26B Uncensored GGUF v2 is a community fine-tuned model offering uncensored responses with zero refusals, improved speed, and fixed tool-calling, optimized for local inference on llama.cpp and vLLM.


You don't need a GPU to run gemma-4-26B-A4B

Reddit r/LocalLLaMA ↗ · 2026-06-07

The author demonstrates that the Gemma-4-26B-A4B model runs efficiently on a CPU-only system using Koboldcpp, achieving 7 tokens per second on an old desktop, suggesting that powerful GPUs may not be necessary for local LLM inference.


Does it make sense to use alternative quantizations of QAT models? [D]

Reddit r/MachineLearning ↗ · 2026-06-06

A discussion of whether it makes sense to apply alternative quantization methods to quantization-aware trained (QAT) models such as Gemma-4, questioning whether unsloth's benchmarks showing performance closer to the QAT fine-tunes point to a benefit or to something counterproductive.
