gemma-4


@analogalok: I just got Gemma 4 26B A4B MoE model running fully locally with Hermes agent on an 8GB RTX 4060 and it's now backtestin…

X AI KOLs Following · 1h ago Cached

A developer demonstrates running the Gemma 4 26B MoE model locally on an 8GB RTX 4060 with the Hermes agent to fully automate backtesting of trading strategies, highlighting the growing capability of local LLMs as autonomous agents.


Is there any reason for a lack of love for Gemma 4 26b?

Reddit r/LocalLLaMA · 10h ago

A user asks why Gemma 4 26b receives less attention than Qwen models, sharing their experience using these models for a personal assistant project on a 3090.


Is Gemma 4 going to be the next Mistral (or Qwen3.6) one day? Concerning the lack of finetunes

Reddit r/LocalLLaMA · 18h ago

An analysis exploring why Gemma 4, despite advantages like QAT and vision support, lacks community finetunes compared to Mistral, and whether community inertia will eventually shift.


Qt Creator 20 and local AI

Reddit r/LocalLLaMA · 21h ago Cached

Qt Creator 20 now supports local AI coding assistants via the Agent Client Protocol, enabling integration with open-weight models like GPT-OSS and Gemma 4 running on consumer hardware.


Gemma 4 QAT 31B responds better to KV cache quantization too

Reddit r/LocalLLaMA · yesterday

A post reports that the Gemma 4 QAT 31B model also responds better to KV cache quantization, suggesting more efficient inference.


Gemma 4 31B Q6 on Dual 9060 XT

Reddit r/LocalLLaMA · yesterday

A post discusses running a Q6-quantized version of the Gemma 4 31B model on a dual 9060 XT GPU configuration, likely for local inference.


@analogalok: gemma-4-12B-agentic-fable5-composer2.5 V2 is out. the agentic upgrade to the model trained on Fable 5's reasoning. Runn…

X AI KOLs Timeline · 2d ago Cached

A new fine-tuned version of Gemma 4 12B, trained on Fable 5's reasoning, achieves a significant jump in agentic coding benchmarks (from 15% to 55%) and can run locally on an 8GB VRAM GPU using a custom fork of llama.cpp.


Gemma 4 26b a4b is genuinely the best model I have tried for language learning and scientific queries!

Reddit r/LocalLLaMA · 2d ago

User reports that Gemma 4 26b outperforms Qwen 3.5/3.6 for language learning and scientific queries, despite being behind in coding tasks, and invites discussion on other non-coding use cases for small MoE models.


I wrote a free 15-part series on LLM internals — real math, real tensor shapes, real hardware constraints. All grounded in Gemma 4 12B's actual config.

Reddit r/LocalLLaMA · 2d ago

A comprehensive 15-part series covering LLM internals from tokenization to serving, grounded in Gemma 4 12B's actual config.


yuxinlu1/gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUF

Hugging Face Models Trending · 4d ago Cached

A fine-tuned version of Gemma-4-12B, optimized for local coding and agentic tasks, achieving ~3.5x improvement over the base model on the tau2-bench telecom benchmark.


@analogalok: Gemma 4 12B QAT (dense) achieves 1000+ tokens/sec prefill on 8GB VRAM with 120k context Gemma 4 12B QAT (dense), TurboQ…

X AI KOLs Following · 5d ago Cached

Gemma 4 12B QAT (dense) achieves over 1000 tokens per second prefill on an 8GB RTX 4060 with 120k context using TurboQuant, enabling full GPU layer offloading. This represents a 42% increase in prefill speed over previous methods.


@MiaAI_lab: I fine-tuned Gemma 4 12B with Fable-5 style reasoning and assistant traces and released it as Gemmable 4 12b. **Availab…

X AI KOLs Timeline · 5d ago Cached

Mia-AiLab released Gemmable 4 12B, a fine-tuned version of Google's Gemma 4 12B model using Fable-5 style reasoning and assistant traces, available in GGUF and MLX formats for local inference.


@andimarafioti: Can a VLM see without a vision encoder? We trained one for $100, inspired by Gemma 4 12B. Latency on an M3 Pro MacBook:…

X AI KOLs Timeline · 5d ago Cached

Researchers trained a vision-language model without a vision encoder for only $100, inspired by Gemma 4 12B, achieving a 30% reduction in end-to-end latency on an M3 Pro MacBook.


@onusoz: 16x parallel Gemma-4-26B-A4B-NVFP4 runs 18 output tokens/s, aggregate 300 tok/s 🫪 1 DGX Spark with 128 GB unified memo…

X AI KOLs Timeline · 5d ago Cached

@onusoz demonstrates running 16 parallel instances of NVIDIA's quantized Gemma-4-26B-A4B-NVFP4 model on a single DGX Spark with 128GB unified memory, achieving 300 tok/s aggregate, showcasing high concurrency without flashinfer.


@QingQ77: Use Gemma 4 locally to automatically analyze screenshots, build a searchable, conversational AI memory bank, 100% local, zero cloud dependency, an open-source privacy alternative to Microsoft Recall. https://github.com/ayushh0110/Scre…

X AI KOLs Timeline · 5d ago Cached

ScreenMind is an open-source tool that uses Gemma 4 to analyze screenshots locally, building a searchable and conversational AI memory bank as a privacy alternative to Microsoft Recall.


@googledevs: Autonomous AI in action. Check out how the new Gemma 4 31B model operates as an ADK Agent, exploring, planning, and run…

X AI KOLs Following · 5d ago Cached

Google DeepMind released the Gemma 4 series of open-weight models in four sizes from 2B to 31B, supporting 128K–256K context, reasoning, and function calling under the Apache 2.0 license, and paired with the ADK framework for autonomous agent capabilities.


Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Reddit r/LocalLLaMA · 5d ago

Gemma 4 E2B is demonstrated running in-browser via WebGPU at 255 tokens per second, using kernels written by Fable 5, showcasing efficient on-device inference.


@_philschmid: "But with the most recent releases from Google in the Gemma 4, family, I’ve finally been able to do agentic coding loca…

X AI KOLs Following · 6d ago Cached

Phil Schmid highlights that Google's Gemma 4 models enable local agentic coding with about 75% of the accuracy/speed of frontier models, referencing a write-up by Vicki Boykis.


@analogalok: my 8 GB VRAM gaming laptop is absolutely going to hate me for this. but I still did it. ran a 31b dense model (Gemma 4 …

X AI KOLs Timeline · 6d ago Cached

A user runs the Gemma 4 31B dense model on an 8GB VRAM gaming laptop at ~3 tokens/sec using llama.cpp with MTP speculative decoding. The post demonstrates that a 31B dense model is feasible on consumer hardware and proposes agentic workflows in which a fast MoE model routes hard tasks to this slower dense model.


@googlegemma: Gemma 4 E2B goes super fast on Intel AI PCs thanks to LiteRT NPU support on OpenVINO! 1.3x faster prefill performance o…

X AI KOLs Timeline · 6d ago Cached

Gemma 4 E2B achieves 1.3x faster prefill and 2.8x better performance-per-watt on Intel AI PCs using OpenVINO with LiteRT NPU support, enabling efficient background LLM tasks.
