pcie-offloading


24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context)

Reddit r/LocalLLaMA · 2026-05-13

A developer reports running mixture-of-experts (MoE) models such as Qwen 3.6 35B-A3B and Gemma 4 26B-A4B at 24+ tok/s with a 128k context on an old GTX 1080 (8 GB VRAM). The setup uses llama.cpp with MoE offloading and TurboQuant KV-cache quantization. The post also shares optimization tricks for Gemma's MTP (multi-token prediction) speculative decoding.
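A rough back-of-the-envelope sketch shows why KV-cache quantization matters for a 128k context on an 8 GB card. The model shape below (48 layers, 4 KV heads, head dimension 128) is a hypothetical grouped-query-attention configuration, not the published config of either model. The sketch also assumes every layer caches the full context, with no sliding-window layers, and it ignores the per-block scale overhead that real quantized cache formats add.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class KVShape:
    # Hypothetical shape; NOT the real config of Qwen 3.6 35B-A3B or Gemma 4 26B-A4B.
    n_layers: int    # layers that keep a KV cache (assumed: all of them, full context)
    n_kv_heads: int  # key/value heads (fewer than query heads under GQA)
    head_dim: int    # dimension of each head


def kv_cache_bytes(shape: KVShape, n_ctx: int, bits_per_elem: float) -> float:
    """Bytes needed to hold K and V for n_ctx tokens across all layers."""
    # Factor 2 = one tensor for keys plus one for values.
    elems = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * n_ctx
    # bits_per_elem is idealized: no block-scale overhead is counted.
    return elems * bits_per_elem / 8


shape = KVShape(n_layers=48, n_kv_heads=4, head_dim=128)  # hypothetical
n_ctx = 131072  # "128k" context

for label, bits in [("f16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>6}: {kv_cache_bytes(shape, n_ctx, bits) / GIB:.2f} GiB")
```

Under these assumed dimensions, an f16 cache alone would need 12 GiB at 131,072 tokens, which exceeds the card's entire VRAM. A 4-bit cache would need about 3 GiB, which leaves room on the GPU for other tensors while the MoE expert weights are offloaded.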
