@witcheer: can’t believe gpt-oss-20b perfs on 8GB vRAM 21B total params, 3.6B active (MoE). OpenAI, Apache 2.0. uses only 1.8 GB V…


Summary

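gpt-oss-20b, an open-weight MoE model from OpenAI (21B total parameters, 3.6B active, Apache 2.0), reportedly uses only 1.8 GB of VRAM with expert offload. In the author's own test it passed all 10 agentic coding tasks, and the author reports that no other local model tested (Gemma, Qwen, OmniCoder) completed both benchmark tasks.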


Cached at: 05/25/26, 02:40 AM

can’t believe gpt-oss-20b’s perf on 8 GB VRAM

21B total params, 3.6B active (MoE). OpenAI, Apache 2.0.

uses only 1.8 GB VRAM with expert offload. on an 8 GB card, that’s nothing.

I ran it through 10 agentic coding tasks (port scanner, log watcher, TDD, data pipeline, multi-module builds). result: 10/10 PASS. 7 self-fixes. zero hallucinated APIs.

no other local model I tested completed both benchmark tasks. not Gemma. not Qwen. not OmniCoder.

1.8 GB VRAM for the best agentic model on consumer hardware.
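A rough sanity check on the memory figures quoted in the post, as a minimal sketch. The parameter counts (21B total, 3.6B active) come from the post. The 4-bit weight width and the helper name `weight_footprint_gb` are assumptions made for illustration. The post does not say which runtime or quantization was used, so this does not reproduce the author's 1.8 GB measurement.

```python
# Back-of-envelope estimate of weight memory only.
# Parameter counts are taken from the post; the 4-bit width is an assumption.

def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 21e9    # from the post: 21B total
ACTIVE_PARAMS = 3.6e9  # from the post: 3.6B active per token
ASSUMED_BITS = 4       # hypothetical quantization width, not stated in the post

print(f"all weights:    {weight_footprint_gb(TOTAL_PARAMS, ASSUMED_BITS):.1f} GB")
print(f"active weights: {weight_footprint_gb(ACTIVE_PARAMS, ASSUMED_BITS):.1f} GB")
```

Under these assumptions the full weight set would not fit comfortably on an 8 GB card once KV cache and activations are added, while a subset of a few billion parameters would. That is the general motivation for keeping expert weights in system RAM. Which tensors actually stay on the GPU depends on the runtime, so treat the numbers as an order-of-magnitude guide only.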
