@witcheer: can’t believe gpt-oss-20b perfs on 8GB vRAM 21B total params, 3.6B active (MoE). OpenAI, Apache 2.0. uses only 1.8 GB V…
Summary
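OpenAI's open-weight MoE model gpt-oss-20b (21B total parameters, 3.6B active) reportedly uses only 1.8 GB of VRAM with expert offload and passed all 10 of the author's agentic coding tasks. The author reports that none of the other local models tested, including Gemma, Qwen, and OmniCoder, completed the benchmark tasks.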
Cached at: 05/25/26, 02:40 AM
can’t believe how gpt-oss-20b performs on 8 GB VRAM
21B total params, 3.6B active (MoE). OpenAI, Apache 2.0.
uses only 1.8 GB VRAM with expert offload. on an 8 GB card, that’s nothing.
I ran it through 10 agentic coding tasks (port scanner, log watcher, TDD, data pipeline, multi-module builds). result: 10/10 PASS. 7 self-fixes. zero hallucinated APIs.
no other local model I tested completed both benchmark tasks. not Gemma. not Qwen. not OmniCoder.
1.8 GB VRAM for the best agentic model on consumer hardware.
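A rough back-of-envelope sketch of why expert offload matters on an 8 GB card. The parameter count and card size come from the post; the 4 bits per weight is an assumption for illustration only, since the post does not say which quantization was used. The post also does not break down what the 1.8 GB consists of, so this does not try to reproduce that figure.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (weights only, no KV cache)."""
    return params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 21e9   # total parameters, from the post
CARD_VRAM_GB = 8.0    # card size, from the post
ASSUMED_BITS = 4.0    # ASSUMPTION: quantization level is not stated in the post

total_gb = weight_gb(TOTAL_PARAMS, ASSUMED_BITS)
fits_without_offload = total_gb <= CARD_VRAM_GB

print(f"all weights at {ASSUMED_BITS:g} bits/weight: {total_gb:.1f} GB")
print(f"fits in {CARD_VRAM_GB:g} GB VRAM without offload: {fits_without_offload}")
```

Under that assumption the full set of weights is larger than the card's VRAM, which is the situation where keeping expert weights in system RAM and only the rest on the GPU becomes useful.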
Similar Articles
@eliebakouch: very nice release by @OpenAI! a 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion sc…
OpenAI released a 1.5B-parameter MoE model with only 50M active parameters that can filter private data from trillion-token datasets while maintaining 128k context length.
@hank_aibtc: Family, local LLMs are incredibly impressive! I stumbled upon this gpt-oss-20b-tq3 on Hugging Face, and it's truly captivating! OpenAI's official open-source 20B+ parameter MoE model, optimized by the community using TurboQuant 3-bit quantization + MLX...
The article highlights the gpt-oss-20b-tq3 model, a quantized version of an OpenAI MoE model that runs efficiently on standard 16GB MacBook Airs using TurboQuant and MLX optimizations.
24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context)
A developer demonstrates running MoE models like Qwen 3.6 35B-A3B and Gemma 4 26B-A4B at 24+ tok/s on an old GTX 1080 (8GB VRAM) with 128k context using llama.cpp with MoE offloading and TurboQuant KV cache quantization, revealing optimization tricks for Gemma's MTP speculative decoding.
Introducing gpt-oss
OpenAI releases gpt-oss-120b and gpt-oss-20b, two state-of-the-art open-weight language models under Apache 2.0 license that achieve near-parity with proprietary models while being optimizable for consumer hardware and edge devices. Both models demonstrate strong reasoning and tool-use capabilities with comprehensive safety evaluations.
gpt-oss-120b & gpt-oss-20b Model Card
OpenAI releases gpt-oss-120b and gpt-oss-20b, open-weight reasoning models under Apache 2.0 license designed for agentic workflows with strong instruction following, tool use, and chain-of-thought capabilities. The release includes comprehensive safety evaluations confirming the models do not reach high capability thresholds for biological, chemical, or cyber risks even under adversarial fine-tuning.