@witcheer: can’t believe gpt-oss-20b perfs on 8GB vRAM 21B total params, 3.6B active (MoE). OpenAI, Apache 2.0. uses only 1.8 GB V…

X AI KOLs Timeline 05/24/26, 04:44 PM Models

open-source mixture-of-experts coding agentic consumer-hardware vram-efficiency

Summary

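gpt-oss-20b, an Apache 2.0 MoE model from OpenAI (21B total parameters, 3.6B active), reportedly uses only 1.8 GB of VRAM with expert offload. The poster reports 10/10 passes on their own set of agentic coding tasks and says no other local model they tested (Gemma, Qwen, OmniCoder) completed both benchmark tasks.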

Original Article

Cached at: 05/25/26, 02:40 AM

can’t believe gpt-oss-20b perfs on 8GB vRAM

21B total params, 3.6B active (MoE). OpenAI, Apache 2.0.

uses only 1.8 GB VRAM with expert offload. on an 8 GB card, that’s nothing.

I ran it through 10 agentic coding tasks (port scanner, log watcher, TDD, data pipeline, multi-module builds). result: 10/10 PASS. 7 self-fixes. zero hallucinated APIs.

no other local model I tested completed both benchmark tasks. not Gemma. not Qwen. not OmniCoder.

1.8 GB VRAM for the best agentic model on consumer hardware.
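
For context on why expert offload matters on an 8 GB card, here is a rough back-of-the-envelope sketch in Python. The parameter counts and card size come from the post; the 4.25 bits per weight is an assumption (a uniform 4-bit block format with a shared scale), and real checkpoints mix precisions. It only estimates weight storage and does not try to reproduce the 1.8 GB figure, which depends on which tensors the runtime keeps on the GPU.

```python
def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 21e9     # from the post: total parameters
ACTIVE_PARAMS = 3.6e9   # from the post: parameters active per token
CARD_VRAM_GB = 8.0      # from the post: consumer card size
BITS_PER_WEIGHT = 4.25  # ASSUMPTION: uniform ~4-bit quantization incl. scale overhead

# Storage for every weight in the model vs. only the weights used per token.
full_gb = weight_footprint_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
active_gb = weight_footprint_gb(ACTIVE_PARAMS, BITS_PER_WEIGHT)

# Would the whole model fit in VRAM with nothing offloaded?
fits_without_offload = full_gb < CARD_VRAM_GB

print(f"all weights:    ~{full_gb:.1f} GB")
print(f"active weights: ~{active_gb:.1f} GB")
print(f"fits in {CARD_VRAM_GB:.0f} GB without offload: {fits_without_offload}")
```

Under these assumptions the full set of weights is larger than the card, while the weights touched per token are a small fraction of it, which is the gap that keeping expert weights in system RAM is meant to exploit.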

Similar Articles

@eliebakouch: very nice release by @OpenAI! a 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion sc…

X AI KOLs Following

OpenAI released a 1.5B-parameter MoE model with only 50M active parameters that can filter private data from trillion-token datasets while maintaining 128k context length.

@hank_aibtc: Family, local LLMs are incredibly impressive! I stumbled upon this gpt-oss-20b-tq3 on Hugging Face, and it's truly captivating! OpenAI's official open-source 20B+ parameter MoE model, optimized by the community using TurboQuant 3-bit quantization + MLX...

X AI KOLs Timeline

The article highlights the gpt-oss-20b-tq3 model, a quantized version of an OpenAI MoE model that runs efficiently on standard 16GB MacBook Airs using TurboQuant and MLX optimizations.

24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context)

Reddit r/LocalLLaMA

A developer demonstrates running MoE models like Qwen 3.6 35B-A3B and Gemma 4 26B-A4B at 24+ tok/s on an old GTX 1080 (8GB VRAM) with 128k context using llama.cpp with MoE offloading and TurboQuant KV cache quantization, revealing optimization tricks for Gemma's MTP speculative decoding.

Introducing gpt-oss

OpenAI Blog

OpenAI releases gpt-oss-120b and gpt-oss-20b, two state-of-the-art open-weight language models under Apache 2.0 license that achieve near-parity with proprietary models while being optimizable for consumer hardware and edge devices. Both models demonstrate strong reasoning and tool-use capabilities with comprehensive safety evaluations.

gpt-oss-120b & gpt-oss-20b Model Card