gguf-quantization

#gguf-quantization

500k context on 48gb VRAM!! - 21tok/s (coding)

Reddit r/LocalLLaMA ↗ · 2026-05-11

A user reports running a quantized Nemotron-3 Super model with a 500k-token context for agentic coding on two consumer-grade Titan RTX GPUs (48 GB VRAM total), at about 21 tokens/s.

#gguf-quantization

hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF

Hugging Face Models Trending ↗ · 2026-04-18

A 35B-parameter Qwen3.6 model fine-tuned with Claude-Opus-style chain-of-thought distillation data and released in GGUF quantized formats for efficient local inference.
