Ornith-1.0-35B Q3_K_M: ~17 GB VRAM, KLD-checked against BF16

Reddit r/LocalLLaMA 06/27/26, 02:30 AM Models

Summary

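Ornith-1.0-35B Q3_K_M is a 3-bit (Q3_K_M) GGUF quantization of the 35B-parameter Ornith-1.0-35B model. It needs about 17 GB of VRAM, and its output distribution was checked against the BF16 original using KL divergence (KLD) to measure how closely the quantized model tracks it.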
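
A KLD check compares the quantized model's next-token probability distribution with the BF16 model's at each position and averages the result; lower means closer to the original. The Python sketch below shows that metric plus a back-of-envelope, weights-only size estimate. The function names, the toy logits, and the ~3.9 bits-per-weight average assumed for Q3_K_M are illustrative assumptions, not the poster's tooling or measurements (Q3_K_M mixes quant types, so the real average varies by model). At scale, llama.cpp's perplexity tool has a KL-divergence mode for this kind of comparison.

```python
import math

# Hypothetical helpers for illustration; not the poster's actual tooling.


def softmax(logits):
    """Convert a list of raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats; P is the BF16 reference, Q the quantized model."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(ref_logits, quant_logits):
    """Average per-token KLD over a sequence of next-token logit vectors."""
    klds = [kl_divergence(softmax(r), softmax(q))
            for r, q in zip(ref_logits, quant_logits)]
    return sum(klds) / len(klds)


def weight_gb(n_params, bits_per_weight):
    """Weights-only size in GB (1e9 bytes); ignores KV cache and runtime buffers."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Toy logits over a 4-token vocabulary at two positions (made-up numbers).
    bf16 = [[2.0, 1.0, 0.5, -1.0], [0.1, 3.0, 0.2, 0.0]]
    q3 = [[1.9, 1.1, 0.4, -0.9], [0.2, 2.8, 0.3, 0.1]]
    print(f"mean KLD: {mean_kld(bf16, q3):.5f} nats")
    # ~3.9 bits/weight is an assumed average for Q3_K_M, not a measured figure.
    print(f"35B weights at 3.9 bpw: {weight_gb(35e9, 3.9):.1f} GB")
```

With the assumed 3.9 bits per weight, the estimate prints 17.1 GB for the weights alone, which is in line with the ~17 GB figure. Any KV cache for long contexts comes on top of that. The reference (BF16) distribution is used as P so that divergence is weighted by where the original model actually puts probability mass.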


Similar Articles

@malikwas1f: Ornith-1.0-35B: a Qwen3.6-35B-A3B coding fine-tune that edges the base on real coding (aider 15/30 vs 13) — full 262K a…

X AI KOLs Timeline

Announces Ornith-1.0-35B, a coding fine-tune of Qwen3.6-35B-A3B that slightly outperforms the base model on aider benchmarks. Also promotes the club-3090 repository for running LLMs on RTX 3090s.

@anvie: Tested Ornith-1.0-9B, and it's impressive for a model of that size. I don't believe this is just 9B!

X AI KOLs Following

Ornith-1.0 is a family of open-source LLMs specialized for agentic coding, spanning sizes from 9B to 397B and achieving state-of-the-art performance among open-source models of comparable size.

@SlimTradeyBaby: Attention all 8-12GB GPU users! This new Ornith-1.0-9B is looking like it will be a serz player for smaller VRAM setups…

X AI KOLs Timeline

Ornith-1.0-9B is a new 9B parameter AI model optimized for 8-12GB GPUs, achieving strong performance on agentic coding benchmarks, matching or surpassing models 2-3x its size.

Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)

Reddit r/LocalLLaMA

An update on the Ornith-1.0-35B GGUF model introduces a native MTP speculative-decode graft for faster inference on a single GPU, achieving ~1.3-1.35x decode speedup while maintaining near-identical token distribution. Benchmark numbers for throughput, TTFT, and long-context performance across multiple quants are provided.

Ornith-1.0 released on Hugging Face