Ornith-1.0-35B Q3_K_M: ~17 GB VRAM, KLD-checked against BF16
Summary
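Ornith-1.0-35B Q3_K_M is a 3-bit quantization of the 35B-parameter Ornith-1.0 model. It needs about 17 GB of VRAM, and its output distribution was checked against the BF16 original using KL divergence (KLD) to verify that quantization did not noticeably change the model's predictions.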
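The post does not say how the KLD check was run, so the following is only a minimal sketch of the general idea: for each token position, compare the next-token distribution of the BF16 reference with that of the quantized model and average the KL divergence. The function names (`log_softmax`, `kl_divergence`, `mean_kld`) and the toy logits are hypothetical and chosen for illustration; they are not taken from the release.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for a single token position."""
    log_p = log_softmax(ref_logits)    # reference (e.g. BF16) distribution
    log_q = log_softmax(quant_logits)  # quantized model distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref_batch, quant_batch):
    """Average the per-position KL divergence over a sequence."""
    if len(ref_batch) != len(quant_batch):
        raise ValueError("both models must be scored on the same positions")
    total = sum(kl_divergence(r, q) for r, q in zip(ref_batch, quant_batch))
    return total / len(ref_batch)


# Toy logits for two token positions over a three-token vocabulary
# (made-up numbers, not measurements from the model).
ref = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
quant = [[1.9, 0.6, -1.1], [0.0, 0.3, 2.8]]
print(f"mean KLD: {mean_kld(ref, quant):.6f} nats")
```

A mean KLD close to zero indicates that the quantized model assigns nearly the same probabilities as the reference; what counts as acceptable is a judgment call that the source does not specify.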
Similar Articles
@malikwas1f: Ornith-1.0-35B: a Qwen3.6-35B-A3B coding fine-tune that edges the base on real coding (aider 15/30 vs 13) — full 262K a…
Announces Ornith-1.0-35B, a coding fine-tune of Qwen3.6-35B-A3B that slightly outperforms the base model on aider benchmarks. Also promotes the club-3090 repository for running LLMs on RTX 3090s.
@anvie: Tested Ornith-1.0-9B, and it's impressive for a model of that size. I don't believe this is just 9B!
Ornith-1.0 is a family of open-source LLMs specialized for agentic coding, spanning sizes from 9B to 397B and achieving state-of-the-art performance among open-source models of comparable size.
@SlimTradeyBaby: Attention all 8-12GB GPU users! This new Ornith-1.0-9B is looking like it will be a serz player for smaller VRAM setups…
Ornith-1.0-9B is a new 9B parameter AI model optimized for 8-12GB GPUs, achieving strong performance on agentic coding benchmarks, matching or surpassing models 2-3x its size.
Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)
An update on the Ornith-1.0-35B GGUF model introduces a native MTP speculative-decode graft for faster inference on a single GPU, achieving ~1.3-1.35x decode speedup while maintaining near-identical token distribution. Benchmark numbers for throughput, TTFT, and long-context performance across multiple quants are provided.
Ornith-1.0 released on Hugging Face
Ornith-1.0 has been released on Hugging Face, featuring a collection of models ranging from 9B to 397B parameters, including dense and MoE architectures, claiming state-of-the-art performance on various benchmarks.