@SlimTradeyBaby: Attention all 8-12GB GPU users! This new Ornith-1.0-9B is looking like it will be a serious player for smaller VRAM setups…
Summary
Ornith-1.0-9B is a new 9B-parameter AI model pitched at 8-12GB GPUs. It reportedly performs strongly on agentic coding benchmarks, matching or beating models 2-3x its size.
Cached at: 06/27/26, 09:53 AM
Attention all 8-12GB GPU users!
This new Ornith-1.0-9B is looking like it will be a serious player for smaller VRAM setups. It's punching way above its weight in agentic coding benchmarks, beating or matching models 2-3x its size.
Full GGUF quants dropping in the comments ⬇️ https://t.co/N5iC6PrRv5
Similar Articles
Ornith-1.0-35B Q3_K_M: ~17 GB VRAM, KLD-checked against BF16
Ornith-1.0-35B Q3_K_M is a 3-bit quantized version of a 35B-parameter model that needs about 17 GB of VRAM. Its outputs are checked against the BF16 original using KL divergence (KLD) to confirm fidelity.
@SlimTradeyBaby: Just fired up Ornith 35B Q4 on the 5090 remotely… 2329 prompt / 195 gen tok/s and rock solid at 32k. Quick test only fu…
DeepReinforce AI releases Ornith-1.0, a self-improving open-source model family for agentic coding, including a 35B MoE variant that achieves state-of-the-art performance on coding benchmarks and runs efficiently on single GPUs like the 5090.
@anvie: Tested Ornith-1.0-9B, and it's impressive for a model of that size. I don't believe this is just 9B!
Ornith-1.0 is a family of open-source LLMs specialized for agentic coding, spanning sizes from 9B to 397B and achieving state-of-the-art performance among open-source models of comparable size.
Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)
An update on the Ornith-1.0-35B GGUF model introduces a native MTP speculative-decode graft for faster inference on a single GPU, achieving ~1.3-1.35x decode speedup while maintaining near-identical token distribution. Benchmark numbers for throughput, TTFT, and long-context performance across multiple quants are provided.
@malikwas1f: Ornith-1.0-35B: a Qwen3.6-35B-A3B coding fine-tune that edges the base on real coding (aider 15/30 vs 13) — full 262K a…
Announces Ornith-1.0-35B, a coding fine-tune of Qwen3.6-35B-A3B that slightly outperforms the base model on aider benchmarks. Also promotes the club-3090 repository for running LLMs on RTX 3090s.