@SlimTradeyBaby: Just read @no_stp_on_snek review of the new Ornith-1.0 35B coder easily one of the best model write-ups I've seen in a …
Summary
A review of the new Ornith-1.0 35B coding model that skips public benchmarks and instead tests the model head-to-head on held-out agentic tasks, highlighting its strengths in long-horizon coding and coherence as well as its costs: it is more cautious and more verbose.
Cached at: 06/26/26, 04:13 PM
Just read @no_stp_on_snek's review of the new Ornith-1.0 35B coder, easily one of the best model write-ups I’ve seen in a long time. He cuts straight through the promo hype, skips the easily gamed public benchmarks, and instead runs real head-to-head tests on held-out agentic tasks. The result is a clear, honest breakdown: Ornith’s genuine strengths in long-horizon coding and coherence, plus its real costs (more cautious, more verbose). No hype, no copium, just practical receipts on where it wins and where it trades off. Exactly the kind of grounded analysis this space needs. Great work, Tom. Give the man a follow: always great content and solid git!
Tom Turney (@no_stp_on_snek): a new 35B coder dropped (Ornith-1.0) and a promo blog says it “crushes” the benchmarks. my first instinct was benchmaxx, public test sets like SWE-Bench and Terminal-Bench are easy to overfit. so i ignored the benchmarks and ran it head-to-head against stock Qwen3.6-35B on my own
Similar Articles
@no_stp_on_snek: a new 35B coder dropped (Ornith-1.0) and a promo blog says it "crushes" the benchmarks. my first instinct was benchmaxx…
A new 35B coding model, Ornith-1.0, is compared against Qwen3.6-35B on custom tests. The user finds Ornith-1.0 to be genuinely stronger for long-horizon agentic coding, resisting bad context and finishing large tasks, but it is more cautious and verbose, sometimes over-gating simple requests.
@SlimTradeyBaby: Just fired up Ornith 35B Q4 on the 5090 remotely… 2329 prompt / 195 gen tok/s and rock solid at 32k. Quick test only fu…
DeepReinforce AI releases Ornith-1.0, a self-improving open-source model family for agentic coding, including a 35B MoE variant that achieves state-of-the-art performance on coding benchmarks and runs efficiently on single GPUs like the 5090.
@no_stp_on_snek: one last thing: the real downside i found testing Ornith-1.0 (the new agentic coder): it over-gates legitimate work. on…
A tester reports that the new Ornith-1.0 agentic coder model over-gates legitimate work by demanding excessive prerequisites, a trade-off from its cautious training, while stock Qwen3.6 executes simple tasks directly.
@no_stp_on_snek: verdict up front: it's a "pass" in my book in certain categories, just a narrower one than the 35B. you're buying real …
The author evaluates Ornith-9B against its base Qwen3.5-9B, finding that RL post-training improves token efficiency and sustained coding coherence but sacrifices single-turn judgment and robustness to misleading inputs, making it a narrower upgrade at 9B compared to the 35B version.
@SixZzshOtRipZz: I can advocate for this I ran a similar test to see if Ornith would cave on decision making, even attempting to trick i…
The tweet describes a test where Ornith-1.0 resisted a false premise about using Redis, highlighting its honesty in autonomous coding. The linked Hugging Face page announces Ornith-1.0, a family of open-source coding agent models with state-of-the-art benchmarks.