@no_stp_on_snek: one last thing: the real downside i found testing Ornith-1.0 (the new agentic coder): it over-gates legitimate work. on…
Summary
A tester reports that the new Ornith-1.0 agentic coder model over-gates legitimate work by demanding excessive prerequisites, a trade-off from its cautious training, while stock Qwen3.6 executes simple tasks directly.
Cached at: 06/28/26, 03:55 AM
one last thing:
the real downside i found testing Ornith-1.0 (the new agentic coder): it over-gates legitimate work.
on simple, fully-disclosed requests it would stall, demanding access or prerequisites instead of just doing the thing or delegating it. stock Qwen3.6 just executed.
textbook agentic-RL artifact: trained to gather context and build scaffolding before acting, it over-applies that to tasks that should just get done. the same caution that makes it refuse a poisoned premise makes it over-ask on easy stuff. tradeoffs.
i have no doubt that some of the quirks can be fixed through more training, you just end up playing whack-a-mole sometimes. for a v1, pretty good.
Similar Articles
@no_stp_on_snek: a new 35B coder dropped (Ornith-1.0) and a promo blog says it "crushes" the benchmarks. my first instinct was benchmaxx…
A new 35B coding model, Ornith-1.0, is compared against Qwen3.6-35B on custom tests. The user finds Ornith-1.0 to be genuinely stronger for long-horizon agentic coding, resisting bad context and finishing large tasks, but it is more cautious and verbose, sometimes over-gating simple requests.
@SixZzshOtRipZz: I can advocate for this I ran a similar test to see if Ornith would cave on decision making, even attempting to trick i…
The tweet describes a test where Ornith-1.0 resisted a false premise about using Redis, highlighting its honesty in autonomous coding. The linked Hugging Face page announces Ornith-1.0, a family of open-source coding agent models with state-of-the-art benchmarks.
@SlimTradeyBaby: Just read @no_stp_on_snek review of the new Ornith-1.0 35B coder easily one of the best model write-ups I've seen in a …
A review of the new Ornith-1.0 35B coding model that bypasses public benchmarks and tests it on real agentic tasks, highlighting its strengths in long-horizon coding and coherence, as well as costs like verbosity.
@TeksEdge: Been testing Ornith-1.0-35B to see how it stacks up with Qwen3.6-35B over a day's use. Anecdotally, it works as well as…
A user reports that Ornith-1.0-35B matches Qwen3.6-35B in performance but excels at planning and long task execution, while the developer announces the open-source Ornith-1.0 family of LLMs specialized for agentic coding.
@sudoingX: running Ornith on the dgx spark to see what it actually is. it's a new agentic coding model from @ornith_ / deepreinfor…
Ornith-1.0 is a new family of open-source agentic coding models from deepreinforce-ai, trained with reinforcement learning that jointly optimizes both the solution and the scaffolding. The 35B MoE version achieves state-of-the-art on coding benchmarks and supports efficient single-GPU deployment.