@no_stp_on_snek: one last thing: the real downside i found testing Ornith-1.0 (the new agentic coder): it over-gates legitimate work. on…

X AI KOLs Following 06/27/26, 04:40 PM Models

agentic-coder ornith-1.0 model-evaluation qwen benchmark coding-agent

Summary

A tester reports that the new Ornith-1.0 agentic coder model over-gates legitimate work by demanding unnecessary access or prerequisites, a trade-off of its cautious training, while stock Qwen3.6 simply executes the same simple tasks.

Original Article

Cached at: 06/28/26, 03:55 AM

one last thing:

the real downside i found testing Ornith-1.0 (the new agentic coder): it over-gates legitimate work.

on simple, fully-disclosed requests it would stall, demanding access or prerequisites instead of just doing the thing or delegating it. stock Qwen3.6 just executed.

textbook agentic-RL artifact: trained to gather context and build scaffolding before acting, it over-applies that to tasks that should just get done. the same caution that makes it refuse a poisoned premise makes it over-ask on easy stuff. tradeoffs.

i have no doubt that some of the quirks can be fixed through more training, you just end up playing whack-a-mole sometimes. for a v1, pretty good.

Similar Articles

@no_stp_on_snek: a new 35B coder dropped (Ornith-1.0) and a promo blog says it "crushes" the benchmarks. my first instinct was benchmaxx…

X AI KOLs Timeline

A new 35B coding model, Ornith-1.0, is compared against Qwen3.6-35B on custom tests. The user finds Ornith-1.0 to be genuinely stronger for long-horizon agentic coding, resisting bad context and finishing large tasks, but it is more cautious and verbose, sometimes over-gating simple requests.

@SixZzshOtRipZz: I can advocate for this I ran a similar test to see if Ornith would cave on decision making, even attempting to trick i…

X AI KOLs Timeline

The tweet describes a test where Ornith-1.0 resisted a false premise about using Redis, highlighting its honesty in autonomous coding. The linked Hugging Face page announces Ornith-1.0, a family of open-source coding agent models with state-of-the-art benchmarks.

@SlimTradeyBaby: Just read @no_stp_on_snek review of the new Ornith-1.0 35B coder easily one of the best model write-ups I've seen in a …

X AI KOLs Following

A review of the new Ornith-1.0 35B coding model that bypasses public benchmarks and tests it on real agentic tasks, highlighting its strengths in long-horizon coding and coherence, as well as costs like verbosity.

@TeksEdge: Been testing Orinth-1.0-35B to see how it stacks up with Qwen3.6-35B over a day's use. Anecdotally, it works as well as…

X AI KOLs Timeline

A user reports that Ornith-1.0-35B matches Qwen3.6-35B in performance but excels at planning and long task execution, while the developer announces the open-source Ornith-1.0 family of LLMs specialized for agentic coding.

@sudoingX: running Ornith on the dgx spark to see what it actually is. it's a new agentic coding model from @ornith_ / deepreinfor…

X AI KOLs Timeline

Ornith-1.0 is a new family of open-source agentic coding models from deepreinforce-ai, trained with reinforcement learning that jointly optimizes both the solution and the scaffolding. The 35B MoE version achieves state-of-the-art on coding benchmarks and supports efficient single-GPU deployment.
