@sudoingX: Running Ornith on a DGX Spark to see what it actually is. A new agentic coding model from @ornith_ / deepreinfor...

X AI KOLs Timeline 2026/06/27 16:49 Models

agentic-coding moe reinforcement-learning open-source coding-agent self-improving

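Summary

Ornith-1.0 is a new family of open-source agentic coding models from deepreinforce-ai, trained with reinforcement learning that optimizes both the solution and the scaffold that drives it. The 35B MoE variant reports state-of-the-art results among similarly sized open models on coding benchmarks and supports efficient single-GPU deployment.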

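Running Ornith on a DGX Spark to see what it actually is. It is a new agentic coding model from @ornith_ / deepreinforce-ai, a 35B MoE (A3B, about 3B active parameters per token). Pulled the Q4_K_M GGUF (about 20GB), wired it into Hermes agent, and got about 78 tok/s on a single Spark with fast prefill, so it actually feels like an agent. The interesting part is how it was trained. Most coding RL only optimizes the final code. Ornith's RL also optimizes the SCAFFOLD, the task-specific structure that drives the solution, along with the solution itself. So it is not only learning to write code; it is learning how to approach the problem, plan, scaffold, and structure. That is the agentic bet. This is it running on Hermes agent. Now let's see whether the training actually transfers to real tasks and not just benchmarks. Model: http://huggingface.co/deepreinforce-ai/Ornith-1.0-35B-GGUF…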
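
The figures in the post (about 20GB on disk, about 78 tok/s) can be sanity-checked with a little arithmetic. The sketch below is a rough estimate only, and two of its inputs are assumptions that do not come from the post: an average of roughly 4.8 bits per weight for Q4_K_M, and roughly 273 GB/s of memory bandwidth for a DGX Spark. It also ignores KV-cache reads and compute, so the throughput figure is a loose upper bound rather than a prediction.

```python
# Back-of-envelope check of the numbers quoted in the post.
# ASSUMPTIONS (not from the post): ~4.8 bits/weight for Q4_K_M and
# ~273 GB/s memory bandwidth for a DGX Spark.
TOTAL_PARAMS = 35e9      # 35B total parameters (from the post)
ACTIVE_PARAMS = 3e9      # ~3B active parameters per token (from the post)
BITS_PER_WEIGHT = 4.8    # assumed average for Q4_K_M
BANDWIDTH_GB_S = 273.0   # assumed memory bandwidth


def weights_gb(params, bits_per_weight):
    """Size in GB of `params` weights stored at `bits_per_weight`."""
    return params * bits_per_weight / 8 / 1e9


def decode_ceiling_tok_s(active_params, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed if each token only had to read the active weights."""
    return bandwidth_gb_s / weights_gb(active_params, bits_per_weight)


file_gb = weights_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
ceiling = decode_ceiling_tok_s(ACTIVE_PARAMS, BITS_PER_WEIGHT, BANDWIDTH_GB_S)
print(f"estimated file size: {file_gb:.1f} GB")
print(f"bandwidth-bound decode ceiling: {ceiling:.0f} tok/s")
```

Under these assumptions the estimated file size lands close to the quoted 20GB, and the reported 78 tok/s sits at roughly half of the bandwidth-bound ceiling, which is plausible for a sparse model where only the active experts are read per token.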

Cached at: 2026/06/28 03:59

deepreinforce-ai/Ornith-1.0-35B-GGUF · Hugging Face

Source: https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B-GGUF

Ornith blog (https://deep-reinforce.com/ornith.html)

Aloha! 🌺 Today we are releasing Ornith-1.0, a family of self-improving open-source models for agentic coding.

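Highlights:

State-of-the-art coding agents: available in 9B-Dense, 31B-Dense, 35B-MoE, and 397B-MoE variants (post-trained from Gemma 4 and Qwen 3.5), Ornith-1.0 achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including Terminal-Bench 2.1, SWE-Bench, NL2Repo, and OpenClaw.
Self-improving training framework: Ornith-1.0 uses reinforcement learning to learn not only to generate solution rollouts but also to generate the scaffold that drives those rollouts. By jointly optimizing the scaffold and the solutions it produces, the model discovers better search trajectories and yields higher-quality solutions.
License: MIT licensed, globally accessible, with no regional restrictions.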
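
The joint-optimization idea can be illustrated with a deliberately tiny toy. This is not the authors' training algorithm, whose details are not given here; it is a hypothetical two-level policy (pick a scaffold, then pick a solution conditioned on it) trained by exact policy-gradient ascent on a made-up 2×2 reward table. The reward values and every name in the code are invented for illustration.

```python
import math

# Hypothetical reward table: REWARD[s][a] is the payoff of solution a under scaffold s.
REWARD = [[0.2, 0.4],
          [0.3, 0.9]]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=500, lr=1.0):
    """Exact gradient ascent on expected reward for a scaffold policy and a solution policy."""
    scaffold_logits = [0.0, 0.0]
    solution_logits = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        p_s = softmax(scaffold_logits)
        p_a = [softmax(row) for row in solution_logits]
        # Expected reward under each scaffold, and overall.
        value = [sum(p_a[s][a] * REWARD[s][a] for a in range(2)) for s in range(2)]
        baseline = sum(p_s[s] * value[s] for s in range(2))
        for s in range(2):
            # The scaffold is credited for how well the solutions it drives perform.
            scaffold_logits[s] += lr * p_s[s] * (value[s] - baseline)
            for a in range(2):
                # Each solution is credited relative to its scaffold's average.
                solution_logits[s][a] += lr * p_s[s] * p_a[s][a] * (REWARD[s][a] - value[s])
    return softmax(scaffold_logits), [softmax(row) for row in solution_logits]


p_scaffold, p_solution = train()
print("P(scaffold):", [round(p, 3) for p in p_scaffold])
print("P(solution | scaffold 1):", [round(p, 3) for p in p_solution[1]])
```

In this toy both levels receive a learning signal from the same final reward, so probability mass moves toward the scaffold whose solutions score best, and toward the best solution under that scaffold. A solution-only setup would hold the scaffold fixed and update only the inner policy.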

Ornith 35B benchmark results

Ornith 1.0 35B

This document covers Ornith-1.0-35B, the lightweight member of the Ornith family, designed for efficient single-GPU deployment.

Benchmarks

Benchmark	Ornith-1.0-35B	Qwen3.5-35B	Qwen3.6-35B	Gemma4-31B	Qwen3.5-397B
Agentic coding
Terminal-Bench 2.1 (Terminus-2)	64.2	41.4	52.5	42.1	53.5
Terminal-Bench 2.1 (Claude Code)	62.8	38.9	49.2	-	48.6
SWE-bench Verified	75.6	70	73.4	52	76.4
SWE-bench Pro	50.4	44.6	49.5	35.7	51.6
SWE-bench Multilingual	69.3	60.3	67.2	51.7	69.3
NL2Repo	34.6	20.5	29.4	15.5	36.8
Claw-eval Avg	69.8	65.4	68.7	48.5	70.7
SWE Atlas - QnA	37.1	13.2	15.5	-	20.4
SWE Atlas - RF	29.7	10.2	11.4	-	18.4
SWE Atlas - TW	27.8	9.8	13.3	-	18.5

* Terminal-Bench 2.1 (Terminus-2): evaluated with the Harbor/Terminus-2 harness using parser=json, temperature=1.0, top_p=1.0, and a 128K context window. Each run has a 4-hour timeout, 32 CPU cores, and 48GB RAM; results are averaged over 5 runs. We adjusted the Qwen chat template to keep training and inference consistent (https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B/blob/main/chat_template.jinja) and modified Harbor to match vLLM's reasoning_content key.

* Terminal-Bench 2.1 (Claude Code): evaluated with Claude Code 2.1.126 using parser=json, temperature=1.0, top_p=1.0, max_new_tokens=131072. Results are averaged over 5 runs. The same Qwen chat template modification is required.

* SWE-Bench Verified, Pro, and Multilingual: OpenHands harness with temp=1.0, top_p=0.95, and a 256K context window.

* SWE Atlas QnA, RF, TW: mini SWE agent harness with temp=1.0, top_p=0.95, and a 128K context window. Results are averaged over 5 runs.

* NL2Repo: temperature=1.0, top_p=1.0, 400K context, 48K output, with an anti-hacking filter applied.

* ClawEval: an agentic coding benchmark based on the distribution of real user tasks; temp=0.6, 256K context.
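
To make the table easier to read at a glance, the following sketch computes the point difference between the Ornith-1.0-35B column and the Qwen3.5-35B column for each benchmark. The scores are copied from the table above; the code only subtracts and sorts them, and makes no claim about which model Ornith-1.0-35B was post-trained from.

```python
# Scores copied from the benchmark table: (Ornith-1.0-35B, Qwen3.5-35B).
SCORES = {
    "Terminal-Bench 2.1 (Terminus-2)": (64.2, 41.4),
    "Terminal-Bench 2.1 (Claude Code)": (62.8, 38.9),
    "SWE-bench Verified": (75.6, 70.0),
    "SWE-bench Pro": (50.4, 44.6),
    "SWE-bench Multilingual": (69.3, 60.3),
    "NL2Repo": (34.6, 20.5),
    "Claw-eval Avg": (69.8, 65.4),
    "SWE Atlas - QnA": (37.1, 13.2),
    "SWE Atlas - RF": (29.7, 10.2),
    "SWE Atlas - TW": (27.8, 9.8),
}


def gains(scores):
    """Point difference (first score minus second) per benchmark, rounded to one decimal."""
    return {name: round(ornith - other, 1) for name, (ornith, other) in scores.items()}


deltas = gains(SCORES)
# Print the largest gaps first.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:34s} +{delta:.1f}")
```

The largest gaps are on the Terminal-Bench and SWE Atlas rows, and the smallest are on Claw-eval and SWE-bench Verified, where the comparison model was already scoring 65 or higher.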

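Quickstart

📝 Note: Ornith-1.0-35B is a reasoning model. By default, the assistant reply begins with a <think>...</think> block followed by the final answer. The serving configurations below enable the reasoning parser, so the chain of thought is returned in a separate reasoning_content field, and the tool-call parser, so the model's `<tool_call>` blocks are surfaced as OpenAI-style tool_calls.

Serving Ornith-1.0-35B requires a recent runtime: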

Transformers ≥ 5.8.1
vLLM ≥ 0.19.1
SGLang ≥ 0.5.9
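
A quick way to confirm the environment meets these minimums is to compare installed versions before launching anything. The sketch below uses only the standard library. The distribution names passed to `importlib.metadata` are assumptions about how each package is published, and the simplified parser treats pre-releases such as `5.8.1rc1` as equal to `5.8.1`.

```python
from importlib import metadata

# Minimum versions listed above. The distribution names are assumptions.
MINIMUMS = {"transformers": "5.8.1", "vllm": "0.19.1", "sglang": "0.5.9"}


def version_tuple(version):
    """Turn '0.19.1' or '0.19.1.post2' into (0, 19, 1); non-numeric suffixes are dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    # Compare numerically so that 0.5.10 counts as newer than 0.5.9.
    return version_tuple(installed) >= version_tuple(minimum)


def check_runtimes(minimums=MINIMUMS):
    """Report 'ok', 'too old', or 'not installed' for each runtime."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_runtimes().items():
        print(f"{name}: {status}")
```

Only one of vLLM or SGLang needs to be present for serving, so a "not installed" line for the other is expected.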

Serving Ornith-1.0-35B

Both recipes below launch an OpenAI-compatible server on a single 8×80GB GPU node (tensor-parallel 8). Adjust --tensor-parallel-size / --tp to match your GPU count.

vLLM

vllm serve deepreinforce-ai/Ornith-1.0-35B \
    --served-model-name Ornith-1.0-35B \
    --tensor-parallel-size 8 \
    --host 0.0.0.0 --port 8000 \
    --max-model-len 262144 \
    --gpu-memory-utilization 0.90 \
    --enable-prefix-caching \
    --enable-auto-tool-choice --tool-call-parser qwen3_xml \
    --reasoning-parser qwen3 \
    --trust-remote-code

SGLang

python -m sglang.launch_server \
    --model-path deepreinforce-ai/Ornith-1.0-35B \
    --served-model-name Ornith-1.0-35B \
    --tp 8 \
    --host 0.0.0.0 --port 8000 \
    --context-length 262144 \
    --mem-fraction-static 0.85 \
    --tool-call-parser qwen3_coder \
    --reasoning-parser qwen3
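
Once either server is starting up, it can be useful to confirm that it is reachable and to see which model name it will accept. The sketch below assumes the server exposes the usual OpenAI-style `GET /v1/models` route on the host and port used above; the helper names are made up for this example.

```python
import json
import urllib.request


def model_ids(payload):
    """Extract model names from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url="http://localhost:8000/v1", timeout=5.0):
    """Fetch the names the server will accept in the `model` request field."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_served_models())
    except OSError as exc:  # connection refused, timeout, or HTTP error
        print("server not reachable yet:", exc)
```

With the commands above, the list should contain `Ornith-1.0-35B`, the value passed to `--served-model-name`. That is the string to use as `model` in client requests.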

Hugging Face Transformers

For quick local testing, or for writing offline generation scripts, you can load the model directly with Transformers. Make sure a recent version is installed; see the Transformers installation guide (https://huggingface.co/docs/transformers/installation). Ornith-1.0-35B requires transformers >= 5.8.1.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepreinforce-ai/Ornith-1.0-35B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function is_prime(n). Keep it short."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
generated = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
output_ids = generated[0][inputs.input_ids.shape[1]:]
# The reply contains a <think>...</think> reasoning block followed by the answer.
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)

To separate the reasoning from the final answer, split on the `</think>` tag:

text = tokenizer.decode(output_ids, skip_special_tokens=True)
if "</think>" in text:
    reasoning, answer = text.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
    answer = answer.strip()
else:
    reasoning, answer = "", text.strip()

Using Ornith-1.0-35B via the Chat Completions API

Once the vLLM or SGLang server is running, any OpenAI-compatible client can talk to it.

Basic usage

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # any non-empty string works for a local server
)
response = client.chat.completions.create(
    model="Ornith-1.0-35B",
    messages=[
        {"role": "user", "content": "Write a one-line Python lambda that squares a number."}
    ],
    temperature=0.6,
    top_p=0.95,
    max_tokens=1024,
)
message = response.choices[0].message
# reasoning_content holds the reasoning; content holds the final answer.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)

You can also stream tokens or give the model tools. Ornith-1.0-35B emits well-formed function calls, which the server parses into the standard tool_calls field:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
response = client.chat.completions.create(
    model="Ornith-1.0-35B",
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",
    temperature=0.6,
    max_tokens=2048,
)
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
# -> get_weather {"city": "Paris"}

You can point any OpenAI-compatible SDK (Python, Node.js, and so on) or curl at the same /v1/chat/completions endpoint.
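
Streaming is mentioned above but not shown. The sketch below is one way it might look, assuming the server streams the chain of thought in a `reasoning_content` attribute on each delta, mirroring the non-streaming field. The helper names and the stand-in chunks are made up so the accumulation logic can be run without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning and answer text from Chat Completions stream chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some servers send a final chunk that carries only usage data
        delta = chunk.choices[0].delta
        # reasoning_content is a server extension, so read it defensively.
        reasoning.append(getattr(delta, "reasoning_content", None) or "")
        answer.append(getattr(delta, "content", None) or "")
    return "".join(reasoning), "".join(answer)


def stream_from_server(prompt):
    """Send one streamed request to the local server and return (reasoning, answer)."""
    from openai import OpenAI  # imported here so collect_stream has no dependencies

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    stream = client.chat.completions.create(
        model="Ornith-1.0-35B",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.6,
        top_p=0.95,
        max_tokens=1024,
        stream=True,
    )
    return collect_stream(stream)


def fake_chunk(**delta):
    """Build a stand-in object shaped like an SDK stream chunk (for offline demos)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


demo = [
    fake_chunk(reasoning_content="square "),
    fake_chunk(reasoning_content="it"),
    fake_chunk(content="lambda x: "),
    fake_chunk(content="x * x"),
]
print(collect_stream(demo))
```

Against a live server, `stream_from_server("Write a one-line Python lambda that squares a number.")` returns the same kind of pair. An interactive client would print each delta as it arrives instead of joining the pieces at the end.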

Agentic usage

Ornith-1.0-35B excels at tool calling and agentic coding.

Agent frameworks

Because Ornith-1.0-35B exposes an OpenAI-compatible endpoint with tool-calling support, it works out of the box with standard agent frameworks. Below is a minimal example that connects Ornith-1.0-35B to tools via an MCP server.

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "EMPTY"),
)
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Run a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "The command to run"}
                },
                "required": ["command"],
            },
        },
    }
]
messages = [{"role": "user", "content": "List the Python files in the current directory."}]
response = client.chat.completions.create(
    model="Ornith-1.0-35B",  # must match --served-model-name from the serve commands
    messages=messages,
    tools=tools,
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message)
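
The example above stops once the model has asked for a tool. A full agent turn also runs the tool and sends the result back so the model can continue. The sketch below shows one possible shape for that loop using the standard Chat Completions `tool` message format. The helper names are hypothetical, and `run_shell` executes commands without any sandboxing, so it is suitable for a throwaway demo only.

```python
import json
import subprocess


def run_shell(command, timeout=30):
    """Run a shell command and return combined stdout/stderr. Unsandboxed: demo only."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout + result.stderr).strip()


def tool_result_message(tool_call_id, name, arguments_json):
    """Execute one tool call and build the `tool` message that reports its result."""
    if name != "run_shell":
        output = f"unknown tool: {name}"
    else:
        args = json.loads(arguments_json)
        output = run_shell(args["command"])
    return {"role": "tool", "tool_call_id": tool_call_id, "content": output}


def agent_loop(client, model, messages, tools, max_turns=8):
    """Call the model until it answers without requesting a tool."""
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools,
            temperature=0.6, top_p=0.95,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content
        # Echo the assistant turn, including its tool calls, back into the history.
        messages.append({
            "role": "assistant",
            "content": message.content or "",
            "tool_calls": [
                {"id": c.id, "type": "function",
                 "function": {"name": c.function.name, "arguments": c.function.arguments}}
                for c in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            messages.append(
                tool_result_message(call.id, call.function.name, call.function.arguments)
            )
    return None  # gave up after max_turns


print(tool_result_message("call_0", "run_shell", '{"command": "echo hello"}'))
```

With the `client`, `tools`, and `messages` defined in the example above, `agent_loop(client, "Ornith-1.0-35B", messages, tools)` would keep executing requested commands until the model replies in plain text or the turn limit is reached.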

Examples of using Ornith with agent harnesses:

Hermes Agent

# Hermes can talk to any OpenAI-compatible endpoint; point it at your Ornith server.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="deepreinforce-ai/Ornith-1.0-35B"

Atomic.chat / Ollama / llama.cpp

# Both runtimes below load the GGUF build of Ornith (published at deepreinforce-ai/Ornith-1.0-35B-GGUF).
# llama.cpp: serve an OpenAI-compatible API on port 8000.
llama-server -hf deepreinforce-ai/Ornith-1.0-35B-GGUF --port 8000 -c 262144

# Ollama: pull the same GGUF directly from Hugging Face and chat with it.
ollama run hf.co/deepreinforce-ai/Ornith-1.0-35B-GGUF

OpenClaw

# OpenClaw can talk to any OpenAI-compatible endpoint; point it at your Ornith server.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export OPENAI_MODEL="deepreinforce-ai/Ornith-1.0-35B"

Unsloth Studio

pip install unsloth
# Load Ornith for fast local inference or fine-tuning (Python):
# from unsloth import FastLanguageModel
# model, tokenizer = FastLanguageModel.from_pretrained(
#     "deepreinforce-ai/Ornith-1.0-35B",
#     max_seq_length=262144,
#     load_in_4bit=True,
# )

OpenHands

pip install openhands-ai
# OpenHands routes through LiteLLM; the "openai/" prefix selects the OpenAI-compatible path.
export LLM_MODEL="openai/deepreinforce-ai/Ornith-1.0-35B"
export LLM_BASE_URL="http://localhost:8000/v1"
export LLM_API_KEY="EMPTY"

# Launch the CLI (or run the official OpenHands Docker image with the same environment variables).
openhands

Coding CLIs

Ornith-1.0-35B is optimized for terminal-based coding agents. Point any OpenAI-compatible coding CLI at your Ornith-1.0-35B endpoint (set OPENAI_BASE_URL and OPENAI_API_KEY) to understand large codebases, automate tedious work, and ship faster.

OpenCode

# Register the local Ornith endpoint as a provider in ~/.config/opencode/opencode.json:
#
# {
#   "$schema": "https://opencode.ai/config.json",
#   "provider": {
#     "ornith": {
#       "npm": "@ai-sdk/openai-compatible",
#       "name": "Ornith (local)",
#       "options": { "baseURL": "http://localhost:8000/v1", "apiKey": "EMPTY" },
#       "models": { "deepreinforce-ai/Ornith-1.0-35B": { "name": "Ornith-1.0-35B" } }
#     }
#   }
# }
opencode

Citation

If you find our work helpful, please consider citing it.

@misc{ornith-35b,
  title = {{Ornith-1.0-35B}: Agentic Coding, Open to All},
  url = {https://deep-reinforce.com/ornith_1_0.html},
  author = {{DeepReinforce Team}},
  year = {2026}
}