@KyleHessling1: Hello again, everyone! We've got another really fun 9b, this one specifically trained for tool calling and agentic codi…
Summary
A new 9B fine-tuned model called Qwopus3.5-9B-Coder is released, optimized for tool calling and agentic coding workflows, achieving strong SWE-bench and HermesAgent-20 scores while running on affordable hardware.
View Cached Full Text
Cached at: 05/17/26, 05:28 AM
Hello again, everyone!
We’ve got another really fun 9b, this one specifically trained for tool calling and agentic coding workflows in @NousResearch Hermes agent.
Happy to report that it crushes, and as a 9b it runs on super affordable hardware. We also hit this one with some coding domain-specific training, and it scored a 53.33% on SWE bench on a slice of 200 samples!
To me, I was really shocked to see this high of a score on a 9B model in swe, correct me if I’m wrong, but I think that’s nipping at the heels of the Gemma 4 series, much larger models on this particular benchmark, which is really incredible to see!
It also crushes the HermesAgent-20 benchmark, scoring an 85 vs the base model’s 71!
Make sure to run it hot, –temp around 1, that seems to be the sweet spot for running these particular fine tunes in harnesses. If you have trouble, you can work your way down, but it does a much better job departing from base models, overthinking when you run it, high temp ~1.
Please spin it up in Hermes and let us know your thoughts! Looking forward to hearing your feedback as always!
Also, those of you waiting for Qwopus 3.6 27B, I have put together a preliminary evaluation for you in my HF repo, go check it out; we will be releasing the full model very soon! I will put the preliminary repo in the comments!
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face
Source: https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%8C%9F-qwopus35-9b-coder🌟 Qwopus3.5-9B-coder
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%9A%80-model-fine-tuning-and-logical-alignment-qwopus35-9b-coder🚀 Model Fine-Tuning and Logical Alignment (Qwopus3.5-9B-coder)
As the base model of this model,Qwopus3.5-9B-v3.5is already a model with powerful capabilities. On this foundation,Qwopus3.5-9B-coderis specially optimized and fine-tuned for high-performance**🤖 Agentic Coding, complex Tool Calling, and logical reasoning.**
💡**Why the 9B Dense Model?We believe that the 9B dense architecture represents the perfect“sweet spot”**for large language models. It runs seamlessly at 8-bit precision on entry-level 16GB RAM devices—such as standard laptops and the Mac mini—making it exceptionally lightweight yet highly versatile. Without requiring expensive hardware, it allows you to achieve excellent performance paired with impressive inference speeds. Simply put,Qwen3.5-9B is currently the best open-source model in its class.
Vision & Tool Calling Support: This model supports visual capabilities and tool calling. To enable vision, please place the
mmproj\.gguffile from theGGUF repositoryinto the same directory as the main\.gguffile.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%9B%A0-training-strategy🛠 Training Strategy
The fine-tuning process of this model deeply integratesTrace Inversiondata augmentation technology with high-qualityAgent Traces. This systematic approach not only strengthens the model’s ability to solve complex programming tasks, but also greatly improves its logical coherence and accuracy when using various tools.
This model is designed specifically for the following goals:
- 🧩 More structured and stronger logical reasoning capabilities, reducing repetitive thinking
- 💻 More powerful capabilities in code writing, debugging, and repository-level task processing
- 🛠 More stable and accurate Tool Calling capabilities for terminal commands, file operations, and browsers
- 🔁 Better cross-data source distillation alignment
- Community Release Notice: Qwopus3.5-9B-coder is released purely as an experimental community version, aiming to explore the combination of Agent capabilities and deep reasoning, and is only for research and exploration use. - Warning: Because this model is vertically fine-tuned for programming agents and deep reasoning, and has not undergone comprehensive general performance evaluation, its capabilities in general domains or specific non-programming tasks may suffer from Capability Decay. Users are advised to be aware of its limitations in other scenarios while exploring its core capabilities.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%93%8A-baseline-performance-comparison📊 Baseline Performance Comparison
To verify the execution efficiency and logical robustness ofQwopus3.5-9B-coderin actual agent scenarios, we adopted the open-source testing frameworkbenchlocal.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#test-configurationTest Configuration
- Hardware Environment: Apple Silicon (Mac)
- Inference Backend: LM Studio / MLX / GGUF
- Testing Platform:benchlocal- An evaluation suite focusing on local model agent capabilities.
- 🍎 You can see the actual inference speeds of different model formats on the same device.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%A7%AA-benchmark-results🧪 Benchmark Results
1. Complex Agent Performance - HermesAgent-20
The following is the comparative performance under the HermesAgent-20 task set:
HermesAgent-20 Performance MetricsModelTest SetComprehensive ScoreCore Dimensions (M/O/S/S/B)**Qwopus3.5-9B-coder**HermesAgent-208584 / 93 / 88 / 75 / 84Qwen/Qwen3.5-9BHermesAgent-207175 / 58 / 100 / 53 / 69armand0e/Qwen3.5-9B-AgentHermesAgent-206871 / 83 / 43 / 61 / 80DJLougen/Harmonic-Hermes-9BHermesAgent-204760 / 45 / 23 / 69 / 382. Tool Call Stability - ToolCall-15
This is a ToolCall-15 test set targeting the stability of tool calls, aiming to test the stability of the model in tool calling:
ToolCall-15 Stability MetricsModelTest SetComprehensive ScoreDimension Scores (A/B/C/D/E)**Qwopus3.5-9B-coder**ToolCall-15100100 / 100 / 100 / 100 / 100Qwen/Qwen3.5-9BToolCall-15100100 / 100 / 100 / 100 / 100armand0e/Qwen3.5-9B-AgentToolCall-1593100 / 100 / 100 / 67 / 1003. Code Debugging & Bug Fixing - BugFind-15
BugFind-15 is a test set containing 15 scenarios from shallow to deep, aiming to evaluate the real debugging capabilities of the model in discovering and fixing syntax, logical errors, and “trap” code in multiple programming languages through deterministic environment runtime verification.
BugFind-15 Performance MetricsModelTest SetComprehensive ScoreDimension Scores (A/B/C/D/E)**Qwopus3.5-9B-coder**BugFind-157967 / 87 / 100 / 77 / 43Jackrong/MLX-Qwen3.5-9B-DeepSeek-V4-FlashBugFind-157567 / 100 / 67 / 57 / 80armand0e/Qwen3.5-9B-AgentBugFind-155829 / 87 / 73 / 20 / 67### https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%AA%90-swe-bench-verified-performance-repository-level-coding-capability🪐 SWE-bench Verified Performance (Repository-level Coding Capability)
The following shows the comparative performance onSWE-bench Verified, which evaluates language models on resolving software engineering issues in real-world open-source repositories:
SWE-bench Verified Performance MetricsModelTest SetComprehensive Score (%)Claude 4.5 OpusSWE-bench Verified80.9Qwen/Qwen3.5-27BSWE-bench Verified75.0Qwen/Qwen3.6-35B-A3BSWE-bench Verified73.4**Qwopus3.5-9B-coderSWE-bench Verified53.33google/gemma-4-31B-itSWE-bench Verified52.0google/gemma-4-26B-A4BSWE-bench Verified45.0 - 48.0> - ⚙️ All tests were conducted with a temperature of 1 as officially recommended by qwen3.5. All errors and model issues were attempted to be regenerated twice after a test failure. If both attempts fail, it is considered a failure. - 🍎 All screenshots of the test interfaces have been uploaded to the image folder in the repository. Click the link below to view and verify: - 🔗View Test Screenshots - ❤️Kyle Hessling**for his generous hardware and equipment support. You can follow him for more updates on X / Twitter:@KyleHessling1.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%A7%AA-core-dataset-usage-trace-inversion-and-high-quality-agent-traces🧪 Core Dataset Usage: Trace Inversion and High-Quality Agent Traces
In order to break through the “reasoning bubble” limitation of the model in actual programming and tool usage, and to endow it with real Agent behavioral capabilities, this model introduced core augmented datasets during training:
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#1-reasoning-synthetic-data-combining-trace-inversion1. Reasoning Synthetic Data Combining Trace Inversion
Currently, based on public information, commercial models such as OpenAI’s GPT series and Anthropic’s Claude series have very clearly hidden the true internal reasoning chains of their models. For these models, what we can ultimately see in the API or front-end interface can often only be considered a highly compressed “Reasoning Bubble”.
To break through this limitation, we adopted theTrace Inversiontechnology. This technology utilizes an external “surrogate model” to reconstruct a complete and logically coherent deep reasoning chain based on the “question + final answer + compressed reasoning summary” published by commercial models. The “reasoning bubble”, which originally consisted of only a few sentences and logical leaps, is expanded into a high-quality deep learning trace with complete derivation, calculation, and logical verification, providing step-by-step logical learning signals for the model.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#2-glm-51-agent-real-trace-data-lambdahermes-agent-reasoning-traces2. GLM-5.1 Agent Real Trace Data: lambda/hermes-agent-reasoning-traces
To significantly enhance the model’s execution and coding capabilities in real environments, this model additionally introduced the**lambda/hermes\-agent\-reasoning\-traces**dataset.
- Data Source and Scale: This data subset contains approximately 10,000 high-quality multi-turn Tool Calling Trajectories generated based on the ZhipuAI GLM-5.1 and kimi-4.6 models.
- Real Agent Behavior: Unlike traditional synthetic data, these samples represent real Agent conversations. Each sample not only contains the step-by-step reasoning process in the
<think\>tags, but also includes actual tool execution results (rather than fabricated outputs out of thin air). - Extensive Domain Coverage:- Terminal & Coding: Script writing, code debugging, environment configuration, and data processing. - Repository Tasks: Involving real code repository work, such as bug fixes, refactoring, and code review. - Browser Automation: Web navigation, scraping, and form filling. - Agent Tools: Memory persistence, task delegation, skill management, etc.
By learning these Agent trajectories that contain real feedback and thoughtful processes, Qwopus3.5-9B-coder can exhibit thinking and operational modes closer to human experts when facing complex programming and system operations tasks.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%97%BA%EF%B8%8F-training-pipeline-overview🗺️ Training Pipeline Overview
The training of this model integrates a phased learning pipeline ofTrace Inversiondata augmentation technology andhigh-quality Agent Trajectories data. Its core logic lies in restoring the highly compressed “reasoning bubble” of commercial models into a deep path for learning, and combining it with real agent operational traces to comprehensively improve the model’s logical reasoning and code execution capabilities.
[ 🗺️ Trace Inversion: Full Process of Data Inversion and "Attack" Distillation ]
A. Surrogate Model Training
Open Source Model (GLM-5.1 / DS-V4) ──► Complete Reasoning Chain ──► [ Qwen3-235B Compression ] ──► Reasoning Bubbles
│ │
└──────────► [ Training ] ◄─────────┘
(Base: Qwen3-4B-Instruct)
(Result: Trace-Inverter-4B)
B. Inversion Phase: "Attacking" Claude-4.7-Max
_______________________________________________________
| |
| Claude-4.7-Max API ──► Compressed Bubbles + Final Answer |
|_______________________________________________________|
│
▼
[ 🧠 Trace-Inverter-4B (Logical Reconstructor) ] ────► Synthetic CoT
│
▼
[ 🧩 Data Splicing ] ◄────────── (Original Prompt + Response)
(Embed the inverted chain of thought into <think> tags, and splice with the original Q&A pair for restoration)
│
▼
(Result: claude-opus-4.6/4.7 Inversion Set)
C. Final SFT Pipeline
___________________________________________
| |
| Base Model (Qwopus3.5-9B-v3.5) |
|___________________________________________|
│
▼
[ 📦 Stage 1: Format Establishment and Logic Injection ] ───────► [ 🛠️ Stage 2: Agent Trajectories and Programming Reinforcement ]
(Integrate inverted reasoning data, stabilize thinking format) (Introduce GLM-5.1 Agent Trajectories, reinforce interaction and execution)
│ │
│ ▼
│ __________________________________________________
│ | 🔍 Hermes Agent Trace Sample Structure Breakdown (GLM-5.1) |
│ | 1. [🛠️ System] -> JSON Tool Definition |
│ | 2. [👤 Human] -> Initial Task Instruction |
│ | ┌──────────────────────────────────────────────┐ |
│ | │ 🔁 Multi-turn Loop: │ |
│ | │ 3. [🧠 GPT] -> <think> Logical Reasoning/Reflection │ |
│ | │ 4. [🤖 GPT] -> Tool Call Execution Action │ |
│ | │ 5. [⚙️ Tool] -> Real Feedback │ |
│ | └──────────────────────────────────────────────┘ |
│ |__________________________________________________|
│ │
└────────────────┬────────────────┘
▼
___________________________________
| |
| 🌟 Final Model: Qwopus3.5-9B-coder |
|___________________________________|
Because agent trajectory datasets are complex and diverse. The datasets have undergone rigorous cleaning and formatting.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%8E%AF-three-stage-curriculum-learning🎯 Three-Stage Curriculum Learning
Qwopus3.5-9B-coderadopts a phased reasoning data mixture strategy similar to Curriculum Learning, gradually increasing the difficulty and complexity of training signals:
- **Early Stage (Format Establishment):**Focuses on short-to-medium length reasoning samples with stable formats. The primary goal of this stage is to establish a reliable, structured new reasoning format while avoiding overwhelming the model with extreme complexity.
- **Middle Stage (Complexity Scaling & Multi-Teacher Distillation):**Gradually increases the proportion of complex reasoning samples from multiple teacher models. - The distillation data is sourced from more powerful models whose style distribution closely matches the base model, ensuring that the capability gap is not too wide, thereby achieving efficient learning.
- Late Stage (Long-Context Reinforcement & Drift Prevention):Reinforces reasoning capabilities in long contexts. Crucially, this stage retainsshort-sample replayto ensure the model maintains its short-context instruction-following capability and minimizes capability drift.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%A4%9D-collaboration–training-details🤝 Collaboration & Training Details
This model is the result of continuous exploration in Agentic AI and reasoning capabilities.
Training Infrastructure & Configuration:
- 🖥️**Hardware:**Local compute devices / Cloud GPUs (e.g. GB10 / H100 / RTX 5090 / A100)
- ⚙️**Framework:**Unsloth for efficient fine-tuning
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%E2%9A%A0%EF%B8%8F-important⚠️ IMPORTANT
Compatibility and Deployment Notice - Tool Calling Format: When using this model for tool calling, please ensure that you use a Prompt format and System Prompt that match the training data to activate its Agent capabilities. - Reasoning Output Extraction: The model’s thinking process is typically wrapped in
<think\>and</think\>tags. When deploying to front-end applications, these tags may need to be parsed and hidden.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%93%9A-resources–guides📚 Resources & Guides
👉**GitHub Repository: Jackrong-llm-finetuning-guide**Visit the repository to dive into our fine-tuning codebase and guides.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%99%8F-acknowledgements🙏 Acknowledgements
Special thanks to:
- The Qwen team for the strong Qwen3.6 MoE base model.
- Unsloth for efficient fine-tuning frameworks.
- Open-source datasets and community contributors.
- Kyle Hesslingfor his generous hardware and equipment support. You can follow him for more updates on X / Twitter:@KyleHessling1.
https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF#%F0%9F%93%96-citation📖 Citation
@misc{jackrong_qwopus35_9b_coder,
title = {Qwopus3.5-9B-coder},
author = {Jackrong},
year = {2026},
publisher = {Hugging Face}
}
Similar Articles
Jackrong/Qwopus3.5-9B-Coder-MTP-GGUF
Jackrong releases Qwopus3.5-9B-Coder-MTP-GGUF, a Qwen-based 9B coding model fine-tuned with Multi-Token Prediction (MTP) architecture, achieving 35.8% throughput improvement and 8.3% accuracy gain over the base model, with perfect scores on coding and math benchmarks.
Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!
Qwen3.6-35B-A3B and Qwen3.5-9B models are officially on the Terminal-Bench 2.0 leaderboard, with little-coder achieving 24.6% on the 35B variant, surpassing Gemini 2.5 Pro and Qwen3-Coder-480B, while the 9B model shows that sub-10B local models can compete on hard agentic benchmarks.
@outsource_: BREAKING QWOPUS 3.6 27B IS FULLY LIVE! SOTA QWEN 3.6 27b + Opus IS HERE!!!! Agentic coding GOATED: 75.25% (152/202) on …
Qwopus 3.6 27B is now fully live, a merged model (Qwen + Opus) achieving state-of-the-art agentic coding performance with 75.25% on SWE MMLU Pro, handling 303k token context at Q8 KV cache, and running on 24GB VRAM at Q5_K_M quantization.
@mr_r0b0t: It’s no secret I believe specialist small models are part of a well run local agent team. The one below is definitely g…
A new small AI model, Qwopus 3.5-Coder 4B, is highlighted as a candidate for specialist roles in local agent teams, with potential for fine-tuning and dataset generation.
Qwen3.6-35B becomes competitive with cloud models when paired with the right agent
By pairing Qwen3.6-35B with the little-coder agent scaffold, the model hits 78.7% on the Polyglot coding benchmark, placing in the public top 10 and rivaling cloud models.


