@KyleHessling1: Good morning y'all! Qwopus-3.6-35B-A3B-MTP-Coder is live! All GGUF's will be populating over the next few hours! It's a…

X AI KOLs Timeline Models

Summary

Qwopus-3.6-35B-A3B-MTP-Coder is a new open-source MoE fine-tune optimized for coding agent workflows with thinking disabled, offering fast token-efficient inference and competitive performance against similar models.

Good morning y'all! Qwopus-3.6-35B-A3B-MTP-Coder is live! All GGUF's will be populating over the next few hours! It's a lightning-fast MOE with the coder curriculum recipe. Similar to the 27B coder, it shines with thinking disabled, offering significantly faster wall time for similar, and in some cases superior results to same-sized thinking alternatives! With thinking disabled, it goes toe-to-toe with the new Ornith 35B MoE across a huge eval suite (performed by @no_stp_on_snek), edging it on the coding trajectories and decisively on speed and cost, even though Ornith was run with thinking enabled. See the model card for the full test results, and shoutout to Tom, @no_stp_on_snek, for thoroughly evaluating the model for us before launch! With MTP and thinking disabled, along with the MOE speed, it runs so quickly in harnesses like @opencode that it almost feels instant @ 253 tps on my 5090. No 8k tokens of thinking before a coherent output is actioned. This is especially useful in long contexts, where the base models will progressively start thinking for tens of thousands of tokens before replying. Compared to the base models with thinking off, the coder curriculum really advances the no-think frontier. Especially in terms of how creative it can be. Run temp hot as usual, 0.85-1, and make sure your harness isn't overriding the temp setting of your server at runtime. If you want to use it to its full ability, I would recommend giving it very thorough prompts. I have been using it in opencode, and I have been blown away by the results it generates autonomously with chunky prompts. Please see links to the demo's Aether Dominion (RTS Game), and a slide deck presentation the model made about itself that turned out beautifully, links in comments below! I am getting results on this incredibly fast local model (with thinking disabled) that I couldn't get in some thinking frontier models over a year ago. Open source is accelerating fast, and in light of recent events, there's never been a better time to get your local AI workflows tightened up. This MOE would be a great one to play with, and it's also a great one if you don't have much VRAM because it can run fast offloaded partially to system memory! All of that said, please give it a run with thinking off and build something you'd like to see. We'd love to see your results and any feedback on specific use cases in the comments below! Also, thanks so much for 5k followers, you all make up such an enjoyable and knowledgeable open source community, and I am so blessed to be able to collaborate and discuss this research with all of you. I can't express how grateful I am for every comment. As always, I will try to reply to them all! If we ever get monetized on X, I will put every penny into buying more hardware for our lab! Have a blessed day, my friends, looking forward to your thoughts!
Original Article
View Cached Full Text

Cached at: 06/29/26, 10:32 PM

Good morning y’all!

Qwopus-3.6-35B-A3B-MTP-Coder is live! All GGUF’s will be populating over the next few hours!

It’s a lightning-fast MOE with the coder curriculum recipe. Similar to the 27B coder, it shines with thinking disabled, offering significantly faster wall time for similar, and in some cases superior results to same-sized thinking alternatives! With thinking disabled, it goes toe-to-toe with the new Ornith 35B MoE across a huge eval suite (performed by @no_stp_on_snek), edging it on the coding trajectories and decisively on speed and cost, even though Ornith was run with thinking enabled.

See the model card for the full test results, and shoutout to Tom, @no_stp_on_snek, for thoroughly evaluating the model for us before launch!

With MTP and thinking disabled, along with the MOE speed, it runs so quickly in harnesses like @opencode that it almost feels instant @ 253 tps on my 5090.

No 8k tokens of thinking before a coherent output is actioned. This is especially useful in long contexts, where the base models will progressively start thinking for tens of thousands of tokens before replying.

Compared to the base models with thinking off, the coder curriculum really advances the no-think frontier. Especially in terms of how creative it can be. Run temp hot as usual, 0.85-1, and make sure your harness isn’t overriding the temp setting of your server at runtime.

If you want to use it to its full ability, I would recommend giving it very thorough prompts. I have been using it in opencode, and I have been blown away by the results it generates autonomously with chunky prompts. Please see links to the demo’s Aether Dominion (RTS Game), and a slide deck presentation the model made about itself that turned out beautifully, links in comments below!

I am getting results on this incredibly fast local model (with thinking disabled) that I couldn’t get in some thinking frontier models over a year ago.

Open source is accelerating fast, and in light of recent events, there’s never been a better time to get your local AI workflows tightened up. This MOE would be a great one to play with, and it’s also a great one if you don’t have much VRAM because it can run fast offloaded partially to system memory!

All of that said, please give it a run with thinking off and build something you’d like to see. We’d love to see your results and any feedback on specific use cases in the comments below!

Also, thanks so much for 5k followers, you all make up such an enjoyable and knowledgeable open source community, and I am so blessed to be able to collaborate and discuss this research with all of you. I can’t express how grateful I am for every comment. As always, I will try to reply to them all!

If we ever get monetized on X, I will put every penny into buying more hardware for our lab!

Have a blessed day, my friends, looking forward to your thoughts!


Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF · Hugging Face

Source: https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF

⚙️ Qwopus-3.6-35B-A3B-Coder

Agentic Coder Release

A thinking-off, token-efficient coding agent model built on Qwopus3.6-35B-A3B-v1 / Qwen3.6-35B-A3B.

🧠 Thinking-Off Agent⚡ Token-Efficient Coding🛠️ Tool Calling & Workflow🧩 35B-A3B MoE🎮 Game Demo Ready

💡What is Qwopus-3.6-35B-A3B-Coder?

🪐Qwopus-3.6-35B-A3B-Coderis a practical coding-agent fine-tune focused onexecution efficiency, not simply longer visible reasoning. It is designed for real agentic coding workflows where the model repeatedly reads files, chooses tools, edits code, runs tests, reacts to errors, and summarizes work. The core goal is to complete more of these steps withless token waste, lower latency, and more stable behaviorwhen explicit long thinking is disabled.

⚡ Fast Agent LoopsOptimized for repeated tool decisions, patching, test runs, and error-driven debugging without forcing every step into long thinking mode.

🧩 MoE EfficiencyBuilt from a 35B total / 3B active-parameter MoE foundation for high-throughput local coding workflows.

🛠️ Agent Harness FitAims to fit Codex-style, OpenHands-style, Claude Code-style, and OpenCode-style agent harnesses.

🎮 Live Coding DemoIncludes a slot for an RTS/game-building sample generated through an agent workflow.

Community Release Notice: Qwopus-3.6-35B-A3B-Coder is an experimental community model intended for research, local coding-agent evaluation, and workflow exploration. It has not undergone complete safety evaluation or broad general-domain benchmarking.

Evaluation Mode: The central design target and comparison framing in this card isthinking-offexecution. The model is evaluated for whether it can remain useful and stable without relying on long visible reasoning traces at every step.


https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF#%F0%9F%8E%AF-1-fine-tuning-objective-less-overthinking-more-execution🎯 1. Fine-Tuning Objective: Less Overthinking, More Execution

🧭1.1 Why This Model Exists

The goal of this fine-tune isnotto chase longer reasoning chains for their own sake. In a real coding agent workflow, many steps are operational rather than deeply philosophical: read a file, inspect a stack trace, choose the next tool, edit code, run tests, check the error, continue, and report the result.

If every one of these steps enters a long thinking mode, the workflow can pay unnecessary costs: more tokens, higher latency, noisier state transitions, and greater risk of long-horizon behavioral drift. Qwopus-3.6-35B-A3B-Coder is tuned around a different product assumption:

Let the model do more agent work with fewer tokens, faster turns, and steadier tool behavior.

⚡1.2 Core Optimization Target

1. Faster next-step decisions Identify whether to inspect, edit, test, or summarize without excessive deliberation.

2. Lower token waste Reduce unnecessary long-form reasoning in routine implementation steps.

3. Better workflow stability Keep multi-turn code tasks on track across file edits, tool calls, and retries.

4. Local deployment fit Make high-frequency coding tasks more practical on local or self-hosted inference stacks.

🛠️1.3 Target Workflows

This model is designed to be a strong fit forCodex / OpenHands / Claude Code / OpenCode-style agent harnesses, long-running repository edits, automated debugging, multi-round tool calls, low-latency local deployment, and large-context codebase tasks where practical execution quality matters more than verbose visible thinking.


https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF#%F0%9F%92%A1-2-base-model-training-stack–collaboration💡 2. Base Model, Training Stack & Collaboration

🧠2.1 Base Model: Qwopus3.6-35B-A3B-v1 / Qwen3.6-35B-A3B

The coder model builds on the Qwopus3.6-35B-A3B line, itself based onQwen3.6-35B-A3B. The underlying architecture is a hybrid sparse MoE model with 35B total parameters and approximately 3B active parameters per token, making it attractive for local high-frequency coding workloads.

AttributeSpecifications & Details🧩 ArchitectureHybrid sparse MoE, 35B total parameters / ~3B active parameters per token🏢 Base DeveloperAlibaba Cloud / Qwen family, via unsloth/Qwen3.6-35B-A3B🎯 Coder FocusAgentic coding, tool-use stability, code editing, debugging, multi-turn workflow execution⚡ Evaluation EmphasisThinking-off execution, token efficiency, lower latency, stable behavior across long agent loops📄 ContextDesigned for large-context repository work; exact deployment context depends on inference stack and configuration

🧪2.2 Hardware Cooperation & Joint Collaboration

This project is built in close collaboration with engineerKyle Hessling, whose hardware infrastructure, training support, and live agent experiments help validate the model under practical coding workloads.

👉Follow hardware and model training updates on X / Twitter:@KyleHessling1

🦥2.3 Fine-Tuning Framework: Unsloth

The training workflow is accelerated and memory-optimized withUnsloth. Special thanks to the Unsloth team for making efficient large-model fine-tuning more accessible.


https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF#%F0%9F%93%8A-3-thinking-off-agentic-evaluation📊 3. Thinking-Off Agentic Evaluation

📊 Evaluation: Qwopus 3.6 35B Thinking-Off vs Ornith-1.0 35B Thinking-On

Comparison between Qwopus with thinking disabled and Ornith with thinking enabled. All benchmark runs in this section use Q5_K_M / Q5KM quantized models. Higher is better. Benchmarks courtesy of Tom Turney, @no_stp_on_snek on X.

Main FindingIn these Q5_K_M quantized evaluations,Qwopus 3.6 35Bwas tested withthinking disabled. The model also completed a 300-case SWE-bench submitted-patch run with a**62.4%**score. In the behavioral comparison, Qwopus leads in practical execution categories such as legit-request compliance, integrity under pressure, multi-turn orchestration, large code deliverables, and sustained debugging. Ornith remains stronger in selected reasoning-oriented dimensions such as long-context recall, metacognition, engineering competence, and context-poison resistance.

🎞️

Interactive Model Deck by Kyle HesslingKyle created a short Hugging Face Space deck that walks through the model story visually: thinking-off agentic coding, the 35B / 3B MoE setup, MTP-assisted local inference, SWE-bench results, token-efficiency comparisons, Qwopus OFF vs Ornith ON, and the OpenCode RTS demo.

visual explainerthinking-off workflowSWE-bench + RTS demo

Open Kyle’s interactive deck →

Average Score82.1 vs 78.9Qwopus vs Ornith

SWE-bench62.4%300 cases, submitted patches

🧪3.1 SWE-bench Submitted-Patch Run

Result:Qwopus-3.6-35B-A3B-Coder scored62.4%on a300-case SWE-bench runusingthinking offandsubmitted patches. The evaluated model was theQ5_K_M quantizedbuild.

BenchmarkSWE-bench

Run Size300 tasks

ModeThinking off

QuantizationQ5_K_M

EvaluationModel / QuantPatch ModeScoreSWE-bench, 300 casesQwopus-3.6-35B-A3B-Coder Q5_K_MThinking off, submitted patches62.4%

⚖️3.2 Numerical Scorecard

**Note:**Scores are held-out behavioral + long-horizon coding evaluation results on a 0-100 scale. Higher is better. The comparison intentionally contrasts Qwopus in thinking-off mode with Ornith-1.0 in thinking-on mode.

Capability AreaQwopus 3.6 35B thinking offOrnith-1.0 35B thinking onObserved PatternLegit-request compliance10070Qwopus follows allowed user intent much more reliably.Integrity under pressure9386Qwopus is more stable under adversarial or stressful workflow conditions.Multi-turn orchestration8070Qwopus better maintains state across long agent loops.Large code deliverable7565Qwopus shows stronger completion behavior for larger code artifacts.Sustained debugging6050Qwopus holds a practical edge across repeated fix-test cycles.Long-context recall9095Ornith retains a small advantage in recall-heavy thinking-on settings.Metacognition9095Ornith benefits from explicit thinking-on reflection.Engineering competence8194Ornith remains stronger in broad engineering competence.Context-poison resistance7085Ornith is more robust against context poisoning in this test.

Takeaway:Qwopus-3.6-35B-A3B-Coder is positioned as apractical agent execution model. The important result is not merely whether it can think longer, but whether it can keep acting correctly when the workflow demands many fast, concrete decisions. This makes it especially relevant for local coding agents, automated debugging loops, and large codebase tasks where token efficiency directly affects usability.


https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF#%F0%9F%8E%AE-4-live-agent-demo-rts-game-sample🎮 4. Live Agent Demo: RTS Game Sample

🎮 OpenCode / Agent Game-Building Demo

A practical visual test for whether the model can plan, code, iterate, and deliver an interactive project inside an agent workflow.

Kyle Hessling tested the soon-to-release Qwopus-Coder-35B-A3B in an OpenCode workflow by asking it to create a complete RTS-style game sample. This kind of demo is useful because it combines code generation, file orchestration, UI/gameplay logic, iterative correction, and final deliverable quality in one visible task.

Qwopus-3.6-35B-A3B-Coder RTS game demo screenshot

**Why this matters:**a playable game demo is not a formal benchmark, but it is a high-signal smoke test for agentic coding. It exposes whether the model can maintain project structure, generate coherent state logic, and complete a visually inspectable artifact rather than only answering isolated prompts.


https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF#%F0%9F%97%BA%EF%B8%8F-5-training–workflow-design🗺️ 5. Training & Workflow Design

The training and evaluation philosophy for this release centers on agent execution rather than visible chain length. The model should know when to act directly, when to inspect more context, and when to stop and summarize.

[ Qwopus-3.6-35B-A3B-Coder: Agentic Execution Pipeline ]

  Base MoE Foundation
  Qwen3.6-35B-A3B / Qwopus3.6-35B-A3B-v1
          │
          ▼
  Coding + Tool-Use Adaptation
  repository tasks, debugging traces, tool schemas, multi-turn feedback
          │
          ▼
  Thinking-Off Behavior Target
  faster next-step decisions, less overthinking, lower token waste
          │
          ▼
  Agent Harness Workflows
  read files → choose tool → edit code → run tests → inspect errors → iterate → report
          │
          ▼
  Final Objective
  stable long-horizon code execution with practical local latency

This model card intentionally frames thinking-off behavior as a product target. Long thinking can still be useful for difficult reasoning, but the release focuses on whether the model can complete real coding-agent work without paying that cost on every step.


https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF#%E2%9C%85-6-recommended-use-cases–known-limits✅ 6. Recommended Use Cases & Known Limits

✅Good Fits

Codex-style agent workflows, OpenHands/OpenCode coding loops, repository-level debugging, multi-file patch generation, automated test-fix cycles, local tool-calling agents, DevOps scripting, code review assistance, and large-context project navigation.

⚠️Use With Care

As a specialized coder model, it should not be assumed to be optimal for every general-domain task. Tool-call quality depends strongly on prompt format, schema consistency, and the surrounding harness. Long thinking may still help on some high-difficulty reasoning tasks where speed is less important.

Deployment note: For agent use, ensure that tool definitions, system prompts, output parsing, and retry behavior are consistent. Thinking-off models can be fast, but the harness still needs clean schemas, useful error feedback, and strict task boundaries.


https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF#%F0%9F%93%9A-7-resources-acknowledgements–citation📚 7. Resources, Acknowledgements & Citation

📚 Resources & Credits

👉GitHub Repository: Jackrong-llm-finetuning-guide Access the project repository and related fine-tuning guides.

👉Q5_K_M benchmark evaluations SWE-bench submitted-patch run plus behavioral / long-horizon coding evaluation. Benchmarks courtesy of Tom Turney,@no_stp_on_snekon X.

👉Kyle Hessling Interactive Model Deck Visual Hugging Face Space explaining the model story, thinking-off workflow, SWE-bench result, token efficiency, and RTS demo.

👉Kyle Hessling RTS Game Demo Post Reference post for the OpenCode / RTS game-building sample.

👉Unsloth Documentation Training acceleration and memory-efficient fine-tuning resources.

**Acknowledgements:**Special thanks to the Qwen team for the strong Qwen3.6 MoE base model, Unsloth for efficient fine-tuning tooling, Kyle Hessling for hardware collaboration and live agent testing, and open-source contributors building the agentic coding ecosystem.

Citation

@misc{jackrong_qwopus36_35b_a3b_coder,
  title        = {Qwopus-3.6-35B-A3B-Coder},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwopus-3.6-35B-A3B-Coder}}
}

Model tree forJackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUFhttps://huggingface.co/docs/hub/model-cards#specifying-a-base-model

Similar Articles

Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF

Hugging Face Models Trending

A GGUF quantized version of the Qwopus3.6-27B-Coder-MTP model is released on Hugging Face, optimized for local inference and compatible with Transformers, vLLM, SGLang, and Unsloth Studio.

Jackrong/Qwopus3.6-27B-Coder-Compat-MTP-GGUF

Hugging Face Models Trending

Jackrong releases Qwopus3.6-27B-Coder-Compat-MTP-GGUF, a GGUF quantization of the Qwopus3.6-27B-Coder model with an expanded chat template for better interoperability with tool-using runtimes and OpenAI-compatible agent frameworks.

Jackrong/Qwopus3.5-9B-Coder-MTP-GGUF

Hugging Face Models Trending

Jackrong releases Qwopus3.5-9B-Coder-MTP-GGUF, a Qwen-based 9B coding model fine-tuned with Multi-Token Prediction (MTP) architecture, achieving 35.8% throughput improvement and 8.3% accuracy gain over the base model, with perfect scores on coding and math benchmarks.