@no_stp_on_snek: btw this was my loop. as you can see i didn't put much thought into it (typos and all), just a side thing to assess the…

X AI KOLs Following Models

Summary

Release of Qwopus3.6-27B-v2-MTP, a fine-tuned multi-token prediction reasoning model based on Qwen3.6-27B, optimized for coding, DevOps, and math tasks with improved generation speed.

btw this was my loop. as you can see i didn't put much thought into it (typos and all), just a side thing to assess the quality while I worked on other stuff. Pretty good the results given how little i did: /loop. you should validate the game works. design me the first level in mario from snes in /tmp. use javascript/typescript. use longctx mcp server to help you solve bugs. you yourself should be able to self play the game before you call it done. make sure you reserach the game mechanics before starting code. Model link: https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF…
Original Article
View Cached Full Text

Cached at: 06/15/26, 03:05 PM

btw this was my loop. as you can see i didn’t put much thought into it (typos and all), just a side thing to assess the quality while I worked on other stuff. Pretty good the results given how little i did:

/loop. you should validate the game works. design me the first level in mario from snes in /tmp. use javascript/typescript. use longctx mcp server to help you solve bugs. you yourself should be able to self play the game before you call it done. make sure you reserach the game mechanics before starting code.

Model link: https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF…


Jackrong/Qwopus3.6-27B-v2-MTP-GGUF · Hugging Face

Source: https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF

🪐 Qwopus3.6-27B-v2-MTP

MTP Release

Multi-Token Prediction reasoning model fine-tuned from Qwen3.6-27B

🧬 Trace Inversion & Negentropy🧠 27B Parameters⚡ Speculative Decoding🛠️ Coding / DevOps / Math

💡What is Qwopus3.6-27B-v2-MTP?

🪐Qwopus3.6-27B-v2-MTPis a speed-oriented reasoning release built on top ofQwen3.6-27B. It keeps the Qwopus line’s focus on reconstructed reasoning traces, coding discipline, DevOps procedures, and mathematical derivations, while addingMulti-Token Predictionfor faster generation. The goal is simple: preserve the depth and structure of a 27B reasoning model while making real interactive use noticeably faster.

⚡ MTP DecodingAuxiliary future-token prediction improves throughput on long reasoning, code, math, and strict-format prompts.

🧩 Structured ReasoningInherits the Qwopus training recipe built around reconstructed step-by-step reasoning trajectories.

🧪 GB10 TestedValidated on a 30-question local benchmark across Logic, Coding, DevOps, Math, and Edge tasks.

🚀 Practical SpeedDesigned for workflows where strong answers matter, but waiting several extra minutes per task does not.

https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF#%F0%9F%92%A1-1-base-model-training-library–cooperation💡 1. Base Model, Training Library & Cooperation

🧠1.1 Base Model Specifications (Qwen3.6-27B)

Qwen3.6-27Bprovides the dense 27B foundation for this release. Qwopus3.6-27B-v2-MTP focuses on preserving the base model’s broad reasoning capability while tuning the output style toward stepwise analysis, tool-aware execution, and practical engineering answers.

AttributeSpecifications & Details🧠 ArchitectureDense Transformer / 27 Billion Parameters🎯 Focus DomainsAgentic Coding, DevOps, structured logic, mathematics, and strict-format output⚡ MTP ObjectiveImprove generation throughput through multi-token speculative prediction while retaining final-answer quality.

🧪1.2 Hardware Cooperation & Joint Collaboration

This project is built in close collaboration with hardware engineerKyle Hessling, whose infrastructure and training support helped make stable 27B-scale experimentation possible.

👉You can follow him for hardware and model training updates on X / Twitter:@KyleHessling1

🦥1.3 Fine-tuning Framework (Unsloth)

The model training workflow is accelerated and memory-optimized withUnsloth. Special thanks to the Unsloth team for making efficient large-model fine-tuning more accessible.

⚙️1.4 Custom MTP Heads Processing & Automation Tooling

This release features a custom splitting and merging methodology designed specifically for Qwen series Multi-Token Prediction (MTP) heads. The automation skill and complete processing pipeline scripts are open-sourced inqwen-mtp-gguf.

🌟If you find this toolkit helpful, please support the project by leaving a star on GitHub!

Community Release Notice: Qwopus3.6-27B-v2-MTP is an experimental community release intended for research, evaluation, and workflow exploration.


https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF#%F0%9F%9A%80-2-mtp-benchmark-qwen36-27b-vs-qwopus36-27b-v2-mtp🚀 2. MTP Benchmark: Qwen3.6-27B vs Qwopus3.6-27B-v2-MTP

Performance Snapshot

Across a 30-question benchmark coveringLogic, Coding, DevOps, Math, and Edge-format tasks, Qwopus3.6-27B-v2-MTP delivers a clear speed advantage over Qwen3.6-27B while producing a more compact overall answer stream. The benchmark is not just a raw throughput test: it includes long coding prompts, operational runbooks, math derivations, and strict constrained-output cases.

Overall Throughput

10.46 T/s

1.66x vs Qwen3.6-27B

Latency Saved

2.34 h

56.5% total time reduction

Token Efficiency

-27.7%

fewer completion tokens overall

Coverage

30 / 30

all benchmark prompts completed

  • Speed: Qwopus3.6-27B-v2-MTP reaches10.46 overall tokens/sec, compared with6.29 tokens/secfor Qwen3.6-27B.
  • Latency: total evaluation time drops from14,901.69sto6,487.81s, saving8,413.88sacross the full run.
  • Output shape: MTP produces67,862 completion tokensversus93,802from Qwen3.6-27B, giving a more compact overall response profile.

https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF#%E2%9A%99%EF%B8%8F-3-test-environment–configuration⚙️ 3. Test Environment & Configuration

  • Compute platform: GB10 dedicated server platform.
  • Evaluation format: same local GGUF server stack for both models.
  • llama-server total context:49152.
  • Temperature / Top-p:1\.0 / 0\.95.
  • Max generated tokens: no explicit cap; generation is bounded by the request budget.
  • Request format:/v1/chat/completionswith user content as text payload.

Benchmark Summary: Qwen3.6-27B vs Qwopus3.6-27B-v2-MTPModelCompletedAvg SpeedOverall T/sCompletion TokensTotal TimeQwen3.6-27B306.326.2993,80214,901.69sQwopus3.6-27B-v2-MTP3010.6610.4667,8626,487.81sDomain-Level PerformanceDomainQuestionsQwen3.6-27B T/sMTP T/sLatency GainQwen3.6-27B TimeMTP TimeToken DeltaLogic56.3310.772.31x38.5 min16.7 min-26.3%Coding76.2610.272.25x1.52 h40.6 min-27.3%DevOps66.2910.392.31x47.4 min20.5 min-28.5%Math86.2911.002.35x1.01 h25.8 min-25.6%Edge46.488.282.27x10.3 min4.5 min-43.6%

https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF#%F0%9F%93%8A-4-full-30-question-comparison📊 4. Full 30-Question Comparison

The table below keeps the benchmark concrete: every row compares the base Qwen3.6-27B run against the Qwopus MTP run on the same prompt. The strongest improvements appear in strict output, probability, DevOps configuration, and medium-length coding tasks, while a few prompts intentionally produce more detailed MTP answers.

30-Question Detailed ComparisonQDomainTaskQwen T/sQwen TimeQwen TokensMTP T/sMTP TimeMTP TokensResult PatternQ1LogicWrong-label coin boxes6.369.4 min3,56911.402.3 min1,5304.16x faster; much more conciseQ2LogicEngineer deployment ordering6.396.1 min2,34910.983.1 min2,0341.98x faster; more conciseQ3LogicSelf-referential truth card6.377.8 min2,99010.834.5 min2,9421.72x faster; similar lengthQ4LogicThree switches and bulbs6.323.6 min1,34210.441.6 min9992.21x faster; more conciseQ5LogicHH vs TH stopping probability6.3011.6 min4,36710.625.2 min3,2662.25x faster; more conciseQ6CodingStreaming top-k frequency6.2813.8 min5,2109.9513.3 min7,9171.04x faster; more expansiveQ7CodingThread-safe TTL cache6.2818.6 min7,00910.645.3 min3,3673.52x faster; much more conciseQ8CodingInterval merge implementation6.2511.2 min4,20310.833.3 min2,1573.36x faster; much more conciseQ9CodingStreaming CSV to JSONL6.2616.5 min6,20010.625.9 min3,7412.81x faster; more conciseQ10CodingC++17 LRU cache6.2713.1 min4,92010.156.0 min3,6442.18x faster; more conciseQ11CodingHighest-paid employee SQL6.296.1 min2,28310.372.4 min1,4752.54x faster; more conciseQ12CodingAtomic Bash backup6.2812.1 min4,54510.334.4 min2,6952.76x faster; much more conciseQ13DevOpsNginx reverse proxy6.2910.4 min3,92410.882.8 min1,8213.70x faster; much more conciseQ14DevOpsLinux service OOM diagnosis6.299.9 min3,7279.964.9 min2,8882.04x faster; more conciseQ15DevOpssystemd worker unit6.298.0 min3,02310.393.3 min2,0372.43x faster; more conciseQ16DevOpsKubernetes rollback runbook6.326.3 min2,38710.362.9 min1,8202.14x faster; more conciseQ17DevOpsDocker CMD vs ENTRYPOINT6.335.4 min2,02810.782.9 min1,8921.82x faster; more conciseQ18DevOpsPrometheus pull monitoring6.327.4 min2,81810.673.7 min2,3422.02x faster; more conciseQ19MathDerivative and critical point6.328.7 min3,27412.063.7 min2,6312.37x faster; more conciseQ20MathLinear system solve6.3210.7 min4,06511.914.2 min2,9762.57x faster; more conciseQ21MathDifferent-color probability6.283.9 min1,47210.1849.6 s4904.74x faster; much more conciseQ22Math2x2 eigen decomposition6.3112.3 min4,66211.284.5 min3,0582.72x faster; more conciseQ23MathInduction proof6.325.8 min2,21111.531.7 min1,1933.34x faster; much more conciseQ24MathBayes disease test6.345.0 min1,87811.383.2 min2,1561.56x faster; more expansiveQ25MathIntegration by parts6.295.5 min2,06411.803.5 min2,4931.55x faster; more expansiveQ26MathCentral Limit Theorem6.248.8 min3,2898.264.1 min2,0462.12x faster; more conciseQ27EdgeStrict JSON output6.323.6 min1,35010.4323.1 s2259.28x faster; much more conciseQ28EdgeExact token pattern6.3752.4 s32812.1529.9 s3451.75x faster; similar lengthQ29EdgeForbidden-word explanation6.715.1 min2,0407.623.5 min1,5731.47x faster; more conciseQ30EdgeIgnore noisy input6.3544.5 s27510.9411.4 s1093.89x faster; much more concise


https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF#%F0%9F%A7%AD-5-domain-reading🧭 5. Domain Reading

Logic

Logic prompts show a strong latency reduction, especially on the box-label puzzle and the HH-vs-TH stopping problem. The MTP model tends to reach the same kind of structured decision path with fewer generated tokens, making it useful when reasoning traces need to stay readable and quick.

Coding

Coding is one of the most practical wins. Thread-safe caching, interval merging, CSV streaming, C++ LRU, SQL, and Bash backup tasks all become substantially faster. Q6 is intentionally more expansive, but the broader coding group remains much faster overall.

DevOps

DevOps prompts benefit from concise operational structure. Nginx, OOM diagnosis, systemd, Kubernetes rollback, Docker command semantics, and Prometheus monitoring all show faster completion while preserving stepwise command-oriented guidance.

Math & Edge Tasks

Math has the highest MTP throughput among the five domains. Edge tasks show the sharpest wall-clock wins, especially strict JSON and noisy-input filtering, where the model can quickly settle into the required output pattern.


https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF#%F0%9F%8E%AF-6-recommended-use-cases🎯 6. Recommended Use Cases

  • Agentic coding and code review assistance.
  • DevOps runbooks, configuration generation, and incident diagnosis.
  • Multi-step math and probability derivations.
  • Structured reasoning with explicit intermediate logic.
  • Fast constrained output generation where latency matters.

Resources, Acknowledgements & Citation

🙏 AcknowledgementsThanks to the Qwen team, Unsloth, open-source contributors, andKyle Hesslingfor close collaboration on hardware and training infrastructure.

📖 Citation

@misc{qwopus36_27b_v2_mtp_2026,
  title        = {Qwopus3.6-27B-v2-MTP},
  author       = {Jack Rong},
  year         = {2026},
  note         = {Qwen3.6-27B based Multi-Token Prediction reasoning model},
  howpublished = {Hugging Face model card}
}

Downloads last month184,446

Model tree forJackrong/Qwopus3.6-27B-v2-MTP-GGUFhttps://huggingface.co/docs/hub/model-cards#specifying-a-base-model

Datasets used to trainJackrong/Qwopus3.6-27B-v2-MTP-GGUF

#### Jackrong/Claude-opus-4.6-TraceInversion-9000x Viewer• Updated27 days ago • 8.67k • 1.9k • 69 #### Jackrong/Claude-opus-4.7-TraceInversion-5000x Viewer• Updated27 days ago • 4.76k • 1.85k • 60

Collection includingJackrong/Qwopus3.6-27B-v2-MTP-GGUF

Tom Turney (@no_stp_on_snek): 1 Loop prompt using claudecode as a harness. qwopus36-27b-coder-mtp-q5_k_m. 5090.

never knew “mario” could kick-roll and teleport!

Similar Articles

Jackrong/Qwopus3.5-9B-Coder-MTP-GGUF

Hugging Face Models Trending

Jackrong releases Qwopus3.5-9B-Coder-MTP-GGUF, a Qwen-based 9B coding model fine-tuned with Multi-Token Prediction (MTP) architecture, achieving 35.8% throughput improvement and 8.3% accuracy gain over the base model, with perfect scores on coding and math benchmarks.

@no_stp_on_snek: https://x.com/no_stp_on_snek/status/2052833502475833384

X AI KOLs Following

An open-source stack using Qwen2.5-32B-Instruct with longctx and vllm-turboquant on a single AMD MI300X achieves competitive results (0.601-0.688) versus SubQ's closed model (0.659) on the MRCR v2 1M-context benchmark, demonstrating open-weights approaches are within striking distance.

bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF

Hugging Face Models Trending

bytkim releases a 4-bit QLoRA SFT Multi-Token Prediction fine-tune of Qwen3.6-27B, packaged as GGUF for local agentic coding. The no-thinking tune is designed for low-latency direct output in agent loops.