Jackrong/Qwopus3.6-35B-A3B-v1-GGUF


Summary

Jackrong releases Qwopus3.6-35B-A3B-v1, a reasoning-enhanced fine-tune of Alibaba's Qwen3.6 MoE model, optimized for logic and agentic coding with 35B total parameters and 3B active parameters.

Task: image-text-to-text Tags: transformers, gguf, text-generation-inference, unsloth, qwen3_6, moe, reasoning, chain-of-thought, lora, sft, multimodal, vision, tool-use, function-calling, long-context, image-text-to-text, en, zh, es, ru, ja, base_model:unsloth/Qwen3.6-35B-A3B, base_model:adapter:unsloth/Qwen3.6-35B-A3B, license:apache-2.0, endpoints_compatible, region:us, conversational

Source: https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-v1-GGUF

🌟 Qwopus3.6-35B-A3B-v1

💡 Base Model Overview

Qwen3.6-35B-A3B is an advanced hybrid sparse MoE (Mixture-of-Experts) model developed by Alibaba Cloud. It features 35B total parameters with only 3B active parameters per token, ensuring high inference efficiency. Architecturally, it combines Gated DeltaNet linear attention with standard gated attention layers, routing tokens across 256 experts. It natively supports a massive 262K context window and is specifically designed for high-performance agentic coding, deep reasoning, and multimodal tasks.
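To make the sparse-activation arithmetic concrete, here is a minimal sketch of how the active-parameter fraction falls out of MoE routing. The per-expert and shared sizes and the experts-per-token count are illustrative assumptions chosen to land near the stated 35B total / 3B active; they are not read from this repository's actual config.

```python
# Minimal sketch of sparse-MoE activation arithmetic. The per-expert and
# shared parameter sizes and the experts-per-token count are illustrative
# assumptions, NOT values read from this repository's config.
def moe_active_fraction(num_experts: int, experts_per_token: int,
                        expert_params: int, shared_params: int) -> float:
    """Fraction of weights that fire for a single token."""
    total = shared_params + num_experts * expert_params
    active = shared_params + experts_per_token * expert_params
    return active / total

# 256 routed experts as stated above; the other numbers are placeholders.
frac = moe_active_fraction(num_experts=256, experts_per_token=8,
                           expert_params=130_000_000,
                           shared_params=1_700_000_000)
print(f"~{frac:.1%} of weights active per token")  # ≈ 8%, i.e. ~3B of ~35B
```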

Base Model Benchmark Placeholder


🚀 Model Refinement & Logic Tuning (Qwopus3.6-35B-A3B-v1)

🪐 Qwopus3.6-35B-A3B-v1 is a reasoning-enhanced MoE (Mixture-of-Experts) model fine-tuned on top of Qwen3.6-35B-A3B.

🛠 Training Strategy

The fine-tuning process for this model is structured into three distinct stages of distributed SFT (Supervised Fine-Tuning), progressively scaling reasoning complexity and data diversity. This systematic approach ensures the model inherits the base MoE capabilities while sharpening its logic-handling depth.

Looking ahead, **Reinforcement Learning (RL)** training will be introduced in subsequent versions to further optimize the reasoning paths and alignment performance.

This version uses LoRA fine-tuning, but uniquely scales up the trainable parameters, with approximately 9% of the model parameters participating in the update. This allows for a deeper adaptation of reasoning capabilities while maintaining the efficiency of parameter-efficient fine-tuning. However, setting trainable parameters to 9% is a risky configuration for this MoE architecture, as it significantly increases the potential for training instability and weight-merging conflicts.
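As a rough illustration of what a ~9%-trainable LoRA setup can look like, here is a hedged PEFT sketch. The rank, alpha, and target modules below are assumptions for illustration, not the card's published training configuration.

```python
# Hypothetical PEFT LoRA setup showing how a ~9% trainable share can arise.
# Rank, alpha, and target modules are illustrative assumptions, not the
# card's published training configuration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.6-35B-A3B")

lora = LoraConfig(
    r=256,                 # an unusually high rank drives the trainable share up
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # goal: roughly 9% trainable, per the card
```

Pushing the rank this high trades the usual tiny-adapter footprint for deeper adaptation, which is precisely the instability and merge-conflict trade-off flagged above.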

Vision & Tool Calling Support: This model supports visual capabilities and tool calling. To enable vision, please place the mmproj .gguf file from the GGUF repository into the same directory as the main .gguf file.
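For a concrete picture of how the mmproj projector pairs with the main GGUF weights, here is a sketch using llama-cpp-python. Llava15ChatHandler is a stand-in (the correct handler class for this model family may differ), and both filenames are hypothetical placeholders.

```python
# Sketch of pairing an mmproj projector with the main GGUF weights in
# llama-cpp-python. Llava15ChatHandler is a stand-in (the right handler
# class for this model family may differ); filenames are placeholders.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

handler = Llava15ChatHandler(clip_model_path="./mmproj-model.gguf")
llm = Llama(
    model_path="./Qwopus3.6-35B-A3B-v1.gguf",  # main weights, same directory
    chat_handler=handler,
    n_ctx=8192,
)
out = llm.create_chat_completion(messages=[
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "file://./page.png"}},
        {"type": "text", "text": "Describe this screenshot."},
    ]},
])
print(out["choices"][0]["message"]["content"])
```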

It is designed for:

  • 🧩 More structured reasoning
  • 🪶 More consistent answer style
  • 🔁 Better cross-source distillation alignment
  • ⚡ A stronger foundation for later larger-scale versions

Community Release Notice: Qwopus3.6-35B-A3B-v1 has not undergone complete performance evaluation or safety testing. It is released purely as an experimental community version for research and exploration.


🧪 Data Composition & Context Length Mix

The model was trained on a carefully curated dataset encompassing a wide range of domains, including mathematics, code, science, multilingual chat, and instruction following.

To balance different capabilities, the training data is divided into four main context-length buckets, incorporating a mix of:

  • Short-format, stable samples
  • Medium-complexity reasoning samples
  • Long-context, high-quality samples
  • A small amount of replay samples

Context Length Distribution (a minimal bucketing sketch follows this list):

  • < 4096 tokens: Short-context data focused on establishing stable formatting and basic reasoning.
  • 4096 - 8192 tokens: Medium-context data introducing higher reasoning complexity.
  • 8192 - 16384 tokens: Long-context reasoning data, which also includes 10% short-sample replay to prevent catastrophic forgetting of basic instruction-following.
  • 16384 - 32768 tokens: A small amount of multi-turn conversations to maintain extended interaction capabilities.
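Here is a minimal sketch of the four-bucket split described above, assuming a stand-in tokenizer; the bucket edges come from the list, while the actual per-bucket sampling ratios are not published on the card.

```python
# Minimal sketch of the four-bucket context-length split described above.
# The tokenizer is a stand-in; per-bucket sampling ratios are unpublished.
from transformers import AutoTokenizer

BUCKETS = [(0, 4096), (4096, 8192), (8192, 16384), (16384, 32768)]

def bucket_of(sample_text: str, tokenizer) -> int:
    """Return the index of the context-length bucket a sample falls into."""
    n = len(tokenizer(sample_text)["input_ids"])
    for i, (lo, hi) in enumerate(BUCKETS):
        if lo <= n < hi:
            return i
    return len(BUCKETS) - 1  # clamp over-long samples into the last bucket

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")  # stand-in tokenizer
print(bucket_of("Example training sample ...", tok))    # -> 0 (short bucket)
```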

🎯 Three-Stage Curriculum Learning

Qwopus3.6-35B-A3B-v1 employs a curriculum-learning-style phased reasoning data mix, progressively increasing the difficulty and complexity of the training signals (a toy encoding of this schedule follows the stage list):

  1. **Early Stage (Format Establishment):** Focuses on short-to-medium-length, format-stable reasoning samples. The primary goal is to establish a reliable, structured new reasoning format without overwhelming the model with extreme complexity.
  2. **Middle Stage (Complexity Scaling & Multi-Teacher Distillation):** Gradually increases the proportion of complex reasoning samples from multiple teacher models. Distillation data is sourced from a 27B model that closely matches the base model's stylistic distribution, ensuring the capability gap isn't too drastic to learn from efficiently.
  3. **Final Stage (Long-Context Reinforcement & Anti-Drift):** Strengthens long-context reasoning capabilities. Crucially, this stage retains short-sample replay to ensure the model maintains its short-context instruction-following ability and minimizes capability drift.
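As a toy encoding of the schedule just described, the stages can be expressed as data-mix configs. The ratio numbers below are illustrative placeholders; the card only states the qualitative progression.

```python
# Toy encoding of the three-stage curriculum as data-mix configs. Ratio
# numbers are illustrative placeholders, not published training values.
STAGES = [
    {"name": "1_format_establishment",
     "max_len": 8192,  "complex_ratio": 0.2, "replay_ratio": 0.00},
    {"name": "2_complexity_scaling",
     "max_len": 16384, "complex_ratio": 0.6, "replay_ratio": 0.05},
    {"name": "3_long_context_anti_drift",
     "max_len": 32768, "complex_ratio": 0.7, "replay_ratio": 0.10},
]

for stage in STAGES:
    print(f"{stage['name']}: up to {stage['max_len']} tokens, "
          f"{stage['replay_ratio']:.0%} short-sample replay")
```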

🚀 Quick Evaluation Summary: Qwopus3.6-35B-A3B-v1

This model represents a significant leap in inference efficiency and one-shot generation quality compared to previous dense architectures. By leveraging a hybrid MoE structure (35B total / 3B active parameters) and Gated DeltaNet linear attention, it balances high throughput with deep reasoning capabilities.

  • Unmatched Speed: Achieves an average of 161.9 tok/s on an RTX 5090, a 2.6× speedup over the 27B dense predecessor, making it one of the fastest high-parameter models available for single-GPU consumer hardware (a local throughput-measurement sketch follows this list).
  • Production-Grade Frontend Design: Evaluated as one of the strongest open models for one-shot HTML/CSS generation. Unlike models that provide surface-level scaffolding, this model delivers complete, functional pages with complex micro-interactions, animated components, and production-ready logic.
  • Starvation-Free Reasoning: Successfully resolves the "thinking starvation" issues seen in earlier versions. It maintains robust performance in long-context JSON extraction and multi-step agentic planning, outputting valid structured data even after extensive internal reasoning traces.
  • Architectural Efficiency: The integration of Gated DeltaNet allows for a massive 262K native context window with optimized VRAM usage, keeping memory requirements nearly flat even as sequence lengths increase.
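For readers who want to sanity-check the throughput claim on their own hardware, here is a minimal measurement sketch with llama-cpp-python; the filename and settings are placeholders, and results will vary with GPU, quantization, and context length.

```python
# Minimal sketch for measuring decode throughput locally with
# llama-cpp-python. Filename and settings are placeholders; figures like
# the card's 161.9 tok/s depend entirely on hardware and quantization.
import time
from llama_cpp import Llama

llm = Llama(model_path="./Qwopus3.6-35B-A3B-v1.gguf",  # hypothetical filename
            n_gpu_layers=-1, n_ctx=4096, verbose=False)

t0 = time.perf_counter()
out = llm("Write a short sorting function in Python.", max_tokens=256)
dt = time.perf_counter() - t0
n = out["usage"]["completion_tokens"]
print(f"{n} tokens in {dt:.1f}s -> {n / dt:.1f} tok/s")
```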

**Verdict:** A premier choice for developers requiring a high-throughput, agentic model that excels at UI/UX generation and complex logical deduction on a single-GPU setup. This summary is based on the 🔗 Qwopus3.6-35B-A3B-v1 comprehensive evaluation report by Kyle Hessling.



⚠️ Known Training & Deployment Issues (IMPORTANT)

Due to the architectural complexities of the Qwen3.6 MoE models, several technical challenges were encountered during training and weight merging. Users should be aware of the following potential instabilities:

MoE Architecture Compatibility Issues

  • The weight structure of MoE expert layers differs significantly from standard dense models.
  • There are known, easily triggered incompatibilities between PEFT/LoRA, Transformers 5.x's fused expert pattern, and Unsloth patches.
  • Even when using the absolute latest environment and dependencies, merging the LoRA weights into the base model after training may fail or encounter severe compatibility bugs.
  • **Common Error:** You may encounter ModuleNotFoundError: Could not import module 'Qwen3_5MoeForConditionalGeneration' or similar structural-mismatch errors during the weight-merging phase.

If you are attempting to fine-tune or merge weights for this MoE architecture locally, proceed with caution and be prepared to manually patch model definition files or downgrade specific library versions.
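A guarded merge along those lines might look like the following PEFT sketch; merge_and_unload() is the standard PEFT call, and the except branch mirrors the kind of structural mismatch described above rather than a guaranteed workaround.

```python
# Guarded LoRA merge sketch with PEFT. merge_and_unload() is the standard
# PEFT call; the except branch mirrors the structural mismatch the card
# warns about. The adapter path is a hypothetical placeholder.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.6-35B-A3B")
model = PeftModel.from_pretrained(base, "./lora-adapter")

try:
    merged = model.merge_and_unload()
    merged.save_pretrained("./merged-model")
except (ImportError, RuntimeError, KeyError) as err:
    # e.g. the ModuleNotFoundError on the fused-expert class noted above;
    # may require pinning library versions or patching model definitions.
    print(f"Merge failed: {err!r}")
```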


📚 Resources & Guides

👉 **GitHub Repository: Jackrong-llm-finetuning-guide**. Visit the repo to dive into the codebase and reproduce the results locally or on Colab.


🙏 Acknowledgements

Special thanks to:

  • The Qwen team for the strong Qwen3.6 MoE base model.
  • Unsloth for efficient fine-tuning frameworks.
  • Open-source datasets and community contributors.
  • Kyle Hessling for his generous hardware and equipment support. You can follow him for more updates on X / Twitter: @KyleHessling1.

📖 Citation

@misc{jackrong_qwopus36_35b_a3b_v1,
  title        = {Qwopus3.6-35B-A3B-v1},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face}
}
