Qwen3.6 35B A3B Uncensored Heretic Released with Native MTP Fully Preserved: KLD 0.0015, 10/100 Refusal Rate, All 19 MTP Tensors Kept, Available in Safetensors, GGUF, NVFP4, NVFP4 GGUF, and GPTQ-Int4 Formats

Reddit r/LocalLLaMA · Models

Summary

A community-released uncensored variant of Qwen3.6 35B A3B that keeps all 19 MTP tensors intact and ships in several formats, including Safetensors, GGUF, NVFP4, and GPTQ-Int4.

- llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved: [https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved](https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved)
- llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GGUF: [https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GGUF](https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GGUF)
- llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-NVFP4-Experts-Only-GGUF: [https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-NVFP4-Experts-Only-GGUF](https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-NVFP4-Experts-Only-GGUF)
- llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-NVFP4-Experts-Only: [https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-NVFP4-Experts-Only](https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-NVFP4-Experts-Only)
- llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GPTQ-Int4: [https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GPTQ-Int4](https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved-GPTQ-Int4)

Released by popular request. Every version has been checked to confirm that the full set of MTP tensors is present, and benchmarks are included. All models are listed here: [HuggingFace-LLMFan46](https://huggingface.co/llmfan46/models)

Note: all versions have been verified to preserve the MTP tensors in full. In Safetensors format, the MTP tensors of Qwen3.6-35B-A3B show up as 19 entries because `gate_up_proj` is stored as a single fused tensor. In GGUF format, that fused tensor is split into separate gate and up expert tensors, so the same MTP components show up as 20 entries. The count differs by format, but the MTP tensors are fully preserved in every release.
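The 19-versus-20 difference can be checked by counting tensor names. The sketch below is a minimal, hedged example that reads only the JSON header of a safetensors file (an 8-byte little-endian length followed by UTF-8 JSON), counts the MTP entries, and then recounts them the way a GGUF conversion would if each fused `gate_up_proj` tensor became separate gate and up tensors. It assumes that MTP tensor names start with `mtp.` and that fused expert tensors contain `gate_up_proj` in their names; both conventions are assumptions not confirmed by the release. The helper functions and toy tensor names are hypothetical.

```python
import io
import json
import struct
from typing import BinaryIO, Iterable

# Hypothetical naming conventions: not confirmed for this release.
MTP_PREFIX = "mtp."
FUSED_KEY = "gate_up_proj"


def read_safetensors_names(f: BinaryIO) -> list[str]:
    """Return the tensor names stored in a safetensors header.

    Only the header is read: an 8-byte little-endian u64 giving the header
    length, followed by that many bytes of UTF-8 JSON.
    """
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return sorted(header)


def count_mtp(names: Iterable[str]) -> int:
    """Count MTP entries as they appear in safetensors (a fused tensor counts once)."""
    return sum(1 for n in names if n.startswith(MTP_PREFIX))


def count_mtp_after_split(names: Iterable[str]) -> int:
    """Count MTP entries assuming each fused gate_up_proj becomes two tensors."""
    return sum(
        (2 if FUSED_KEY in n else 1) for n in names if n.startswith(MTP_PREFIX)
    )


def build_toy_safetensors(names: list[str]) -> bytes:
    """Build a tiny, valid safetensors blob with one float32 scalar per name."""
    header = {
        n: {"dtype": "F32", "shape": [1], "data_offsets": [4 * i, 4 * i + 4]}
        for i, n in enumerate(names)
    }
    raw_header = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(raw_header)) + raw_header + bytes(4 * len(names))


if __name__ == "__main__":
    # Hypothetical tensor names: one non-MTP tensor and three MTP tensors,
    # one of which is a fused gate_up_proj.
    toy = [
        "model.norm.weight",
        "mtp.fc.weight",
        "mtp.layers.0.mlp.experts.gate_up_proj",
        "mtp.layers.0.mlp.experts.down_proj",
    ]
    names = read_safetensors_names(io.BytesIO(build_toy_safetensors(toy)))
    print(count_mtp(names), count_mtp_after_split(names))  # 3 4
```

To use it on a real shard, open the file in binary mode and pass the file object to `read_safetensors_names`. Only the header is read, not the multi-gigabyte tensor data. For a sharded checkpoint, the keys of `weight_map` in `model.safetensors.index.json` already list every tensor name and can be passed to the counting helpers directly. Checking the GGUF side would require a GGUF reader, such as the `gguf` Python package that ships with llama.cpp, and is not shown here.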

Similar Articles

Qwen 3.6 35B GGUF: NTP vs MTP Quantization Results Across GPUs and CPUs

Reddit r/LocalLLaMA

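ByteShape released NTP and MTP variant quantizations of Qwen 3.6 35B GGUF and benchmarked them in detail on multiple GPUs and CPUs. The tests found that larger quantized models generally outperform smaller ones, and that MTP delivers speedups on GPUs at the cost of extra memory.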