@populartourist: Having worked consistently with Qwen3.6 27B NVFP4 on repos - it's clear that this quant is not reliable, at least for c…


Summary

The user reports that the Qwen3.6 27B NVFP4 quantization is unreliable for coding, with inconsistent quality despite high throughput, and suggests that Q4_K_M may be more consistent.


Cached at: 06/15/26, 09:09 PM

Having worked consistently with Qwen3.6 27B NVFP4 on repos - it’s clear that this quant is not reliable, at least for coding.

Quality is all over the place - results either really good or poor.

Throughput is amazing, but the gains are lost to debugging churn.

Ornestein Q6_K has been the dark horse so far.

I’m confident Q4_K_M can be more consistent than this NVFP4 variant.

wd 🔺 (@populartourist): Good quality Qwen3.6 27B NVFP4 grafted MTP, with image.

Fits on an RTX 5090 with up to a 180-190k context window using FP8 KV cache at max_num_seqs 1 - can fit more if you don't exhaust context in parallel.
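For readers who want to try a similar setup, here is a minimal sketch of how those settings might map onto a launch command. It assumes the vLLM server CLI, which the post does not name (only `max_num_seqs` and FP8 KV are mentioned). The model ID is a hypothetical placeholder, and the MTP/speculative-decoding settings are left out because the post gives no details about them.

```python
# Sketch only: assumes the vLLM server CLI. The source mentions max_num_seqs
# and FP8 KV but does not say which engine or checkpoint was used.
MODEL_ID = "your-org/qwen3.6-27b-nvfp4"  # hypothetical placeholder, not a real repo


def build_serve_args(model_id: str, context_len: int = 180_000) -> list[str]:
    """Return an argv list for `vllm serve` mirroring the described setup."""
    return [
        "vllm", "serve", model_id,
        "--kv-cache-dtype", "fp8",            # FP8 KV cache
        "--max-num-seqs", "1",                # one sequence at a time
        "--max-model-len", str(context_len),  # lower end of the 180/190k range
    ]


if __name__ == "__main__":
    # Print the command instead of launching it, so it can be inspected first.
    print(" ".join(build_serve_args(MODEL_ID)))
```

Raising `--max-num-seqs` splits the same KV cache budget across concurrent requests, which is presumably what the post means by fitting more when context is not exhausted in parallel.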

Code work sees 2-8k tok/s throughput, with peaks above 11k with MTP.

Initial quality looks
