Community testers evaluate quantized versions of Qwen3.6, ZAYA1, and other models for SVG chessboard generation accuracy using local inference frameworks like MLX.
Following up on this earlier post, I ran several more tests to cover more models and quants: [https://www.reddit.com/r/LocalLLaMA/comments/1t53dhp/quality\_comparison\_between\_qwen\_36\_27b/](https://www.reddit.com/r/LocalLLaMA/comments/1t53dhp/quality_comparison_between_qwen_36_27b/)

[Qwen3.6 35B-A3B MLX oQ4. Very very good. \(oMLX - local\)](https://preview.redd.it/zs7hp4o01o0h1.png?width=841&format=png&auto=webp&s=e6d2ae4ce91317fe5ccd8af27bf39352ae6e34a0)

Qwen 3.6 35B-A3B MLX oQ4's output is almost perfect. It has a title, a last-move label, and row and column marks. But the two cursors (red triangles), one showing the starting point and the other the end point, are a bit confusing at first glance.

[ZAYA1 8B - Perfect but without a-h, 1-8 row\/column mark \(Zaya Cloud\)](https://preview.redd.it/zhwqj6nq1o0h1.png?width=397&format=png&auto=webp&s=b4c9840593e3fa63dcce1b3272d0352dc8df515d)

ZAYA1 8B is open weight. I tried running it in MLX-LM with [this PR](https://github.com/ml-explore/mlx-lm/pull/1261), but no luck. The 8-bit model kept reasoning in a loop without producing any SVG. I don't think the local inference engine is ready yet, since the model needs the RSA technique to perform. So I posted the result from Zaya Cloud's playground, assuming it serves the FP16 version. If a local inference engine can somehow produce the same answer, we will have a VERY promising model to run on our tiny computers. Running the 8-bit quant on my computer took less than 12GB of memory for the whole process.

[Qwen3.6 27B MLX oQ6. Very good \(oMLX - local\) no row\/no column marks](https://preview.redd.it/cy0vwne53o0h1.png?width=2003&format=png&auto=webp&s=a449e7f9116212eccc86a324ecdbb737b8cc8559)

The MLX oQ 6-bit quant of the 27B delivered a good and correct answer, but I had no luck pushing it down to 3.5 bits.

[Qwen3.6 27B MLX oQ3.5e, Not so good. \(oMLX - local\)](https://preview.redd.it/ezy47exe1o0h1.png?width=479&format=png&auto=webp&s=a2428638e9649bed9dedc1b859ba5d5d8329825c)

[HY3 Preview 295B A21B - Perfect but no line, no row and no column marks. \(Open Router\)](https://preview.redd.it/i426jorx1o0h1.png?width=479&format=png&auto=webp&s=35af296ca4d96f89c3348427a8e21444597a5f7b)

HY3's 295B is not gonna run on my machine, so the result is from the cloud.

Now we're entering weird territory: the thousands of derivatives floating around on Hugging Face. I'll be using ones from Jackrong, OrionLLM and DavidAU, since all of them published some kind of benchmarks and promise good results.

[GRM 2.6 Plus Q4K\_M - an OrionLLM derivative of Qwen3.6 27B - a correct one and looks really good.](https://preview.redd.it/hbwshurr3o0h1.png?width=1871&format=png&auto=webp&s=2cb97fa0691362f9c08699b95259bd572d86dcf3)

[GRM 2.6 Plus Q3K\_M - an OrionLLM derivative of Qwen3.6 27B - 3 bits was not gonna cut it.](https://preview.redd.it/i5rjfxxn9o0h1.png?width=1638&format=png&auto=webp&s=237a1cd281f90793a849441708091ab37103f5c2)

[qwen3.6-27b-neo-code-di-imatrix-max@iq4\_nl - This 4-bit quant is good.](https://preview.redd.it/oxcwkerg8o0h1.png?width=1864&format=png&auto=webp&s=b29268bd21a52587622c91b42699e3000fc6f5b6)

[qwen3.6-27b-neo-code-di-imatrix-max@q5k\_s - However, its 5-bit counterpart was totally wrong.](https://preview.redd.it/983uadteeo0h1.png?width=1878&format=png&auto=webp&s=8848adc70ebb7900d1ab685fdd808046a427a213)

So a higher-bit quant does not always perform better than a lower-bit one.

[Qwopus 35B-A3B-v1 Jackrong's Q4K\_S - the board is wrong and the words "game ended" came out of nowhere.](https://preview.redd.it/w5vyru6j5o0h1.png?width=1840&format=png&auto=webp&s=fcf7c46f0d54b4057f841cba14a327f8f0fb2c6b)

[GRM 2.6 Opus 3-bit Q3K\_M, correct but the visual was degraded. The smallest 27B quant that somehow works.](https://preview.redd.it/4p9wljvn6o0h1.png?width=1107&format=png&auto=webp&s=80e764861a6c0d5af6425fcff36ae50b8050b7b9)
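
Several of the results above differ only in whether the board carries a-h and 1-8 marks. The boards in this post were judged by eye from the rendered images, so the following is not how these tests were scored. It is only a minimal Python sketch of how that one criterion could be checked automatically, assuming the model writes each coordinate as a single-character SVG `<text>` (or `<tspan>`) element. The function name `coordinate_labels` is made up for illustration.

```python
import xml.etree.ElementTree as ET

FILES = set("abcdefgh")   # column marks
RANKS = set("12345678")   # row marks


def coordinate_labels(svg_text: str) -> dict:
    """Report which a-h and 1-8 marks are missing from an SVG string.

    Assumption: each mark is a one-character <text> or <tspan> element.
    Marks drawn as paths or packed into one long string are not detected.
    """
    root = ET.fromstring(svg_text)
    found = set()
    for elem in root.iter():
        # Drop the XML namespace: "{http://www.w3.org/2000/svg}text" -> "text"
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag in ("text", "tspan"):
            label = "".join(elem.itertext()).strip().lower()
            if len(label) == 1:
                found.add(label)
    return {
        "files_missing": sorted(FILES - found),
        "ranks_missing": sorted(RANKS - found),
    }


if __name__ == "__main__":
    # Toy board that has column marks but no row marks.
    sample = (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        + "".join(f"<text>{c}</text>" for c in "abcdefgh")
        + "</svg>"
    )
    print(coordinate_labels(sample))
```

A check like this only covers the labels. Whether the pieces are on the right squares still needs a look at the rendered board.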
A user shares benchmark results comparing the accuracy of various quantized Gemma and Qwen models on arithmetic, presidential DOB, and attention tests, highlighting trade-offs between model size and quantization level.
A hobbyist compares a heavily quantized GLM 5.2 (Q1_S) against a high-quant Qwen 27B (Q8) on a code generation task, finding that the lower-quant larger model significantly outperforms the higher-quant smaller model in quality and completeness.
A user reports that the QAT quantized variant of Gemma4 26B A4B performs worse on a chessboard SVG test compared to the non-QAT version, with unstable piece drawing despite using suggested settings.
A developer reports achieving high accuracy with fine-tuned Qwen 3.5 4B and 8B models using Unsloth, suggesting a shift towards specialized Expert Language Models (ELMs) for niche tasks.
The author shares a quantization recipe for Qwen3.6 27B that makes the model use significantly fewer thinking tokens while still producing correct answers, leading to faster inference on math benchmarks.