Tag
A user shares an optimized recipe for running Qwen 3.5 122B Int4 with vLLM on a single DGX Spark, reaching over 40 tokens per second, and invites others to try it and optimize it further.
A Reddit post compares quantized Qwen3.6-27B variants (INT4, NVFP4, BF16-INT4), showing the trade-offs between memory footprint and accuracy for different use cases.
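The memory side of these trade-offs comes down to simple arithmetic: weight storage is roughly parameter count × bits per weight ÷ 8. The sketch below is a rough estimate only. It assumes the nominal parameter counts implied by the model names (27B, 122B) and nominal bit widths. It ignores quantization scales and zero-points, which add some overhead in INT4 and NVFP4 formats, as well as KV cache, activations, and runtime overhead. Under those assumptions, a 4-bit 122B model needs about 61 GB for weights, which helps explain why it can fit in a DGX Spark's 128 GB of unified memory.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough weight storage in GB (1e9 bytes).

    Assumption: nominal bits per weight only; ignores quantization
    scales/zero-points, KV cache, activations, and runtime overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical comparison using nominal parameter counts from the model names.
    examples = [
        ("Qwen3.6-27B, 16-bit (BF16)", 27e9, 16),
        ("Qwen3.6-27B, 4-bit", 27e9, 4),
        ("Qwen 3.5 122B, 4-bit (Int4)", 122e9, 4),
    ]
    for name, params, bits in examples:
        print(f"{name}: ~{weight_footprint_gb(params, bits):.1f} GB of weights")
```

Real checkpoints will be somewhat larger than these figures, because the per-group scales and any layers kept at higher precision add to the total.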