Tag
Rio 3.5 Open 397B is an open-source, frontier-class AI model post-trained from Qwen 3.5 397B. It features SwiReasoning, which switches dynamically between explicit and latent reasoning, and reports state-of-the-art performance across agentic coding, reasoning, and multilingual benchmarks.
User shares an optimized recipe for running Qwen 3.5 122B Int4 on a single DGX Spark with vLLM, achieving over 40 tokens per second, and invites others to try it and optimize it further.
Personal benchmark shows Qwen3.5-27B Dense and Gemma4-31B Dense fix 100% of 37 test failures, outperforming Gemma4-26B MoE even at 8-bit quantization, while using fewer tokens and less wall-clock time.
OpenInfer demonstrates "vertical disaggregation" that boosts Qwen 3.5 27B throughput by ~50% by co-executing quantized layers across a single node’s AMD EPYC CPU and Nvidia L40S GPU with a custom SLA-aware scheduler.