qwen-3.5

#qwen-3.5

prefeitura-rio/Rio-3.5-Open-397B

Hugging Face Models Trending ↗ · 2026-06-11

Rio 3.5 Open 397B is an open-source, frontier-class AI model post-trained from Qwen 3.5 397B. It features SwiReasoning, which switches dynamically between explicit and latent reasoning, and achieves state-of-the-art performance across agentic coding, reasoning, and multilingual benchmarks.

40+tok/s - optimized recipe for Qwen 3.5 122B Int4 on a single DGX Spark with vLLM

Reddit r/LocalLLaMA ↗ · 2026-05-20

A user shares an optimized recipe for running Qwen 3.5 122B Int4 on a single DGX Spark with vLLM, achieving over 40 tokens per second, and invites others to try it and optimize it further.

Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared

Reddit r/LocalLLaMA ↗ · 2026-04-22

A personal benchmark shows Qwen3.5-27B Dense and Gemma4-31B Dense each fixing all 37 test failures, outperforming Gemma4-26B MoE even when the MoE runs at 8-bit (Q8) quantization, while using fewer tokens and less wall-clock time.

@bastani_behnam: We just published how we unlocked +50% inference capacity on a 27B model — no new GPUs, no new nodes, at a fraction of …

X AI KOLs Following ↗ · 2026-04-21

OpenInfer demonstrates "vertical disaggregation", which boosts Qwen 3.5 27B throughput by ~50% by co-executing quantized layers across a single node’s AMD EPYC CPU and NVIDIA L40S GPU with a custom SLA-aware scheduler.
