A user shares a setup with two modded RTX 2080 Ti GPUs (22GB VRAM each) running Qwen 3.6 27B at 38 tokens/s under llama.cpp, with tips on power limiting, tensor split mode, and KV cache settings.
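As a rough illustration of the kind of launch the post describes, here is a minimal sketch using real llama.cpp server flags; the power limit (220 W), model path, split ratios, context size, and KV cache type are illustrative assumptions, not the poster's exact values:

```python
import subprocess

# Illustrative per-GPU power cap in watts (assumption, not the poster's value).
POWER_LIMIT_W = 220

# Cap power draw on both 2080 Tis via nvidia-smi (needs admin privileges).
for gpu_index in (0, 1):
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(POWER_LIMIT_W)],
        check=True,
    )

# Launch llama.cpp's server, splitting layers across the two cards and
# quantizing the K cache to fit more context in 2x22GB. (Quantizing the
# V cache as well would additionally require flash attention.)
subprocess.run(
    [
        "./llama-server",
        "-m", "models/model.gguf",   # placeholder model path
        "-ngl", "99",                # offload all layers to GPU
        "--split-mode", "layer",     # split the model by layers across GPUs
        "--tensor-split", "1,1",     # even split between the two cards
        "--cache-type-k", "q8_0",    # quantized K cache (illustrative choice)
        "-c", "16384",               # context length, illustrative
    ],
    check=True,
)
```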
A researcher seeks faster, lower-variance benchmarks for tuning temperature, top_p, top_k, and min_p on Qwen 3.6 35B A3B, estimating that current approaches would take months of RTX 3090 compute time.
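One standard way to cut benchmark variance, sketched below, is to evaluate every sampler configuration on the same fixed set of prompts and seeds (common random numbers), so configurations are compared on identical inputs rather than against independent noise. The grid values, server URL, and scorer are placeholders; the request fields assume llama.cpp's native /completion API, not necessarily the researcher's setup:

```python
import itertools
import json
from urllib.request import Request, urlopen

# Sampler grid to search; the values are illustrative, not recommended settings.
GRID = {
    "temperature": [0.6, 0.8, 1.0],
    "top_p": [0.9, 0.95],
    "top_k": [20, 40],
    "min_p": [0.0, 0.05],
}

# Fixed prompts and seeds shared by every configuration: comparing all
# configurations on identical (prompt, seed) pairs removes a large source
# of between-run variance.
PROMPTS = ["..."]         # placeholder benchmark prompts
SEEDS = list(range(8))    # a handful of fixed seeds per prompt

URL = "http://localhost:8080/completion"  # llama.cpp server, assumed local


def complete(prompt: str, seed: int, **sampler) -> str:
    """Request one completion from llama.cpp's /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": 256, "seed": seed, **sampler})
    req = Request(URL, data=body.encode(), headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["content"]


def score(text: str) -> float:
    """Stand-in metric; substitute the real benchmark scorer here."""
    return float(len(text))


results = {}
keys = list(GRID)
for values in itertools.product(*GRID.values()):
    cfg = dict(zip(keys, values))
    scores = [score(complete(p, s, **cfg)) for p in PROMPTS for s in SEEDS]
    results[tuple(values)] = sum(scores) / len(scores)
```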