A user shares a setup with two modded RTX 2080 Ti GPUs (22GB VRAM each) running Qwen 3.6 27B at 38 tokens/s under llama.cpp, with tips on power limiting, tensor split mode, and KV cache settings.
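As a rough illustration of the kind of launch the post describes, here is a minimal sketch using real llama.cpp server flags; the power limit (220 W), model path, split ratios, context size, and KV cache type are illustrative assumptions, not the poster's exact values:

```python
import subprocess

# Illustrative per-GPU power cap in watts (assumption, not the poster's value).
POWER_LIMIT_W = 220

# Cap power draw on both 2080 Tis via nvidia-smi (needs admin privileges).
for gpu_index in (0, 1):
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(POWER_LIMIT_W)],
        check=True,
    )

# Launch llama.cpp's server, splitting layers across the two cards and
# quantizing the K cache to fit more context in 2x22GB. (Quantizing the
# V cache as well would additionally require flash attention.)
subprocess.run(
    [
        "./llama-server",
        "-m", "models/model.gguf",   # placeholder model path
        "-ngl", "99",                # offload all layers to GPU
        "--split-mode", "layer",     # split the model by layers across GPUs
        "--tensor-split", "1,1",     # even split between the two cards
        "--cache-type-k", "q8_0",    # quantized K cache (illustrative choice)
        "-c", "16384",               # context length, illustrative
    ],
    check=True,
)
```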
A researcher seeks faster, lower-variance benchmarks for tuning temperature, top_p, top_k, and min_p on Qwen 3.6 35B A3B, estimating that current approaches would take months of RTX 3090 compute time.
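One standard way to cut benchmark variance, sketched below, is to evaluate every sampler configuration on the same fixed set of prompts and seeds (common random numbers), so configurations are compared on identical inputs rather than against independent noise. The grid values, server URL, and scorer are placeholders; the request fields assume llama.cpp's native /completion API, not necessarily the researcher's setup:

```python
import itertools
import json
from urllib.request import Request, urlopen

# Sampler grid to search; the values are illustrative, not recommended settings.
GRID = {
    "temperature": [0.6, 0.8, 1.0],
    "top_p": [0.9, 0.95],
    "top_k": [20, 40],
    "min_p": [0.0, 0.05],
}

# Fixed prompts and seeds shared by every configuration: comparing all
# configurations on identical (prompt, seed) pairs removes a large source
# of between-run variance.
PROMPTS = ["..."]         # placeholder benchmark prompts
SEEDS = list(range(8))    # a handful of fixed seeds per prompt

URL = "http://localhost:8080/completion"  # llama.cpp server, assumed local


def complete(prompt: str, seed: int, **sampler) -> str:
    """Request one completion from llama.cpp's /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": 256, "seed": seed, **sampler})
    req = Request(URL, data=body.encode(), headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["content"]


def score(text: str) -> float:
    """Stand-in metric; substitute the real benchmark scorer here."""
    return float(len(text))


results = {}
keys = list(GRID)
for values in itertools.product(*GRID.values()):
    cfg = dict(zip(keys, values))
    scores = [score(complete(p, s, **cfg)) for p in PROMPTS for s in SEEDS]
    results[tuple(values)] = sum(scores) / len(scores)
```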