Tag
Discusses the cheapest hardware options for running Qwen 3.6 models, comparing RTX 3090 and Tesla V100 GPUs, and provides a detailed cost breakdown for a system at around $2000.
Points to a deal for a used V100 32GB GPU on AliExpress at approximately $526, including coupon codes.
A blogger describes acquiring a Tesla V100 SXM2 datacenter GPU for £150 and installing it in their gaming PC alongside an RTX 4080 using a custom adapter. The setup provides 32GB of total VRAM and runs local inference of 27B-parameter models at 32 tokens per second.
A user benchmarks a V100-compatible port of Flash Attention 2, reporting 3x-17x speedups and up to 94% memory reduction over default PyTorch attention.
Reports 1000 tokens per second of generation throughput on Qwen3.6 27B using V100 GPUs with 128 concurrent requests, and 80 t/s for a single user.
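
The two throughput figures in the last entry can be related with simple arithmetic. The sketch below assumes the 1000 t/s figure is the total across all 128 requests and that it is shared evenly between them; neither assumption is confirmed by the post, and the function name is illustrative only.

```python
def per_request_rate(aggregate_tps: float, concurrent_requests: int) -> float:
    """Average tokens per second seen by each request, assuming an even split."""
    if concurrent_requests <= 0:
        raise ValueError("concurrent_requests must be positive")
    return aggregate_tps / concurrent_requests


# Figures quoted in the entry above.
AGGREGATE_TPS = 1000.0   # total generation throughput at 128 concurrent requests
CONCURRENT = 128
SINGLE_USER_TPS = 80.0   # throughput with one request in flight

# Assumption: the aggregate is divided evenly across requests.
per_request = per_request_rate(AGGREGATE_TPS, CONCURRENT)

# How much more total output batching yields compared with serving one user.
batching_gain = AGGREGATE_TPS / SINGLE_USER_TPS

print(f"per-request rate under load: {per_request:.2f} t/s")
print(f"aggregate gain over single user: {batching_gain:.1f}x")
```

Under these assumptions each request would see roughly 7.8 t/s while the system as a whole produces 12.5 times the single-user output, which is the usual trade between per-user latency and total throughput when batching.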