Are older Titan cards still viable?

Reddit r/LocalLLaMA News

Summary

A user asks whether older Nvidia Titan cards are still viable for running Gemma/Qwen MoE coding models, comparing their memory bandwidth and cost against newer consumer cards.

Looking at older Nvidia cards under £200 for running Gemma/Qwen MoE coding models. Is there any reason to avoid the older 12GB Titan cards, other than their high power draw? They have more memory bandwidth than newer 12GB consumer cards such as the RTX 2060 and RTX 3060:

Titan X 12GB: 480 GB/s
Titan XP 12GB: 547 GB/s
Titan V 12GB: 652 GB/s
RTX 2060 12GB: 336 GB/s
RTX 2080 Ti 11GB: 616 GB/s
RTX 3060 12GB: 360 GB/s
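Memory bandwidth matters because token generation is usually memory-bound: each new token has to read every active weight once. The sketch below turns the bandwidth figures above into a rough upper bound on decode speed. It assumes a MoE model with about 3B active parameters per token (as in a 35B-A3B model) quantized to about 4.5 bits per weight, and it assumes all weights fit in VRAM. Both figures are illustrative assumptions, and KV-cache reads and compute limits are ignored, so real speeds will be lower.

```python
# Rough, bandwidth-only ceiling on decode speed for a MoE model whose weights
# fit entirely in VRAM. Assumptions (not from the post): ~3B active parameters
# per token and ~4.5 bits per weight for a 4-bit quant. KV-cache reads, kernel
# efficiency and compute limits are ignored, so real numbers will be lower.

CARDS_GB_S = {
    "Titan X 12GB": 480,
    "Titan XP 12GB": 547,
    "Titan V 12GB": 652,
    "RTX 2060 12GB": 336,
    "RTX 2080 Ti 11GB": 616,
    "RTX 3060 12GB": 360,
}


def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_b: float = 3.0,
                         bits_per_weight: float = 4.5) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Print cards from highest to lowest bandwidth with their ceiling.
    for name, bw in sorted(CARDS_GB_S.items(), key=lambda kv: -kv[1]):
        print(f"{name:18s} {bw:4d} GB/s -> <= {decode_ceiling_tok_s(bw):5.0f} tok/s")
```

Because the bound scales linearly with bandwidth, the ranking simply follows the list. The catch is capacity rather than speed: under the same 4.5-bit assumption a 35B-parameter model needs roughly 35e9 × 4.5 / 8 ≈ 19.7 GB for weights alone, which does not fit in 12 GB, so part of the model ends up in system RAM and that slower memory sets the pace.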

Similar Articles

24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context)

Reddit r/LocalLLaMA

A developer demonstrates running MoE models such as Qwen 3.6 35B-A3B and Gemma 4 26B-A4B at 24+ tok/s on an old GTX 1080 (8GB VRAM) with 128k context, using llama.cpp with MoE offloading and TurboQuant KV-cache quantization, and shares optimization tricks for Gemma's MTP speculative decoding.
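With MoE offloading, the weights read for each token are split between fast VRAM and much slower system RAM. The sketch below estimates a bandwidth-only ceiling for that split, treating the two reads as happening back to back. The 30% VRAM share and the 50 GB/s RAM figure are hypothetical; only the GTX 1080's 320 GB/s memory bandwidth is a published spec. This is an illustration of the bottleneck, not the method used in the linked post.

```python
# Bandwidth-only ceiling when a MoE model's active weights are split between
# VRAM and system RAM (expert offloading). Inputs are illustrative assumptions,
# not figures from the post.


def offload_ceiling_tok_s(active_bytes: float,
                          frac_on_gpu: float,
                          gpu_gb_s: float,
                          cpu_gb_s: float) -> float:
    """Tokens/s if each token reads its VRAM share and its RAM share back to back."""
    gpu_time = active_bytes * frac_on_gpu / (gpu_gb_s * 1e9)
    cpu_time = active_bytes * (1.0 - frac_on_gpu) / (cpu_gb_s * 1e9)
    return 1.0 / (gpu_time + cpu_time)


if __name__ == "__main__":
    # Hypothetical: ~3B active params at ~4.5 bits/weight, with 30% of them
    # (attention/shared layers) in VRAM and the routed experts read from RAM.
    active = 3e9 * 4.5 / 8
    gtx1080_gb_s = 320.0  # GTX 1080 spec memory bandwidth
    ram_gb_s = 50.0       # hypothetical dual-channel DDR4 figure
    print(f"{offload_ceiling_tok_s(active, 0.3, gtx1080_gb_s, ram_gb_s):.1f} tok/s")
```

With these inputs the VRAM share costs about 1.6 ms per token and the RAM share about 23.6 ms, giving a ceiling of roughly 40 tok/s. The RAM term dominates, which is why, under these assumptions, system memory bandwidth rather than the GPU's own bandwidth decides offloaded MoE speed.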

Qwen3.6-35B vs Gemma4-26B on 7900 XTX

Reddit r/LocalLLaMA

A detailed benchmark comparing Qwen3.6-35B and Gemma4-26B on Radeon 7900 XTX shows Gemma is ~20% faster end-to-end despite slower token generation, because Qwen generates ~2x more tokens due to internal reasoning. The article recommends using Qwen for throughput-bound batch work and Gemma for latency-sensitive single requests.
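The end-to-end result follows from simple arithmetic: total decode time is tokens generated divided by decode speed, so a model that decodes faster per token can still finish later if it emits more tokens. The numbers below are hypothetical, chosen only to reproduce a roughly 20% gap with Qwen emitting twice as many tokens; they are not the benchmark's measurements, and prefill time is ignored.

```python
# Illustrative only: hypothetical decode rates and token counts in which Qwen
# decodes faster per token but emits twice as many tokens (internal reasoning).
# Prefill time is ignored.


def end_to_end_seconds(tokens_generated: int, decode_tok_s: float) -> float:
    """Time spent generating tokens at a steady decode rate."""
    return tokens_generated / decode_tok_s


qwen = end_to_end_seconds(tokens_generated=2000, decode_tok_s=100.0)
gemma = end_to_end_seconds(tokens_generated=1000, decode_tok_s=62.5)
print(f"Qwen {qwen:.1f}s, Gemma {gemma:.1f}s, Gemma saves {1 - gemma / qwen:.0%}")
```

More generally, if one model emits twice as many tokens, the other wins end-to-end whenever its decode rate exceeds half of the first model's rate, since N / r_gemma < 2N / r_qwen exactly when r_gemma > r_qwen / 2.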