Tag
We're the first to run the full GLM-5.2 (753B FP8) on RTX 4090s by porting sparse-attention kernels to Ada GPUs, enabling frontier open-weights model on commodity hardware.
A user seeks community feedback on purchasing a modded RTX 4090 with 48GB VRAM from GpuWorld.eu, asking for trustworthy sources and alternative sellers like Taobao.
A Reddit user expresses curiosity about modded Chinese GPUs (e.g., 48GB RTX 4090) and seeks information on performance, reliability, and sourcing, proposing to form a research group.
Developer achieved 80+ t/s inference on Qwen3.6-27B with 262K context on a single RTX 4090 by combining MTP (Multi-Token Prediction) with TurboQuant's lossless KV cache compression, sharing their implementation fork and technical details.