Tag
A user holds a locked, discounted Dell quote for 6× RTX PRO 6000 Max-Q GPUs to build an inference cluster for GLM 5.2, and asks the community for advice on purchasing strategy before the quote expires.
A quantized 478B-parameter GLM-5.1 model runs on 4× RTX PRO 6000 GPUs via SGLang, supporting a 370k-token context at up to 45 tok/s decode and 1340 tok/s prefill, and is demoed driving Figma.
A researcher shares their home compute setup for MLX and AI research, featuring an M3 Ultra with 512 GB, an RTX PRO 6000 with 96 GB, and an M3 Max with 96 GB for model porting and stress testing.