@ciruai: Testing DeepSeek v4 Flash on the AMD Ryzen AI Max+ 395 Strix Halo with 128GB RAM. Getting ~15 TPS over a decently long …


Summary

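Testing DeepSeek v4 Flash, a 284B-parameter MoE model with 13B active parameters, on the AMD Ryzen AI Max+ 395 Strix Halo with 128GB RAM yields ~15 TPS over a fairly long context. The machine costs about $3,000, whereas getting this kind of model to run fast normally means spending well over $25,000 on a self-built system. The point is not to beat a datacenter GPU but to show that a model this large can run locally at all.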


Cached at: 06/18/26, 04:19 PM

Testing DeepSeek v4 Flash on the AMD Ryzen AI Max+ 395 Strix Halo with 128GB RAM.

Getting ~15 TPS over a decently long context, which is honestly very usable for a model this smart.

284B parameter MoE, A13B active.

Before anyone says “that’s slow,” remember: this is running on a $3,000 machine. Getting this kind of model to run fast normally means spending well over $25,000 (if you build it yourself).

The accomplishment isn’t beating a datacenter GPU.

The accomplishment is running it locally at all.
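
To see why a 284B-parameter model with 13B active parameters is a tight fit on a 128GB machine, the back-of-the-envelope arithmetic can be sketched as below. This is a rough illustration, not the poster's method: the post does not state which quantization was used or the machine's memory bandwidth, so `ASSUMED_BITS` and `ASSUMED_BANDWIDTH` are hypothetical placeholders, and the estimate ignores the KV cache, activations, and memory reserved by the operating system.

```python
# Rough sizing sketch for a large MoE model on a unified-memory machine.
# Figures taken from the post: 284B total parameters, 13B active, 128GB RAM.
# Everything marked ASSUMED is a hypothetical value chosen for illustration.

TOTAL_PARAMS = 284e9       # total parameters (from the post)
ACTIVE_PARAMS = 13e9       # parameters active per token (from the post)
RAM_BYTES = 128e9          # 128GB, treated as 128e9 bytes (simplification)

ASSUMED_BITS = 3.0         # ASSUMED average bits per weight after quantization
ASSUMED_BANDWIDTH = 256e9  # ASSUMED memory bandwidth in bytes per second


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed to store `params` weights at a given bit-width."""
    return params * bits_per_weight / 8


def max_bits_per_weight(params: float, memory_bytes: float) -> float:
    """Largest average bit-width at which the weights alone fit in memory."""
    return memory_bytes * 8 / params


def decode_tps_ceiling(active_params: float, bits_per_weight: float,
                       bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if every token must read all active
    weights from memory once (a bandwidth-bound simplification)."""
    return bandwidth_bytes_per_s / weight_bytes(active_params, bits_per_weight)


if __name__ == "__main__":
    limit = max_bits_per_weight(TOTAL_PARAMS, RAM_BYTES)
    size_gb = weight_bytes(TOTAL_PARAMS, ASSUMED_BITS) / 1e9
    ceiling = decode_tps_ceiling(ACTIVE_PARAMS, ASSUMED_BITS, ASSUMED_BANDWIDTH)
    print(f"Max average bits per weight to fit in RAM: {limit:.2f}")
    print(f"Weight size at {ASSUMED_BITS} bits: {size_gb:.1f} GB")
    print(f"Bandwidth-bound decode ceiling: {ceiling:.1f} tokens/s")
```

Under these assumptions the weights only fit if they average roughly 3.6 bits each or less, so a plain 4-bit quantization (about 142 GB) would not fit. The decode ceiling depends only on the active parameters, which is why a sparse MoE can remain usable on hardware like this; real throughput will land below any such ceiling.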

Similar Articles

@danveloper: https://x.com/danveloper/status/2064387956387758206

X AI KOLs Timeline

A developer ran DeepSeek-V4-Flash on a Raspberry Pi 5 by streaming model weights from an NVMe SSD, achieving 1.3 tokens/second at 8 watts, demonstrating the feasibility of frontier-adjacent open-weight models on low-cost, offline hardware.

@Snixtp: DeepSeek V4 Flash on a single RTX Pro 6000?

X AI KOLs Following

DeepSeek V4 Flash GGUF quantizations have been released by antirez, enabling the model to run on single GPUs like the RTX Pro 6000 and Macs with 128GB+ RAM. The quantized files are available on Hugging Face with instructions for the DS4 inference engine.