@danveloper: I can't believe this works, but I got DeepSeek-V4-Flash (284B params) running on a Raspberry Pi 5 (8GB edition) at >1to…

Summary

A developer (@danveloper) reports running the 284B-parameter DeepSeek-V4-Flash model on an 8GB Raspberry Pi 5 at over 1 tok/s while drawing roughly 8 W, using an unmodified copy of antirez's GGUF file. Getting there took 160+ experiments over 5 days.

I can't believe this works, but I got DeepSeek-V4-Flash (284B params) running on a Raspberry Pi 5 (8GB edition) at >1tok/s @ ~8W during full-tilt inference! It uses an untouched copy of @antirez's GGUF. Took 160+ experiments over 5 days between GPT-5.5 xhigh and Opus 4.8 max. https://t.co/RAJjNZg44Z

Cached at: 06/02/26, 05:35 PM

Similar Articles

@Snixtp: DeepSeek V4 Flash on a single RTX Pro 6000?

X AI KOLs Following

DeepSeek V4 Flash GGUF quantizations have been released by antirez, enabling the model to run on single GPUs like the RTX Pro 6000 and Macs with 128GB+ RAM. The quantized files are available on Hugging Face with instructions for the DS4 inference engine.

antirez/deepseek-v4-gguf

Hugging Face Models Trending

Antirez released GGUF quantizations of DeepSeek V4 Flash specifically tailored for the DS4 inference engine, providing optimized configurations for different RAM sizes and enabling local execution of the large MoE model.

Deepseek V4 flash performance on DGX Spark

Reddit r/LocalLLaMA

A Reddit user shares their experience running DeepSeek V4 Flash on a dual-ASUS GX10 DGX Spark setup, detailing performance metrics, configuration, and power consumption, with throughput benchmarks across various context lengths.