Sipeed's K3 RISC-V SBCs can run 30B-parameter LLMs: 60 TOPS (INT4), supports BF16/FP16/INT4
Summary
Sipeed's new K3 RISC-V single-board computers feature 32GB LPDDR5 and a 60 TOPS NPU, enabling local inference of large language models at up to 15 tokens per second.
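As a rough sanity check on the memory claim (my own back-of-the-envelope arithmetic, not from the article), the weight footprint of a quantized model can be estimated from its parameter count and bits per parameter:

```python
# Assumption for illustration: weights only, ignoring KV cache,
# activations, and runtime overhead. 1 GB = 1e9 bytes.
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight footprint in GB for a quantized model."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# A 30B-parameter model at INT4 needs roughly 15 GB of weights,
# leaving headroom in the K3's 32GB LPDDR5 for KV cache and the OS.
print(weight_gb(30, 4))  # 15.0
```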
Similar Articles
Running Qwen3.6 35B A3B on 8GB VRAM and 32GB RAM with ~190k context
The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.
Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec
A community member details a custom PC build that uses discontinued Intel Optane Persistent Memory to run the 1-trillion-parameter Kimi K2.5 model locally at roughly 4 tokens per second via llama.cpp.
@TeksEdge: Solved! Qwen3.6-27B-FP8 is now running on Intel Arc Pro B70! LocalMaxxing shows a working 4× Arc Pro B70 32GB run at ~5…
The Qwen3.6-27B-FP8 model now runs on Intel Arc Pro B70 GPUs at ~50 tok/s after a vLLM bug fix, a significant milestone for local AI inference on Intel GPUs.
@iotcoi: Qwen3.6-27B-FP8 + Dflash + DDTree, 256k context, 10 agents ~200 tokens/sec max decode 136t/s average on a single tiny G…
A quantized Qwen3.6 27B model achieves a 200 tok/s peak decode rate (136 tok/s average) with 256k context and 10 agents on a single 49W GB10 GPU, using Dflash+DDTree optimizations.
Taiwanese company Skymizer announces HTX301 - PCIe inference card with 384GB of memory at ~240 watts
Skymizer announces the HTX301, a PCIe inference card capable of running 700B-parameter LLMs on-premises with high memory and low power consumption.
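The 700B-on-384GB claim can also be sanity-checked with the same kind of weights-only estimate (my assumption; the announcement does not state a quantization level):

```python
# Assumption for illustration: weights only, ignoring KV cache and
# runtime overhead. 1 GB = 1e9 bytes.
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight footprint in GB for a quantized model."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# 700B parameters at INT4 is roughly 350 GB of weights, which fits in
# the HTX301's 384GB with some room for KV cache; FP8 (~700 GB) would not.
print(weight_gb(700, 4))  # 350.0
print(weight_gb(700, 8))  # 700.0
```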