expert-offloading


Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax

Reddit r/LocalLLaMA · 3d ago

Luce Spark is an open-source tool for running 35B MoE models on 16 GB GPUs. It keeps the frequently used ("hot") experts cached on the GPU and leaves the rest in system RAM, using calibrated expert placement and a bounded asynchronous cache to sustain throughput without the speed cliff that offloading usually causes.
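The idea of hot-expert caching can be illustrated with a toy model. The sketch below is not Luce Spark's code: the names `calibrate_hot_set` and `BoundedExpertCache`, the LRU eviction policy, and the routing trace are all assumptions made for illustration. It only counts hits and misses, and it leaves out the asynchronous transfer path entirely.

```python
from collections import Counter, OrderedDict


def calibrate_hot_set(routing_trace, pinned_slots):
    """Pick the most frequently routed experts from a calibration trace.

    Hypothetical helper: routing_trace is a list of expert ids observed
    while running a calibration workload.
    """
    counts = Counter(routing_trace)
    return {expert for expert, _ in counts.most_common(pinned_slots)}


class BoundedExpertCache:
    """Toy model of GPU residency: a pinned hot set plus a bounded LRU.

    Assumption: eviction is plain LRU. The real tool may use another policy.
    """

    def __init__(self, hot_set, lru_slots):
        self.hot_set = set(hot_set)   # experts that always stay on the GPU
        self.lru_slots = lru_slots    # extra GPU slots for everything else
        self.lru = OrderedDict()      # expert id -> placeholder for weights
        self.hits = 0
        self.misses = 0

    def fetch(self, expert_id):
        """Return "gpu" if the expert is resident, else "ram" (and cache it)."""
        if expert_id in self.hot_set:
            self.hits += 1
            return "gpu"
        if expert_id in self.lru:
            self.lru.move_to_end(expert_id)  # mark as most recently used
            self.hits += 1
            return "gpu"
        self.misses += 1
        if self.lru_slots > 0:
            if len(self.lru) >= self.lru_slots:
                self.lru.popitem(last=False)  # evict least recently used
            self.lru[expert_id] = True
        return "ram"


# Calibration: experts 0 and 1 dominate this made-up trace.
hot = calibrate_hot_set([0, 1, 0, 2, 0, 1, 3, 0, 1, 4], pinned_slots=2)
cache = BoundedExpertCache(hot, lru_slots=1)
served_from = [cache.fetch(e) for e in [0, 2, 2, 3, 1, 2]]
```

In this toy run the pinned experts always hit, while the cold experts compete for the single LRU slot. The bound on that slot count is what keeps GPU memory use fixed no matter how many distinct experts the router touches.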
