moe-model

#moe-model

@sudoingX: anyone running a 16gb card, stop scrolling. @pupposandro and @davideciffa got qwen 35b-a3b down to 13.3gb, measured on …

X AI KOLs Timeline · yesterday

A technique called luce spark lets the Qwen 35B-A3B Mixture-of-Experts (MoE) model run on GPUs with 16GB of VRAM. It learns which experts are used most frequently, keeps those resident in VRAM, and streams the rest from system RAM, reaching about 100 tok/s without being bottlenecked by VRAM capacity.
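The expert-placement idea described above can be sketched as a frequency-based cache. This is a minimal sketch under stated assumptions, not luce spark's implementation: `ExpertCache`, `record`, `rebalance`, and `fetch` are hypothetical names, a single MoE layer stands in for the whole model, and "vram"/"ram" are simulated labels rather than real device transfers.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExpertCache:
    """Hypothetical frequency-based expert cache (illustration only).

    `capacity` is how many experts fit in the assumed VRAM budget; every
    other expert is assumed to live in system RAM and be streamed on demand.
    """
    capacity: int
    counts: Counter = field(default_factory=Counter)
    resident: set = field(default_factory=set)
    hits: int = 0
    misses: int = 0

    def record(self, expert_ids):
        """Count how often the router selects each expert (profiling phase)."""
        self.counts.update(expert_ids)

    def rebalance(self):
        """Pin the `capacity` most frequently routed experts in (simulated) VRAM."""
        self.resident = {e for e, _ in self.counts.most_common(self.capacity)}

    def fetch(self, expert_id):
        """Report where an expert's weights would be read from for this token."""
        if expert_id in self.resident:
            self.hits += 1
            return "vram"
        self.misses += 1
        return "ram"  # in a real system this would be a host-to-GPU transfer


if __name__ == "__main__":
    cache = ExpertCache(capacity=2)
    # Made-up router choices (top-2 experts per token) for a profiling pass.
    for routed in [[0, 1], [0, 2], [0, 1], [1, 3]]:
        cache.record(routed)
    cache.rebalance()
    print(sorted(cache.resident))  # [0, 1]
    print([cache.fetch(e) for e in [0, 3, 1]])  # ['vram', 'ram', 'vram']
    print(cache.hits, cache.misses)  # 2 1
```

Ranking by raw routing counts is the simplest possible policy; a real system could instead track counts per layer and refresh the resident set as the workload shifts.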



Llama.cpp B9406 MTP mmproj fix

Reddit r/LocalLLaMA · 2026-05-29

The llama.cpp release B9406 fixes a crash (a failing GGML_ASSERT) when multi-token prediction (MTP) is used with MoE vision models that load a multimodal projector (mmproj) file, such as Qwen3.6-35B-A3B.



Re: Whatever happened to Cohere’s Command-A series of models?

Reddit r/LocalLLaMA · 2026-05-20

Cohere has released Command A+, its first Mixture-of-Experts (MoE) model, under the Apache 2.0 license. It ships with quantizations efficient enough to deploy on one or two GPUs, reflecting a focus on practicality and open access for developers.

