Why MoE below A10B feels like I'm gambling
Summary
A developer reports that MoE models with few active parameters, such as Qwen3.6-35B-A3B, are less coherent and need more guidance than the dense Qwen3.5-27B, which makes them hard to slot into agentic workflows.
Similar Articles
Qwen3.5-27B, Qwen3.5-122B, and Qwen3.6-35B on 4x RTX 3090 — MoEs struggle with strict global rules
A user benchmarks three Qwen models (Qwen3.5-27B dense, Qwen3.5-122B-A10B MoE, Qwen3.6-35B-A3B MoE) on 4x RTX 3090 GPUs under real agentic workloads, finding that MoE models consistently underperform the dense 27B at following strict global rules despite speed advantages, with the Qwen3.6-35B leading in generation throughput.
Qwen-AgentWorld-35B-A3B: a 3B-active MoE trained to simulate MCP, terminal, SWE, Android, web and OS environments
Qwen released Qwen-AgentWorld-35B-A3B, a 35B-parameter MoE model with 3B active parameters, designed as a language world model to simulate environment responses for agent interactions across seven domains including MCP, terminal, SWE, Android, web, and OS.
Forgive my ignorance but how is a 27B model better than 397B?
User questions how Qwen's 27B dense model can outperform its 397B MoE variant, sparking discussion on MoE efficiency versus dense model quality.
Qwen/Qwen3.6-35B-A3B
Qwen releases Qwen3.6-35B-A3B, an open-weight Mixture-of-Experts model with 35B total parameters and 3B active parameters, featuring significant improvements in agentic coding and reasoning preservation.
@noctus91: I recently switched from Qwen 3.5 9B to LFM2.5-8B-A1B by @liquidai, and it's quickly become my default local model in H…
A user shares their positive experience switching from Qwen 3.5 9B to Liquid AI's new LFM2.5-8B-A1B model, praising its speed and reliability for agentic tasks while noting coding remains a weakness. The model is an 8B MoE with 1.5B active parameters and 128K context, optimized for devices and server-side use.