sparse-moe

#sparse-moe

StepFun Says Step 3.7 Flash Matches 97% of Claude Opus 4.6's Coding Performance at One-Ninth the Cost

Reddit r/ArtificialInteligence · 2026-05-30

StepFun reports that Step 3.7 Flash, a 198B sparse MoE model with 11B active parameters, reaches 97% of Claude Opus 4.6's coding performance on SWE-Bench Verified at roughly one-ninth the cost. It does so with an Advisor Mode strategy that reserves expensive frontier-model calls for critical decision points.


Step 3.7 Flash open weights dropped TODAY and the agent reliability numbers are actually interesting

Reddit r/artificial · 2026-05-29

Step 3.7 Flash, an open-weight 198B sparse MoE model, claims 98% agent reliability on tau2-bench across all difficulty levels, pairing middling raw capability with strong multi-step consistency.


stepfun-ai/Step-3.7-Flash

Hugging Face Models Trending · 2026-05-23

Step 3.7 Flash is a 198B-parameter sparse MoE vision-language model with 11B active parameters per token, supporting 256k context and three reasoning levels, designed for high-throughput agentic workflows.
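
To put the sparsity figures in this card in perspective, the share of parameters used per token can be worked out directly from the two numbers quoted above. The snippet below is a minimal sketch of that arithmetic; the function name `active_fraction` is an illustrative assumption, not part of any StepFun tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the fraction of a sparse MoE model's parameters active per token.

    Both arguments are parameter counts in billions, as quoted in the card.
    """
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_params_b / total_params_b


# Figures from the card: 198B total, 11B active per token.
share = active_fraction(198, 11)
print(f"{share:.1%} of parameters active per token")  # prints 5.6% ...
```

Under these figures, roughly one parameter in eighteen participates in any single token's forward pass, which is what the card's high-throughput framing rests on.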


DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

Hugging Face Daily Papers · 2026-05-11

DECO is a sparse MoE architecture that matches dense Transformer performance while activating only 20% of its experts, paired with a kernel that delivers a 3x speedup. It uses ReLU-based routing, learnable scaling, and the NormSiLU activation function.
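
The general idea behind ReLU-based routing can be sketched in a few lines: router scores pass through a ReLU, so any expert with a non-positive score gets a gate of exactly zero and is skipped. The code below is a generic illustration under stated assumptions, not DECO's actual implementation. The names `relu_route` and `moe_forward`, the single scalar `scale` standing in for learnable scaling, and the toy experts are all hypothetical, and NormSiLU is not modeled here.

```python
def relu_route(token, router_weights, scale=1.0):
    """Return {expert_index: gate} for experts whose router score is positive.

    `scale` is a hypothetical stand-in for a learnable scaling factor.
    """
    gates = {}
    for idx, weights in enumerate(router_weights):
        score = sum(t * w for t, w in zip(token, weights))
        gate = scale * max(score, 0.0)  # ReLU: non-positive scores gate to zero
        if gate > 0.0:
            gates[idx] = gate
    return gates


def moe_forward(token, router_weights, experts, scale=1.0):
    """Sum gate-weighted outputs of the active experts only."""
    gates = relu_route(token, router_weights, scale)
    out = [0.0] * len(token)
    for idx, gate in gates.items():
        expert_out = experts[idx](token)  # inactive experts are never called
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out, sorted(gates)


# Toy setup (assumed values): four experts, each scaling its input by k.
experts = [lambda x, k=k: [k * v for v in x] for k in (1, 2, 3, 4)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]

output, active = moe_forward([1.0, -2.0], router_weights, experts)
print(active)  # [0, 3]
print(output)  # [5.0, -10.0]
```

Unlike top-k routing, the number of active experts here is not fixed in advance; it depends on how many router scores come out positive for a given token.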


NucleusAI/Nucleus-Image

Hugging Face Models Trending · 2026-03-17

Nucleus-Image is an open-source text-to-image diffusion transformer with 17B parameters across 64 routed experts, activating only ~2B parameters per forward pass. It matches or exceeds leading models like Qwen-Image and Imagen4 while maintaining high efficiency, released with full model weights, training code, and dataset.
