Tag
StepFun's Step 3.7 Flash, a 198B sparse MoE model with 11B active parameters, reaches 97% of Claude Opus 4.6's coding performance on SWE-Bench Verified at roughly one-ninth the cost, using an Advisor Mode strategy that reserves expensive frontier-model calls for critical decision points.
Step 3.7 Flash, an open-weight 198B sparse MoE model, reportedly achieves 98% agent reliability on tau2-bench across all difficulty levels, pairing middling raw capability with strong multi-step consistency.
Step 3.7 Flash is a 198B-parameter sparse MoE vision-language model with 11B active parameters per token, supporting 256k context and three reasoning levels, designed for high-throughput agentic workflows.
DECO is a sparse MoE architecture that matches dense Transformer performance with only 20% of experts activated and a kernel that provides 3x acceleration, using ReLU-based routing, learnable scaling, and the NormSiLU activation function.
Nucleus-Image is an open-source text-to-image diffusion transformer with 17B parameters across 64 routed experts, activating only ~2B parameters per forward pass. It matches or exceeds leading models such as Qwen-Image and Imagen4 while remaining efficient, and is released with full model weights, training code, and dataset.
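
For a rough sense of how sparse these models are, the share of parameters active per token or forward pass can be computed from the figures quoted above. This is a minimal sketch: the helper name `active_fraction` is hypothetical, and the Nucleus-Image figure uses the approximate ~2B value, so its result is only indicative.

```python
def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters that are active (both in billions)."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


# (active, total) parameter counts in billions, taken from the summaries above.
models = {
    "Step 3.7 Flash": (11, 198),
    "Nucleus-Image": (2, 17),  # "~2B" is approximate in the source
}

for name, (active_b, total_b) in models.items():
    share = active_fraction(active_b, total_b)
    print(f"{name}: {share:.1%} of parameters active")
```

By this arithmetic Step 3.7 Flash activates about 5.6% of its parameters per token, and Nucleus-Image about 11.8% per forward pass.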