Step 3.7 Flash open weights dropped TODAY and the agent reliability numbers are actually interesting

Reddit r/artificial 05/29/26, 02:19 PM Models

open-weights step-3-7-flash agent-reliability sparse-moe toolathlon tau2-bench

Summary

Step 3.7 Flash, an open-weight 198B sparse MoE model, claims 98% agent reliability on tau2-bench across all difficulty levels, with mid raw capability but strong multi-step consistency.

Read this release today. Some crazy numbers.

The tau2-bench number is 98% across all difficulty levels. That is the one that got me, because usually these releases post a strong easy score and then quietly die at hard difficulty. This one... claims it holds. For multi-step agent work that actually matters more than most benchmarks. A model that drifts on step 4 of a 6-step chain is a debugging nightmare regardless of what its SWE score looks like.

Raw capability is mid: Toolathlon at 49.5, GDPval at 45.8. So this is clearly a reliability play, not a frontier capability play. Depending on your use case that is either fine or a dealbreaker.

* 198B sparse MoE
* 11B active parameters
* 400 TPS
* 256K context
* Apache 2.0
* runs locally on M4 Max and DGX Spark

Has anyone actually put this through agent evals, or am I just reading the release card?
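For intuition on why drift partway through a chain hurts so much, here is a rough back-of-the-envelope sketch. It assumes every step succeeds independently with the same probability, which is a simplification of mine and not something the release claims. The per-step rates below are illustrative inputs, not reported numbers, and tau2-bench scores whole tasks rather than individual steps.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumes steps are independent and share one success rate
    (an illustrative simplification, not a measured property).
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


# Hypothetical per-step success rates for a 6-step chain.
for p in (0.90, 0.95, 0.98):
    print(f"per-step {p:.2f} -> 6-step chain {chain_success(p, 6):.3f}")
# per-step 0.90 -> 6-step chain 0.531
# per-step 0.95 -> 6-step chain 0.735
# per-step 0.98 -> 6-step chain 0.886
```

Under these assumptions, a model that looks fine at 90% per step finishes barely half of its 6-step chains, which is why consistency can matter more than a headline capability score.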

Similar Articles

StepFun 3.7 Flash

Reddit r/LocalLLaMA

StepFun released Step 3.7 Flash, a high-efficiency multimodal model optimized for real-world agentic tasks, featuring improved coding benchmarks (SWE-Bench Pro, Terminal-Bench) and compatibility with multiple agent harnesses.

stepfun-ai/Step-3.7-Flash

Hugging Face Models Trending

Step 3.7 Flash is a 198B-parameter sparse MoE vision-language model with 11B active parameters per token, supporting 256k context and three reasoning levels, designed for high-throughput agentic workflows.

StepFun Says Step 3.7 Flash Matches 97% of Claude Opus 4.6's Coding Performance at One-Ninth the Cost

Reddit r/ArtificialInteligence

StepFun's Step 3.7 Flash, a 198B sparse MoE model with 11B active parameters, matches 97% of Claude Opus 4.6's coding performance on SWE-Bench Verified at roughly one-ninth the cost, using an Advisor Mode strategy that reserves expensive frontier model calls for critical decision points.

@StepFun_ai: A thoughtful take on Step 3.7 Flash and the new frontier of agent efficiency, from @FrankYouChill

X AI KOLs Following

StepFun_ai highlights a thoughtful take on the Step 3.7 Flash model and its implications for agent efficiency.

Step 3.7 Flash

Product Hunt

Step 3.7 Flash is a fast agents model designed to see and act in real time.