StepFun 3.7 Flash is very good

Reddit r/LocalLLaMA 05/31/26, 11:03 AM Models

Summary

StepFun 3.7 Flash is a vision-capable model that, in the poster's experience, comes close to GLM 5.1 on aesthetics and reaches around 80% of its 3D world understanding with only 25% of the parameters, which makes it very strong for the RAM it needs.

If you can fit StepFun 3.7 Flash into RAM, try it! It feels close to GLM 5.1 quality in terms of aesthetics, and around 80% of the way there in terms of 3D world understanding. However, since it has only 25% of GLM 5.1's parameters and has built-in vision, it feels like nothing else comes close for the RAM right now.
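
For anyone wondering whether it will fit, here is a rough back-of-the-envelope sketch of the weight footprint. It takes the 198B total parameter count quoted in the model listings further down and multiplies by an assumed number of bits per weight. The bit widths are illustrative assumptions, not the sizes of any released GGUF file, and the estimate ignores KV cache, the vision encoder's activations, and runtime overhead, so treat the numbers as a lower bound.

```python
# Rough weight-only memory estimate for a quantized model.
# Assumption: size ~= total_params * bits_per_weight / 8 bytes.
# Real GGUF files mix bit widths per tensor, so actual sizes will differ.

TOTAL_PARAMS = 198e9  # total parameters quoted for Step 3.7 Flash (sparse MoE)

# Hypothetical bit widths for illustration only.
ASSUMED_BITS_PER_WEIGHT = {"16-bit": 16.0, "8-bit": 8.0, "4-bit": 4.0}


def weights_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return total_params * bits_per_weight / 8 / 2**30


def estimate_table(total_params: float = TOTAL_PARAMS,
                   bits: dict = ASSUMED_BITS_PER_WEIGHT) -> dict:
    """Map each assumed precision to its estimated weight size in GiB."""
    return {name: round(weights_gib(total_params, b), 1) for name, b in bits.items()}


if __name__ == "__main__":
    for name, gib in estimate_table().items():
        print(f"{name}: roughly {gib} GiB for weights alone")
```

Note that with a sparse MoE only about 11B parameters are active per token, but all 198B still have to be resident (in RAM or offloaded), so the total count is what matters for fitting the model.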

Similar Articles

StepFun 3.7 Flash

Reddit r/LocalLLaMA

StepFun released Step 3.7 Flash, a high-efficiency multimodal model optimized for real-world agentic tasks, featuring improved coding benchmarks (SWE-Bench Pro, Terminal-Bench) and compatibility with multiple agent harnesses.

stepfun-ai/Step-3.7-Flash

Hugging Face Models Trending

Step 3.7 Flash is a 198B-parameter sparse MoE vision-language model with 11B active parameters per token, supporting 256k context and three reasoning levels, designed for high-throughput agentic workflows.

stepfun-ai/Step-3.7-Flash-GGUF

Hugging Face Models Trending

StepFun releases GGUF quantizations of their 198B-parameter sparse MoE vision-language model Step-3.7-Flash, enabling local deployment with up to 256K context and selectable reasoning levels.

StepFun Says Step 3.7 Flash Matches 97% of Claude Opus 4.6's Coding Performance at One-Ninth the Cost

Reddit r/ArtificialInteligence

StepFun's Step 3.7 Flash, a 198B sparse MoE model with 11B active parameters, matches 97% of Claude Opus 4.6's coding performance on SWE-Bench Verified at roughly one-ninth the cost, using an Advisor Mode strategy that reserves expensive frontier model calls for critical decision points.

@NielsRogge: Impressive release by StepFun, explore it at https://paperswithcode.co/paper/83892

X AI KOLs Timeline

StepFun releases Step 3.7 Flash, an open-weight model designed for agentic, coding, search, and multimodal tasks, achieving top scores on several benchmarks.