@NielsRogge: Impressive release by StepFun, explore it at https://paperswithcode.co/paper/83892
Summary
StepFun releases Step 3.7 Flash, an open-weight model (Apache 2.0) designed for agentic, coding, search, and multimodal tasks. It ranks #1 on ClawEval-1.1 (67.1) and SimpleVQA Search (79.2), and #2 on SWE-PRO (56.3).
Cached at: 05/31/26, 12:50 AM
Impressive release by StepFun, explore it at https://t.co/T2BNnPHiRM https://t.co/Nzg9Uaup5K
StepFun (@StepFun_ai): ⚡️ Step 3.7 Flash is here: The new frontier is agent efficiency.
#1 ClawEval-1.1 (67.1), #1 SimpleVQA Search (79.2), #2 SWE-PRO (56.3), 95.3 on V* Python. Open weights under Apache 2.0.
Built for agentic, coding, search, and multimodal workflows — balancing speed, cost, and
Similar Articles
StepFun 3.7 Flash
StepFun released Step 3.7 Flash, a high-efficiency multimodal model optimized for real-world agentic tasks, featuring improved coding benchmarks (SWE-Bench Pro, Terminal-Bench) and compatibility with multiple agent harnesses.
@nathanhabib1011: Step-3.7-Flash from @StepFun_ai is a silent winner. Super impressive results, the best model under 500B params on HF le…
Step-3.7-Flash from StepFun_ai is highlighted as the best model under 500B parameters on Hugging Face leaderboards, with strong multimodal performance.
@AdinaYakup: Step-3.7-Flash: new VL model from @StepFun_ai. 198B / 11B active MoE, 256K context, 3 reasoning levels, up to 400 tokens/sec
StepFun releases Step-3.7-Flash, a new large vision-language MoE model with 198B parameters (11B active), 256K context, and up to 400 tokens/sec inference speed.
stepfun-ai/Step-3.7-Flash
Step 3.7 Flash is a 198B-parameter sparse MoE vision-language model with 11B active parameters per token, supporting 256K context and three reasoning levels, designed for high-throughput agentic workflows.
Stepfun 3.7 Flash is very good
StepFun's Step 3.7 Flash is a compact vision model whose aesthetics come close to GLM 5.1 and which reaches 80% of GLM 5.1's 3D world understanding with only 25% of the parameters, making it highly RAM-efficient.