Stepfun 3.7 Flash is very good

Reddit r/LocalLLaMA Models

Summary

The poster reports that Stepfun 3.7 Flash, a vision-capable model, feels close to GLM 5.1 in aesthetics and reaches roughly 80% of its 3D world understanding while having only about 25% of GLM 5.1's parameters, which makes it unusually strong for the RAM it needs.

If you can fit Stepfun 3.7 Flash into RAM, try it! It feels close to GLM 5.1 quality in terms of aesthetics, and around 80% of the way there in terms of 3D world understanding. Since it has only 25% of the params of GLM 5.1 and has built-in vision, it feels like nothing else comes close for the RAM right now.
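
For a rough sense of what "fit into RAM" means here, the sketch below estimates the memory taken by the weights alone at a few storage formats. It assumes the 198B total parameter count quoted in the Hugging Face listing further down, and it uses the standard bits-per-weight figures for F16, Q8_0, and Q4_0. The helper names are made up for illustration. Real GGUF files mix quantization types per tensor, and the KV cache, vision encoder, and runtime overhead come on top, so treat the numbers as a lower bound rather than a requirement.

```python
def weight_memory_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB (no KV cache, no overhead)."""
    return total_params * bits_per_weight / 8 / 2**30


# Assumption: total parameter count as quoted in the model listing (198B).
TOTAL_PARAMS = 198e9

# Q8_0 and Q4_0 store one 16-bit scale per block of 32 weights,
# which works out to 8.5 and 4.5 bits per weight respectively.
FORMATS = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def estimate_all(total_params: float = TOTAL_PARAMS) -> dict[str, float]:
    """Return a weights-only memory estimate in GiB for each format."""
    return {name: weight_memory_gib(total_params, bpw) for name, bpw in FORMATS.items()}


if __name__ == "__main__":
    for name, gib in estimate_all().items():
        print(f"{name}: ~{gib:.0f} GiB for weights")
```

Because the model is a sparse MoE, only a fraction of the weights is active per token, but all of them still have to be resident (or at least quickly reachable) for inference, which is why the total parameter count drives the estimate.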

Similar Articles

StepFun 3.7 Flash

Reddit r/LocalLLaMA

StepFun released Step 3.7 Flash, a high-efficiency multimodal model optimized for real-world agentic tasks, featuring improved coding benchmarks (SWE-Bench Pro, Terminal-Bench) and compatibility with multiple agent harnesses.

stepfun-ai/Step-3.7-Flash

Hugging Face Models Trending

Step 3.7 Flash is a 198B-parameter sparse MoE vision-language model with 11B active parameters per token, supporting 256k context and three reasoning levels, designed for high-throughput agentic workflows.

stepfun-ai/Step-3.7-Flash-GGUF

Hugging Face Models Trending

StepFun releases GGUF quantizations of their 198B-parameter sparse MoE vision-language model Step-3.7-Flash, enabling local deployment with up to 256K context and selectable reasoning levels.