StepFun 3.7 Flash - Speed Benchmark on M5 Max

Reddit r/LocalLLaMA Models

Summary

Benchmark results for the StepFun 3.7 Flash model running on an M5 Max via llama.cpp, showing prompt processing and token generation speeds at context lengths from 0 to 64k tokens.

Just ran a benchmark with llama.cpp's day-0 branch. M5 Max, 128 GB, Q4_K_S. Memory peaked at around ~120+ GB, which made things sluggish but still usable once cmd+tab landed. Short context (< 16k) feels fast and very responsive. At 32k-64k the speed is not bad, still usable.

|PP|TG|B|N_KV|T_PP s|S_PP t/s|T_TG s|S_TG t/s|T s|S t/s|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|0|128|1|128|0.000|nan|2.038|62.80|2.038|62.80|
|2048|128|1|2176|1.938|1056.65|2.115|60.52|4.053|536.88|
|8192|128|1|8320|9.153|895.01|2.233|57.32|11.386|730.71|
|16384|128|1|16512|22.428|730.52|2.475|51.71|24.903|663.05|
|32768|128|1|32896|64.539|507.73|2.818|45.43|67.356|488.39|
|65536|128|1|65664|178.227|367.71|3.774|33.92|182.001|360.79|

Now the Pelican bench: a very nice one, but with quite a long hand lol

https://preview.redd.it/322rt8n4304h1.png?width=780&format=png&auto=webp&s=e34efc12f6d96a22d27038a642c3c198b7b292e2
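
The throughput columns in the table above can be re-derived from the token counts and timings. The sketch below assumes the usual reading of these headers: PP is prompt tokens, TG is generated tokens, and each S column is the token count divided by the matching T column. The post does not spell that out. The helper names `rate` and `summarize` are made up for illustration. Recomputed values can differ from the table in the last digits because the timings are rounded to three decimals.

```python
import math

# Rows copied from the table above: (PP, TG, T_PP s, T_TG s, T s)
ROWS = [
    (0, 128, 0.000, 2.038, 2.038),
    (2048, 128, 1.938, 2.115, 4.053),
    (8192, 128, 9.153, 2.233, 11.386),
    (16384, 128, 22.428, 2.475, 24.903),
    (32768, 128, 64.539, 2.818, 67.356),
    (65536, 128, 178.227, 3.774, 182.001),
]


def rate(tokens, seconds):
    """Tokens per second; NaN when no time elapsed (the PP = 0 row)."""
    return tokens / seconds if seconds > 0 else math.nan


def summarize(rows):
    """Recompute S_PP, S_TG and overall S for each row (assumed formulas)."""
    out = []
    for pp, tg, t_pp, t_tg, t_total in rows:
        out.append({
            "pp": pp,
            "s_pp": rate(pp, t_pp),             # prompt processing t/s
            "s_tg": rate(tg, t_tg),             # token generation t/s
            "s_total": rate(pp + tg, t_total),  # end-to-end t/s
        })
    return out


if __name__ == "__main__":
    results = summarize(ROWS)
    base_tg = results[0]["s_tg"]  # generation speed with an empty context
    for r in results:
        print(f"PP={r['pp']:>6}  S_PP={r['s_pp']:8.2f}  "
              f"S_TG={r['s_tg']:6.2f}  TG vs empty context={r['s_tg'] / base_tg:.2f}")
```

Read this way, generation at a 64k prompt runs at roughly 54% of the empty-context speed (33.92 vs 62.80 t/s), and prompt processing drops from about 1057 t/s at 2k to about 368 t/s at 64k.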
