The 5.6 Sol model is coming to Cerebras hardware in July, offering inference at 750 tokens per second.
StepFun releases Step-3.7-Flash, a new large vision-language MoE model with 198B parameters (11B active), a 256K-token context window, and inference speeds of up to 400 tokens per second.