NVIDIA released Nemotron Ultra, a hybrid MoE model with 55B active / 550B total parameters and a 1M context window. It supports MTP (multi-token prediction) speculative decoding and has day-0 support in Hugging Face transformers.
MiniMax released M3, an open-weights model that combines frontier coding, 1M context, and native multimodality, with performance comparable to Opus at a fraction of the cost.
StepFun released Step-3.7-Flash, a new large vision-language MoE model with 198B total parameters (11B active), 256K context, and inference speeds of up to 400 tokens/sec.
OpenAI released GPT Realtime-2 and two accompanying models during Build Hour, improving the intelligence and naturalness of voice interaction. GPT Realtime-2 supports 128K context, parallel tool calls, and dynamic voice cloning, and was demonstrated in production-grade applications such as voice-driven shopping assistants and analytics dashboards.