GLM-5.2 is a new open-source coding model that has caught up to closed-source SOTA models, potentially disrupting revenues of OpenAI and Anthropic.
DeepReinforce open-sources Ornith-1.0, a family of self-improving coding models ranging from 9B to 397B parameters. The models are built on Gemma 4 and Qwen 3.5 foundations and use a novel RL approach in which the model learns to generate its own scaffolds.
A personal benchmark finds that Gemma-4E4B is the best choice for routing, that Qwen-3.6 27/30B beats Gemma-4 for coding, and that MiniMax M2.7 MXFP4 can replace the giant Qwen-3.5 quants in an OpenCode llama-swap workflow.
Google has formed a dedicated strike team to improve its coding models, ramping up its agentic AI efforts under competitive pressure from Anthropic. The move signals an intensifying race among major labs over AI coding capabilities.
OpenAI announces it will no longer report SWE-bench Verified scores, citing two critical issues. First, 59.4% of failed problems have flawed test cases that reject correct solutions. Second, frontier models have seen benchmark problems during training, so score improvements reflect training-data exposure rather than genuine capability gains.