Tag
Provides analysis and comparison of Claude Sonnet 5's performance across benchmarks.
Analyzes the gap between open weights and closed source LLMs using the Artificial Analysis Intelligence Index and other benchmarks, finding that the gap is shrinking on some metrics but stable on others.
Z.ai's GLM-5.2 has become the new leading open weights model on the Artificial Analysis Intelligence Index, scoring 51 and outperforming competitors like MiniMax-M3 and DeepSeek V4 Pro. The model has 744B total parameters (40B active), an MIT license, and a 1M context window.
GLM-5.2 (max) is currently ranked as the third best AI model overall according to Artificial Analysis' Intelligence Index, with detailed analysis of intelligence, openness, cost, and token usage.
Claude Fable 5 achieved a score of 65 on the Artificial Analysis Intelligence Index.
Qwen3.7 Max ranks 5th on Artificial Analysis benchmarks, matching GPT-5.4 and outperforming Gemini 3.5 Flash, while Qwen3.6 27B trails significantly.
Cerebras announces it is running Kimi K2.6, a trillion-parameter model, at approximately 1,000 tokens per second in enterprise trials, claiming the fastest frontier model performance ever measured by Artificial Analysis.
Artificial Analysis introduces the Coding Agent Index, a new benchmark suite combining SWE-Bench-Pro-Hard-AA, Terminal-Bench v2, and SWE-Atlas-QnA to evaluate the performance of AI coding agents across diverse tasks.
Moonshot AI's Kimi K2.6 has debuted at fourth place on the Artificial Analysis Intelligence Index, marking a strong benchmark showing for the latest version of the model.