Tag
GLM-5.2 matches Claude Opus on 45 coding-agent tasks at lower cost, with 43 of 45 tasks having identical outcomes.
The author praises GLM-5.2, an MIT open-weights model, for its exceptional real-world performance in human evaluation benchmarks, claiming it rivals the best closed-source models like those from Claude.
GLM 5.2 is a new open-weights model from Z.ai, compared against Claude Opus in a 3D game coding task. Opus performed faster and cleaner, but GLM 5.2 offers compelling cost and accessibility advantages.
Announcement of Qwable-v1, an open-weights model distilled from Claude Fable-5, along with performance benchmarks on 2dgx sparks hardware achieving 25 tok/sec (single session) and 152 tok/sec (8 sessions).
Released RRT-355M, a softmax-free attention model at GPT-2 Medium scale with 354M parameters trained from scratch on 11.5B tokens, using structural sparsity and tile-skipping kernels for long-context efficiency, achieving comparable performance to GPT-2 Medium on a 22-task benchmark.
Open-weights models have caught up with proprietary ones, with GLM 5.2 achieving near Opus-level scores in browser agent tasks at low cost. Other models like Minimax M3 and Kimi k2.7 also show notable improvements.
This paper introduces Tiered Language Models (TLMs), which allow a single set of open-weight model parameters to support multiple capability levels controlled by secret keys. The method enables selective exposure of private capabilities while preserving public model behavior and resisting extraction.
Mistral AI is adding dedicated Code and Apps sections to its Vibe (Le Chat) web platform, turning it from a conversational interface into a development and app-building environment. A new large, sparse mixture-of-experts model is also confirmed for summer release as open weights.
Cohere releases North Mini Code, a 30B-A3B open-weights model with 4-bit quantization for code generation and agentic coding tasks, supporting 256K context.
GLM 5.2 is an open-weights LLM that is sufficiently capable to allow businesses to manage their IT needs locally on affordable hardware, potentially transforming small/medium enterprise data management.
We're the first to run the full GLM-5.2 (753B FP8) on RTX 4090s by porting sparse-attention kernels to Ada GPUs, enabling frontier open-weights model on commodity hardware.
Google DeepMind released the Gemma 4 series of open-weight models, covering four sizes from 2B to 31B, supporting 128K–256K context, reasoning, and function calling, under Apache 2.0 license, and equipped with ADK framework for autonomous agent capabilities.
Chinese AI lab Z.ai released GLM-5.2, a 753B parameter open weights LLM with a 1M token context window under MIT license, achieving top scores on the Artificial Analysis Intelligence Index and ranking second on the Code Arena WebDev leaderboard.
Apodex 1.0 is a self-evolving AI system post-trained on Qwen3.5, achieving SOTA on BrowseComp, DeepSearchQA, and HLE-text. Its 4B mini model outperforms 30B-class models, with an AgentOS runtime for task orchestration. Open weights available.
Z ai's GLM-5.2 has become the new leading open weights model on the Artificial Analysis Intelligence Index, scoring 51 and outperforming competitors like MiniMax-M3 and DeepSeek V4 Pro. The model features 744B total parameters, 40B active, MIT license, and 1M context window.
Moonshot AI 发布了专注于编程的开放式权重模型 Kimi K2.7 Code,拥有1万亿参数和384个专家,性能在MCP工具调用上超越Opus 4.8,成本仅为十分之一。
Z ai's GLM-5.2 open weights model scores 51 on the Artificial Analysis Intelligence Index, matching GPT-5.4 xhigh and sitting on the Pareto frontier of intelligence vs cost per task.
Step 3.7 Flash, an open-weights model with a 256k context window, is available free in Cline for a month, claiming to outperform Gemini and DeepSeek flash models and approach frontier performance on SWE Bench.
Z.ai releases GLM-5.2, an open-weights AI model with improved coding and agentic performance, demonstrated by beating Kimi K2.7 Code on a physics simulation benchmark across three tasks.
GLM-5.2 has been released with open weights under MIT license, featuring a 1M context window and two reasoning effort modes. Early benchmarks show it performing strongly in coding tasks, making it worth testing beyond benchmark screenshots.