DeepSeek released the full V4 paper detailing FP4 quantization-aware training, MoE training stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF, achieving dramatic efficiency gains—V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache at 1M context length.
DeepSeek, a Chinese AI lab spun out of a quant hedge fund, is reportedly matching GPT-4-level performance at roughly 5% of the training cost, causing significant market disruption including a $600B drop in NVIDIA's market cap. A free 1-hour-50-minute course has been released teaching users how to leverage DeepSeek V4 locally and via API.
The Hermes Agent model has reached the top global ranking across all AI applications on OpenRouter, powered by contributions from nearly 1,000 developers. The creator thanks the community and invites suggestions for future improvements.
Ring 2.6 1T, a 1-trillion-parameter open-weights model, has been listed on OpenRouter for free use, with a full public release expected.
Zyphra releases ZAYA1-74B-Preview, a 74-billion parameter base model trained on AMD hardware, highlighting strong pre-RL reasoning capabilities and agentic performance signals.
Unsloth releases a GGUF quantized version of the Qwen3.6-27B model, featuring improved agentic coding capabilities, tool calling, and support for Unsloth Studio.
Qwen releases the open-weight Qwen3.6-27B model on Hugging Face, featuring improved stability, agentic coding capabilities, and thinking preservation for better developer productivity.
Alibaba releases Qwen3.6-Max-Preview, an updated version of its Qwen3.6 model series with improved performance and capabilities.
A user tested MiniMax M2.7 (a 230B-parameter model) using Unsloth's UD-IQ3_XXS quantization (80GB) across four hardware configurations (RTX 4090, RTX 5090, RTX PRO 6000, and a DGX system), reporting token-generation speeds and time-to-first-token metrics for each.
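For context on the metrics reported above, here is a minimal sketch of how tokens-per-second and time-to-first-token (TTFT) are commonly derived from wall-clock timestamps. The function name and the sample numbers are illustrative assumptions, not the tester's actual harness or results.

```python
def throughput_metrics(t_start, t_first_token, t_end, n_tokens):
    """Return (ttft_seconds, decode_tokens_per_second).

    t_start:       when the request was issued
    t_first_token: when the first output token arrived
    t_end:         when generation finished
    n_tokens:      total output tokens generated
    """
    # TTFT covers prompt processing (prefill) up to the first output token.
    ttft = t_first_token - t_start
    # Decode speed counts the remaining tokens over the decode window only,
    # so prefill time does not inflate or deflate the tokens/sec figure.
    tps = (n_tokens - 1) / (t_end - t_first_token)
    return ttft, tps

# Illustrative run: 512 tokens, first token at 0.8 s, finished at 20.8 s.
ttft, tps = throughput_metrics(0.0, 0.8, 20.8, 512)
# ttft = 0.8 s, tps = 25.55 tokens/s
```

Separating prefill (TTFT) from decode throughput matters when comparing GPUs, since the two phases stress compute and memory bandwidth differently.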
Community discusses the identity of 'Elephant Alpha', a 100B-parameter model ranked #1 on OpenRouter with a 256K context window, fast inference, and strong coding capabilities but weak Chinese-language support, speculating about which company might be behind it.
MiniMaxAI releases MiniMax-M2.7, an open-weight model featuring self-evolution capabilities, advanced agent team support, and strong performance on software engineering benchmarks (56.22% on SWE-Pro, 66.6% medal rate on MLE Bench Lite), with notable applications in production incident recovery and professional work tasks.
Google DeepMind introduces Gemma 4, its most capable family of open models to date, designed for advanced reasoning and agentic workflows with high intelligence-per-parameter efficiency across multiple sizes.
Mistral AI has released Mistral Medium 3.5, a dense 128B multimodal model featuring a 256K context window, configurable reasoning capabilities, and improved performance in instruction following, reasoning, and coding tasks.
Google DeepMind releases Gemma 4, a family of open-weight multimodal models ranging from 2.3B to 31B parameters with support for text, image, video, and audio inputs. The models feature 256K context windows, MoE and dense architectures, enhanced reasoning capabilities, and are optimized for deployment across devices from mobile to servers.
OpenAI is releasing GPT-5.4 and GPT-5.4 Pro across ChatGPT, the API, and Codex, featuring native computer-use capabilities, 1M token context, improved reasoning and coding, and state-of-the-art performance on professional knowledge work benchmarks. It is described as OpenAI's most capable and token-efficient reasoning model to date.
Google has released Gemini 3 Flash, a fast, cost-effective AI model that combines Pro-grade reasoning with Flash-level speed for tasks like coding, complex analysis, and agentic workflows.
OpenAI introduces GPT-5.2, the most capable model series yet, with significant improvements in knowledge work, code generation, image perception, long-context understanding, and tool-calling. The GPT-5.2 Thinking variant achieves state-of-the-art performance on professional benchmarks, outperforming human experts on 70.9% of GDPval tasks across 44 occupations.
OpenAI releases GPT-5.1, a new model in the GPT-5 series that dynamically adapts thinking time based on task complexity, offering 2-3x faster performance than GPT-5 while maintaining frontier intelligence. The release includes extended prompt caching (24-hour retention), new coding tools (apply_patch and shell), and a 'no reasoning' mode for latency-sensitive applications.
OpenAI releases GPT-5.1 Instant and GPT-5.1 Thinking, upgraded versions of the GPT-5 series with improved conversational abilities, better instruction following, adaptive reasoning, and enhanced tone controls. The models are rolling out to ChatGPT users starting with paid subscribers, with API availability coming later this week.
OpenAI announces GPT-5, their most advanced model yet, unifying capabilities from GPT-4o, o-series reasoning, agents, and advanced math, with immediate rollout to Team users and API access for developers. The release marks a major milestone with 700 million weekly ChatGPT users and 5 million paid business users already leveraging OpenAI's technology.