Elon Musk announced that the Grok foundation model V9-Medium (1.5T parameters) has finished training with strong evaluations, and will be publicly released in 2-3 weeks after fine-tuning and reinforcement learning.
Introduces an open-source, uncensored variant of Qwen 3.6 (35B/43B) that removes the official moral and safety restrictions. It requires only 6GB of VRAM to run locally and has passed one million downloads.
The Marin team pre-registered a predicted loss of 2.252 for a 129B-parameter MoE training run. The actual result landed at 2.234, a gap of 0.018 (about 0.8%), demonstrating that the loss could be predicted accurately before training.
Presents Macaron-A2UI, a model for generative UI in personal agents that synthesizes dynamic interfaces with lightweight executable actions, moving beyond text-only chat. The paper introduces a large-scale corpus, the A2UI-Bench benchmark, and trains models up to 754B parameters using LoRA fine-tuning and reinforcement learning, achieving strong results.
DeepSeek has made the 75% discount on V4 Pro API pricing permanent, reducing input/output token costs significantly.
NetEase Youdao open-sourced the 27B-parameter ZiYue 4 model, which achieves SOTA in math and science. Its voice feature supports 3-second cross-language voice cloning across 14 languages without accent problems. The company also open-sourced 'Longxia' (Lobster), an all-scenario intelligent agent.
This paper proposes VBFDD-Agent, a vehicle battery fault detection and diagnosis agent that uses descriptive text modeling of battery signals, large language models, and historical cases to generate interpretable diagnostic results and maintenance recommendations for electric vehicle batteries.
Cohere launches Command A+, its first Mixture-of-Experts model, released under Apache 2.0 with efficient quantization for 1-2 GPU deployment, prioritizing practicality and open access for developers.
Cohere has released Command A+, an open-source model with 25B active parameters and 218B total parameters under Apache 2.0, optimized for agentic, multilingual, and reasoning-heavy tasks.
Cohere releases Command A+, an open-source model with 25B active parameters (218B total) optimized for agentic, multilingual, and reasoning-heavy tasks, supporting vision inputs and 128K context under Apache 2.0.
Meituan has launched its Longma large model, offering 55 million free tokens daily to users who register.
The new Gemini Flash model is expensive to use, suggesting it may be a large but fast model.
A Google DeepMind employee has confirmed the existence of Gemini 3.5, the next iteration of Google's AI model.
A new version of Qwen, Qwen 3.7, has been spotted on the official Qwen website, suggesting an upcoming release.
Presents a SpeechLLM architecture for streaming speech-to-text translation that adaptively decides when to output tokens based on audio, achieving 1-2 second latency with quality close to non-streaming baselines.
This paper shows that continuously consolidating past experiences into textual memory using LLMs degrades memory utility over time, and that preserving raw episodic trajectories outperforms forced consolidation, with implications for robust agentic memory systems.
Xiaomi open-sourced MiMo-V2.5-Pro, a 1.02 trillion parameter MoE model, prompting a cost-benefit analysis of using its API versus self-hosting for autonomous coding tasks.
Xiaomi has open-sourced MiMo V2.5 Pro, a 1.02T-parameter MoE model designed for autonomous coding tasks. The article details a real-world test in which high cache hit rates kept API costs low while efficiency stayed high.
AntAngelMed is a newly open-sourced 100B-parameter medical language model developed by Zhejiang Health Information Center, Ant Healthcare, and Anzhen'er Medical AI. It achieves top rankings on HealthBench and MedAIBench and uses an efficient MoE architecture for high-performance inference.
Zhejiang Health and Ant Healthcare released AntAngelMed, an open-source 100B parameter medical LLM that ranks top on MedBench and supports efficient local inference with high privacy.