Tag
The next MiniMax AI model is expected to be released in about 10 days, though the poster expects it may be too large to run on their own hardware.
Modal announces day-0 support for Step 3.7 Flash, a 198B-parameter MoE model with a 256K context window and native image/video understanding.
Liquid AI released LFM2.5-8B-A1B, an edge model with a 128K context window, pre-trained on 38T tokens and further trained with large-scale reinforcement learning. It handles tool calling and complex tasks while fitting on an entry-level laptop.
Introduces Aryabhata 2, a reasoning-focused language model for competitive STEM exams. Trained via reinforcement learning on PhysicsWallah's question banks, it outperforms its base model while using fewer tokens.
Anthropic released Claude Opus 4.8, an incremental update over Opus 4.7 with sharper judgment and the ability to work autonomously for longer, though some engineers remain skeptical of its code generation without extensive guidance.
Recommends 'Hands-On Large Models', a free, open-source tutorial of 12 chapters covering topics such as language model basics, prompt engineering, semantic search, model fine-tuning, and multimodal applications. All code runs directly in Colab.
ORCA is a copilot for end-to-end causal analysis that uses agents to guide users through workflows including causal discovery, effect estimation, and root cause analysis, and produces structured reports.
This paper proposes Polar, a multimodal memory-augmented framework for personalizing embodied MLLM agents over long-term user interactions, using a knowledge graph and episodic memory to ground user-intended instances from accumulated context.
This paper proposes an LLM-based framework for extracting segment disclosures from 10-K filings, using retrieval-augmented systems to improve completeness and comparability for longitudinal and cross-firm analysis.
Researchers used an IBM quantum computer to reduce uncertainty in an AI model, achieving the first demonstration of quantum enhancement in a pretrained large language model, allowing it to answer questions correctly where the base model failed.
Elon Musk announced that the Grok foundation model V9-Medium (1.5T parameters) has finished training with strong evaluations, and will be publicly released in 2-3 weeks after fine-tuning and reinforcement learning.
Introduces an open-source, uncensored version of Qwen 3.6 (35B/43B) with the official moral and safety restrictions removed. It reportedly needs only 6GB of VRAM to run locally and has over a million downloads.
The Marin team pre-registered a predicted loss of 2.252 for a 129B-parameter MoE training run, and the actual result landed at 2.234, demonstrating that loss can be predicted accurately before training.
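For scale, the gap between the pre-registered prediction and the observed loss works out as follows (plain arithmetic on the two reported figures, taking the observed loss as the reference):

```latex
\Delta = 2.252 - 2.234 = 0.018,
\qquad
\frac{\Delta}{2.234} \approx 0.0081 \approx 0.81\%
```

In other words, the prediction overshot the observed loss by less than 1%.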
Presents Macaron-A2UI, a model for generative UI in personal agents that synthesizes dynamic interfaces with lightweight executable actions, moving beyond text-only chat. The paper introduces a large-scale corpus, the A2UI-Bench benchmark, and trains models up to 754B parameters using LoRA fine-tuning and reinforcement learning, achieving strong results.
DeepSeek has made the 75% discount on V4 Pro API pricing permanent, so input and output tokens now cost a quarter of the original list price.
NetEase Youdao open-sourced ZiYue 4, a 27B-parameter model that achieves SOTA in math and science. Its voice feature supports 3-second cross-language voice cloning across 14 languages with no noticeable accent. Youdao also open-sourced 'Longxia' (Lobster), an all-scenario intelligent agent.
This paper proposes VBFDD-Agent, a vehicle battery fault detection and diagnosis agent that uses descriptive text modeling of battery signals, large language models, and historical cases to generate interpretable diagnostic results and maintenance recommendations for electric vehicle batteries.
Cohere released Command A+, its first Mixture-of-Experts model, with 25B active parameters (218B total). Released under Apache 2.0, it is optimized for agentic, multilingual, and reasoning-heavy tasks, supports vision inputs and a 128K context window, and offers efficient quantization for deployment on 1-2 GPUs, prioritizing practicality and open access for developers.
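Using the reported parameter counts for Command A+, the share of weights active for each token in this MoE model is roughly:

```latex
\frac{25\,\text{B active}}{218\,\text{B total}} \approx 0.115 \approx 11.5\%
```

This ratio is why an MoE model of this total size can have per-token compute closer to that of a much smaller dense model, though all 218B weights still have to be stored.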