@0xLogicrw: MiniMax published a technical blog post detailing the root-cause analysis of why its M2 series large models cannot output the personal name "Ma Jiaqi". Starting from a single case study, the investigation ultimately uncovered a systematic degradation issue affecting nearly 5% of the entire vocabulary. The root cause was a severe disconnect in data coverage between the model's two training stages. In the first stage (pre-training), massive amounts of internet text were used to cre…
Summary
MiniMax published a technical blog post offering an in-depth analysis of the systematic vocabulary degradation behind its M2 series large models' inability to output specific personal names. The post attributes the failure to parameter shifts caused by a disconnect in data coverage between the pre-training and post-training stages, and proposes an effective remediation based on full-scale synthetic data.
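The coverage disconnect described above can be illustrated with a toy diagnostic: tokens that exist in the pre-training vocabulary but never appear in the post-training corpus receive no gradient signal during post-training, so a simple first check is to measure what fraction of the vocabulary the post-training data never touches. Below is a minimal sketch with made-up data and whitespace tokenization; it is not MiniMax's actual pipeline or tokenizer, just an illustration of the idea:

```python
from collections import Counter

# Hypothetical vocabulary learned during pre-training (a stand-in for a
# real BPE vocabulary with ~100k entries).
pretrain_vocab = {"the", "model", "name", "ma", "jiaqi", "token", "data", "rare"}

# Hypothetical post-training (SFT/RL) corpus: note that it never mentions
# several tokens the pre-training vocabulary contains.
post_training_corpus = [
    "the model data",
    "the token data",
]

# Count how often each token appears in the post-training data.
counts = Counter(tok for line in post_training_corpus for tok in line.split())

# Vocabulary entries the post-training stage never touches at all.
uncovered = sorted(t for t in pretrain_vocab if counts[t] == 0)
coverage_gap = len(uncovered) / len(pretrain_vocab)

print(f"uncovered tokens: {uncovered}")
print(f"coverage gap: {coverage_gap:.0%}")
```

In this toy example half the vocabulary goes unseen; the blog post's finding is the real-world analogue, where roughly 5% of vocabulary entries fell into this uncovered set and drifted during post-training.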
Similar Articles
MiniMaxAI/MiniMax-M2.7
MiniMaxAI releases MiniMax-M2.7, an open-weight model featuring self-evolution capabilities, advanced agent team support, and strong performance on software engineering benchmarks (56.22% on SWE-Pro, 66.6% medal rate on MLE Bench Lite), with notable applications in production incident recovery and professional work tasks.
@QingQ77: Training a 0.1B end-to-end omnimodal model from scratch. A single set of weights handles text, speech, and image inputs, while outputting text and streaming speech. https://github.com/jingyaogong/minimind-o… MiniMind-O is an omnimodal model with only 0.1B parameters…
MiniMind-O has released an end-to-end omnimodal model with only 0.1B parameters, supporting text, speech, and image inputs as well as streaming speech output. The project open-sources the code, weights, training data, and technical report, emphasizing that both training and inference can be performed quickly on standard GPUs.
@stevibe: MiniMax M2.7 is 230B params. Can you actually run it at home? I tested Unsloth's UD-IQ3_XXS (80GB) on 4 different rigs:…
A user tested MiniMax M2.7 (230B parameter model) using Unsloth's UD-IQ3_XXS quantization (80GB) across four different hardware configurations including RTX 4090, RTX 5090, RTX PRO 6000, and DGX setups, reporting token generation speeds and time-to-first-token metrics.
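As a sanity check on the quoted file size, the weight footprint of a quantized model is roughly parameters × bits-per-weight ÷ 8. The sketch below assumes an effective ~2.8 bits/weight for the UD-IQ3_XXS quant; that figure is an estimate for illustration, not Unsloth's published number:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# 230B parameters at an assumed ~2.8 bits/weight lands near the 80 GB
# file size quoted for the UD-IQ3_XXS quant.
size = quantized_size_gb(230e9, 2.8)
print(f"{size:.1f} GB")  # prints "80.5 GB"
```

The same formula explains why such a model needs multi-GPU or unified-memory rigs: the weights alone approach 80 GB before accounting for the KV cache and activations.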
@yidabuilds: https://x.com/yidabuilds/status/2053409619641602286
The author conducted a comparative evaluation of four Chinese AI models: DeepSeek V4, Kimi K2.6, GLM-5.1, and MiniMax M2.7. The analysis covers their strengths and weaknesses regarding cost, long-context processing, coding stability, and reasoning performance, offering specific recommendations on how to route tasks involving large document analysis, long-running background jobs, and bulk content generation.
@sanbuphy: K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, using the niche Zig language to implement and optimize inference, demonstrating the new model’s generalization ability. After 4,000+ tool calls and 12+ hours of continuous operation, K2.6 iterated 14 times…
K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, using the niche Zig language to implement and optimize inference, demonstrating the new model’s generalization ability. After 4,000+ tool calls and 12+ hours of continuous operation, K2.6 iterated 14 times, boosting throughput from ~15 tokens/s to ~193 tokens/s, ultimately achieving 20% faster inference than LM Studio.