@0xLogicrw: MiniMax published a technical blog post with a root-cause analysis of why its M2 series large models could not output the personal name "Ma Jiaqi". Starting from this single case, the investigation uncovered a systematic degradation affecting nearly 5% of the entire vocabulary. The root cause was a severe mismatch in data coverage between the model's two training stages. In the first stage (pre-training), massive amounts of internet text were used to cre…
Summary
MiniMax published a technical blog post analyzing the systematic vocabulary degradation behind its M2 series large models' inability to output specific personal names. The post attributes the problem to parameter shifts caused by a mismatch in data coverage between the pre-training and post-training stages, and proposes remediation using full-scale synthetic data.
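The coverage mismatch described above can be illustrated with a small sketch. It is not MiniMax's method: the function names, the `min_count` threshold, and the toy token IDs are all assumptions made for illustration. It counts how often each token ID appears in a pre-training sample and in a post-training sample, then reports the tokens that the first stage covers but the second stage almost never sees.

```python
from collections import Counter
from typing import Iterable, List, Set


def count_tokens(sequences: Iterable[List[int]]) -> Counter:
    """Count occurrences of each token ID across tokenized sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq)
    return counts


def coverage_gap(
    pretrain_sequences: Iterable[List[int]],
    posttrain_sequences: Iterable[List[int]],
    min_count: int = 1,
) -> Set[int]:
    """Return token IDs seen at least `min_count` times in pre-training
    but fewer than `min_count` times in post-training.

    `min_count` is a hypothetical threshold, not a value from the source.
    """
    pre = count_tokens(pretrain_sequences)
    post = count_tokens(posttrain_sequences)
    return {tid for tid, n in pre.items() if n >= min_count and post[tid] < min_count}


def gap_fraction(gap: Set[int], vocab_size: int) -> float:
    """Share of the whole vocabulary that falls into the coverage gap."""
    return len(gap) / vocab_size


# Toy data: a 10-token vocabulary, fully covered in pre-training only.
pretrain = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
posttrain = [[0, 1, 2], [2, 3, 4, 4]]

gap = coverage_gap(pretrain, posttrain)
fraction = gap_fraction(gap, vocab_size=10)
```

On real data the same check would run over the tokenizer's full vocabulary, and a gap covering a few percent of token IDs would be a candidate explanation for the kind of degradation the post describes.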
Similar Articles
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
The MiniMax-M2 series introduces Mixture-of-Experts language models that achieve high performance on agentic tasks with minimal activated parameters (9.8B per token out of 229.9B total), leveraging agent-driven data pipelines, a scalable RL system called Forge, and a checkpoint that takes early steps toward self-evolution.
@jiayuan_jy: A few objective clarifications: 1) This post has nothing to do with MiniMax (I never take sponsored posts). 2) 'Subjective feel' is not the same as actual performance; it's not quantitative data. After more extensive experience, overall coding ability is a qualitative improvement compared to m2.7. A current shortcoming is that 1-shot results compared with...
Jiayuan Zhang shared his initial experience with the M3 model's coding ability, stating that it is a qualitative improvement over m2.7 but that its 1-shot results are less comprehensive than those of Opus 4.6/4.7 and GPT5.5.
MiniMaxAI/MiniMax-M2.7
MiniMaxAI releases MiniMax-M2.7, an open-weight model featuring self-evolution capabilities, advanced agent team support, and strong performance on software engineering benchmarks (56.22% on SWE-Pro, 66.6% medal rate on MLE Bench Lite), with notable applications in production incident recovery and professional work tasks.
@QingQ77: Training a 0.1B end-to-end omnimodal model from scratch. A single set of weights handles text, speech, and image inputs, while outputting text and streaming speech. https://github.com/jingyaogong/minimind-o… MiniMind-O is an omnimodal model with only 0.1B parameters…
MiniMind-O has released an end-to-end omnimodal model with only 0.1B parameters, supporting text, speech, and image inputs as well as streaming speech output. The project open-sources the code, weights, training data, and technical report, emphasizing that both training and inference can be performed quickly on standard GPUs.
Testing MiniMax M2.7 via API on three real ML and coding workflows
A developer tests the MiniMax M2.7 model via its API on three practical machine learning and coding workflows, evaluating its performance.