@nv_pavlichenko: Today we're releasing Mellum2: our first "serious" LLM. This is a 12B A2.5B MoE LLM pre-trained on ~11T tokens and post…


Summary

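Releases Mellum2, a Mixture-of-Experts (MoE) LLM with 12B total and 2.5B active parameters ("12B A2.5B"), pretrained on ~11T tokens and post-trained with RLVR (reinforcement learning with verifiable rewards). Base, SFT, and RL checkpoints are released alongside a technical report.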
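To make the "12B A2.5B" notation concrete, here is a back-of-the-envelope sketch rather than anything taken from the Mellum2 report. It assumes the nominal 12B/2.5B figures, the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass, and bf16 weights at 2 bytes per parameter. The function names are hypothetical helpers for illustration only.

```python
# Rough arithmetic for what "12B A2.5B" implies for an MoE model.
# Assumptions (not from the Mellum2 report): nominal 12B total / 2.5B active
# parameters, ~2 FLOPs per active parameter per token for a forward pass,
# and 2 bytes per parameter (bf16) for stored weights.

TOTAL_PARAMS = 12e9    # all experts plus shared layers ("12B")
ACTIVE_PARAMS = 2.5e9  # parameters used for each token ("A2.5B")
BYTES_PER_PARAM = 2    # bf16 weights (assumption)


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in a single token's forward pass."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs per token via the 2 * N rule of thumb."""
    return 2 * params


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
    dense_flops = forward_flops_per_token(TOTAL_PARAMS)
    print(f"active fraction: {frac:.1%}")
    print(f"~GFLOPs/token (MoE, active params only): {moe_flops / 1e9:.0f}")
    print(f"~GFLOPs/token (hypothetical dense 12B):  {dense_flops / 1e9:.0f}")
    print(f"bf16 weights, all experts resident: "
          f"{weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM):.0f} GB")
```

Under these assumptions about 20.8% of the weights are active per token, so per-token compute is closer to a ~2.5B dense model (~5 vs ~24 GFLOPs). The weight footprint, however, still scales with the full 12B (~24 GB in bf16), because every expert must be available to the router.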

Today we're releasing Mellum2: our first "serious" LLM. This is a 12B A2.5B MoE LLM pre-trained on ~11T tokens and post-trained with RLVR. I'm proud to be leading the team that was working on it for the last 6 months. We release base/SFT/RL checkpoints along with a tech https://t.co/Zj2GusGmYP

Cached at: 06/01/26, 03:46 PM


Similar Articles

JetBrains's Mellum 2 (49 minute read)

TLDR AI

JetBrains releases Mellum 2, a 12B-parameter open-weight Mixture-of-Experts language model specialized in software engineering, with competitive performance in code generation, reasoning, and tool use, available under Apache 2.0.

Mellum2 Technical Report

Hugging Face Daily Papers

Mellum 2 is a 12B-parameter open-weight MoE language model by JetBrains with 2.5B active parameters, specialized in software engineering tasks and optimized for efficient inference on commodity GPUs.

Mellum 2 12B A2.5B

Reddit r/LocalLLaMA

JetBrains released Mellum 2 12B A2.5B, a coding-focused small MoE model with reasoning performance comparable to Qwen 3.5 9B but weaker in other tasks.