Mellum 2 12B A2.5B

Reddit r/LocalLLaMA Models

Summary

JetBrains released Mellum 2 12B A2.5B, a small coding-focused MoE model. JetBrains claims the reasoning variant's coding performance is around that of Qwen 3.5 9B, but the model is worse than Qwen 3.5 4B at everything else.

Coding-focused small MoE from JetBrains. They claim coding performance around Qwen 3.5 9B for the reasoning model. Worse than Qwen 3.5 4B in everything else.

Models: [https://huggingface.co/collections/JetBrains/mellum-2](https://huggingface.co/collections/JetBrains/mellum-2)

Technical report: [https://arxiv.org/abs/2605.31268](https://arxiv.org/abs/2605.31268)

Similar Articles

Mellum2 Technical Report

Hugging Face Daily Papers

Mellum 2 is a 12B-parameter open-weight MoE language model by JetBrains with 2.5B active parameters, specialized in software engineering tasks and optimized for efficient inference on commodity GPUs.

JetBrains's Mellum 2 (49 minute read)

TLDR AI

JetBrains releases Mellum 2, a 12B-parameter open-weight Mixture-of-Experts language model specialized in software engineering, with competitive performance in code generation, reasoning, and tool use, available under Apache 2.0.

JetBrains/Mellum2-12B-A2.5B-Thinking

Hugging Face Models Trending

JetBrains releases Mellum2-12B-A2.5B-Thinking, an open-source Mixture-of-Experts reasoning model with 131k context length, trained with RLVR for explicit chain-of-thought reasoning.