JetBrains/Mellum2-12B-A2.5B-Thinking


Summary

JetBrains releases Mellum2-12B-A2.5B-Thinking, an open-source Mixture-of-Experts reasoning model with 131k context length, trained with RLVR for explicit chain-of-thought reasoning.

Task: text-generation Tags: transformers, safetensors, mellum, text-generation, conversational, en, arxiv:2605.31268, license:apache-2.0, model-index, eval-results, endpoints_compatible, region:us

Cached at: 2026/06/02 15:40

JetBrains/Mellum2-12B-A2.5B-Thinking · Hugging Face

Source: https://huggingface.co/JetBrains/Mellum2-12B-A2.5B-Thinking

Use this model when you want explicit chain-of-thought before the final answer: complex debugging, multi-step planning, agentic workflows, and math- or reasoning-heavy tasks. For direct, low-latency answers without reasoning traces, use Instruct instead.

Mellum2 Thinking Highlights

Mellum 2 Thinking is a post-trained reasoning-augmented assistant model trained by JetBrains.

The model uses a Mixture-of-Experts architecture with 64 experts and activates 8 experts per token. It uses a combination of sliding-window and full attention layers, with a context length of 131,072 tokens.
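To make "64 experts, 8 active per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. It assumes Mixtral-style gating (softmax over only the selected top-k router logits); the card does not say how Mellum2 normalizes its gate weights, and `route` and `moe_forward` are hypothetical helper names, not the model's code.

```python
import math
from typing import Callable, List, Sequence, Tuple

NUM_EXPERTS = 64  # from the model card
TOP_K = 8         # activated experts per token, from the model card


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits: Sequence[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights.

    Assumption: softmax over the selected top-k logits (Mixtral-style).
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: Sequence[float], router_logits: Sequence[float],
                experts: Sequence[Callable[[Sequence[float]], List[float]]],
                k: int = TOP_K) -> List[float]:
    """Run only the selected experts and sum their outputs, weighted by the gate."""
    out = [0.0] * len(x)
    for idx, weight in route(router_logits, k):
        y = experts[idx](x)  # the other NUM_EXPERTS - k experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

Only 8 of the 64 expert networks run for each token, which is why the per-token compute tracks the active parameter count rather than the total.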

It is produced from Mellum2-12B-A2.5B-Base by supervised fine-tuning (with the loss computed only on the final assistant turn), followed by reinforcement learning with verifiable rewards (RLVR) on a harder data mix that includes a long-form math subset. The model emits its reasoning inside <think>...</think> blocks before the final answer.
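When the raw completion is consumed directly (for example, without a server-side reasoning parser), the reasoning has to be split off by hand. A minimal sketch, assuming the output contains at most one `</think>` tag; `split_reasoning` is a hypothetical helper. It also tolerates a chat template that pre-fills the opening `<think>` so that only `</think>` appears in the generated text, which is an assumption about the template rather than something the card states.

```python
from typing import Tuple

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split raw model output into (reasoning, final_answer).

    If no closing tag is present, the whole text is treated as the answer.
    """
    end = text.find(CLOSE_TAG)
    if end == -1:
        return "", text.strip()
    start = text.find(OPEN_TAG)
    # If the opening tag is missing (pre-filled by the template), the
    # reasoning starts at the beginning of the generated text.
    body_start = start + len(OPEN_TAG) if 0 <= start < end else 0
    reasoning = text[body_start:end].strip()
    answer = text[end + len(CLOSE_TAG):].strip()
    return reasoning, answer
```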

Mellum2 Model Family

This repository contains one checkpoint from the Mellum 2 family.

| Checkpoint | Description |
|---|---|
| Base Pretrain | Base checkpoint before long-context extension |
| Base | Final base model |
| Instruct SFT | Supervised instruction-tuned checkpoint |
| Thinking SFT | Supervised thinking checkpoint |
| Instruct | RL-tuned instruction model |
| Thinking | RL-tuned thinking model |

Model Overview

Mellum2 Thinking has the following features:

  • Number of Layers: 28
  • Hidden Size: 2304
  • Intermediate Size: 7168
  • MoE Intermediate Size: 896
  • Number of Experts: 64
  • Number of Activated Experts: 8
  • Number of Attention Heads (GQA): 32 for Q and 4 for KV
  • Context Length: 131,072
  • Sliding Window: 1,024
  • Vocabulary Size: 98,304
  • Precision: bfloat16
  • License: Apache 2.0
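The attention entries in the list above can be made concrete with a small sketch. It assumes contiguous GQA grouping (query heads 0 to 7 share KV head 0, and so on), as in common Hugging Face implementations, and a window covering the current token plus the previous 1,023; the card states neither convention, nor which of the 28 layers use the sliding window. `kv_head_for` and `can_attend` are hypothetical helpers.

```python
from typing import Optional

NUM_Q_HEADS = 32       # from the model card
NUM_KV_HEADS = 4       # from the model card
SLIDING_WINDOW = 1024  # from the model card


def kv_head_for(q_head: int, n_q: int = NUM_Q_HEADS, n_kv: int = NUM_KV_HEADS) -> int:
    """Grouped-query attention: each block of n_q // n_kv query heads shares one KV head."""
    group_size = n_q // n_kv  # 8 query heads per KV head with these settings
    return q_head // group_size


def can_attend(query_pos: int, key_pos: int, window: Optional[int] = None) -> bool:
    """Causal mask; with a window, a query sees only the last `window` positions (itself included)."""
    if key_pos > query_pos:  # never attend to future tokens
        return False
    if window is None:       # full-attention layer
        return True
    return query_pos - key_pos < window  # sliding-window layer
```

Because a sliding-window layer never looks further back than `SLIDING_WINDOW` positions, its KV cache can be capped at 1,024 entries, whereas full-attention layers must keep keys and values for the whole context (up to 131,072 tokens).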

Serving with vLLM

# Without tool calling
vllm serve JetBrains/Mellum2-12B-A2.5B-Thinking \
  --max-model-len 131072 \
  --reasoning-parser qwen3

# With tool calling
vllm serve JetBrains/Mellum2-12B-A2.5B-Thinking \
  --max-model-len 131072 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
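With the second command, the server accepts OpenAI-style `tools` and, through the `hermes` parser, returns structured `tool_calls`. A minimal sketch, assuming the server listens on vLLM's default `http://localhost:8000/v1` and was started without `--api-key` (so any placeholder key is accepted). The `read_file` tool and the `parse_tool_calls` helper are hypothetical. The network call sits under a `__main__` guard so the helper can be imported without the `openai` package installed.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical tool definition, for illustration only.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a file in the workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def parse_tool_calls(message: Any) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (name, arguments) pairs from an OpenAI-style chat message.

    Tool-call arguments arrive as a JSON string, so they are decoded here.
    """
    calls = getattr(message, "tool_calls", None) or []
    return [(c.function.name, json.loads(c.function.arguments)) for c in calls]


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    response = client.chat.completions.create(
        model="JetBrains/Mellum2-12B-A2.5B-Thinking",
        messages=[{"role": "user", "content": "Show me what is in setup.py."}],
        tools=[READ_FILE_TOOL],
    )
    for name, args in parse_tool_calls(response.choices[0].message):
        print(name, args)
```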

Quickstart

Text-Only Input

from openai import OpenAI
# Reads OPENAI_API_KEY and OPENAI_BASE_URL (e.g. http://localhost:8000/v1 for the vLLM server above) from the environment
client = OpenAI()

messages = [
    {"role": "user", "content": "Is 1024 a power of 2? Explain your reasoning."},
]

chat_response = client.chat.completions.create(
    model="JetBrains/Mellum2-12B-A2.5B-Thinking",
    messages=messages,
    max_tokens=81920,
    temperature=0.6,
    top_p=0.95,
    extra_body={
        "top_k": 20,
    },
)
print("Chat response:", chat_response)
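With `--reasoning-parser` enabled, vLLM returns the reasoning trace in a separate field of the message instead of inside `content`. The sketch below pulls the two apart. The field name (`reasoning_content` in many vLLM releases, `reasoning` in some newer ones) is an assumption, so both are checked; `reasoning_and_answer` is a hypothetical helper.

```python
from typing import Any, Tuple


def reasoning_and_answer(chat_response: Any) -> Tuple[str, str]:
    """Separate the reasoning trace from the final answer in a chat completion.

    Assumes the server ran with --reasoning-parser, which moves the <think>
    content out of `content`; the reasoning field name varies by vLLM version.
    """
    message = chat_response.choices[0].message
    reasoning = (getattr(message, "reasoning_content", None)
                 or getattr(message, "reasoning", None))
    return reasoning or "", message.content or ""
```

Applied to the quickstart above: `reasoning, answer = reasoning_and_answer(chat_response)`.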

Evaluation

Post-training evaluation for the thinking/reasoning variants. All values are percentages; higher is better except for HarmBench, where lower is better. All values are self-reported by JetBrains.

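| Category | Benchmark | Mellum2 Thinking SFT | Mellum2 Thinking | Qwen3.5 (4B) | Qwen3.5 (9B) | OLMo-3 (7B) | Ministral 3 (14B) |
|---|---|---|---|---|---|---|---|
| Coding | LiveCodeBench v6 | 75.1 | 69.9 | 59.4 | 68.3 | 59.8 | 42.7 |
| Tool Use | BFCL v4 | 38.8 | 45.6 | 42.9 | 42.7 | — | 35.9 |
| | BFCL v3 | 60.5 | 69.4 | 73.9 | 68.5 | — | 52.2 |
| Math | AIME | 20.0 | 58.4 | 68.3 | 73.4 | 61.7 | 38.3 |
| | GSM-Plus | 62.6 | 87.0 | 89.3 | 90.7 | 88.1 | 86.5 |
| Knowledge | MMLU-Redux | 84.8 | 86.2 | 88.3 | 91.7 | 71.3 | 84.4 |
| | GPQA Diamond | 39.9 | 57.6 | 76.8 | 81.3 | 29.3 | 46.0 |
| Conversational | IFEval | 69.1 | 76.5 | 87.1 | 89.8 | 84.7 | 59.7 |
| | JetBrains pairwise | 64.4 | 69.5 | 40.5 | 56.7 | 32.2 | 63.8 |
| | MixEval | 63.4 | 66.9 | 71.9 | 76.0 | 67.0 | 70.8 |
| | BS-Bench | 14.0 | 15.0 | 63.0 | 70.0 | 23.0 | 9.0 |
| Safety | HarmBench (↓) | 12.2 | 20.6 | 15.9 | 6.6 | 48.7 | 70.0 |
| | XSTest | 90.8 | 89.6 | 96.8 | 97.6 | 93.2 | 96.8 |

Notes: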

  • AIME is the mean of AIME 2025 and AIME 2026 (30 questions each).
  • BFCL v4 is the macro-average of five subtasks: v1, v2, v3, web search, and memory.
  • JetBrains pairwise is the win rate against Qwen2.5-7B-Instruct on an internal benchmark.
  • — indicates that the model lacks native tool calling (OLMo-3-7B-Thinking).
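The averaging described in these notes can be sketched as follows. The counts are hypothetical, chosen only to illustrate the arithmetic, and are not results from the report; `macro_average` and `pooled_accuracy` are illustrative helpers. Because both AIME sets have 30 questions, averaging the two percentages gives the same number as pooled accuracy over all 60 questions.

```python
from typing import Dict, Tuple


def macro_average(subtask_scores: Dict[str, float]) -> float:
    """Unweighted mean of per-subtask scores: every subtask counts equally."""
    return sum(subtask_scores.values()) / len(subtask_scores)


def pooled_accuracy(correct_and_total: Dict[str, Tuple[int, int]]) -> float:
    """Accuracy over all questions combined, as a percentage."""
    correct = sum(c for c, _ in correct_and_total.values())
    total = sum(t for _, t in correct_and_total.values())
    return 100.0 * correct / total


# Hypothetical AIME-style counts (NOT real results): 30 questions per year.
hypothetical = {"aime_2025": (18, 30), "aime_2026": (17, 30)}
per_year = {name: 100.0 * c / t for name, (c, t) in hypothetical.items()}
```

When subtasks differ in size, the macro-average still weights each one equally, so it can differ from pooled accuracy.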

For more details, see the Mellum2 Technical Report.

License

Released under the Apache 2.0 license.

Similar Articles

Mellum 2 by JetBrains (49-minute read)

TLDR AI

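JetBrains releases Mellum 2, a 12B-parameter open-weight Mixture-of-Experts language model focused on software engineering. It delivers competitive performance in code generation, reasoning, and tool use, and is released under the Apache 2.0 license.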

Mellum 2 12B A2.5B

Reddit r/LocalLLaMA

JetBrains has released Mellum 2 12B A2.5B, a small coding-focused MoE model whose reasoning performance is comparable to Qwen 3.5 9B but which is weaker on other tasks.

Mellum2 Technical Report

Hugging Face Daily Papers

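Mellum 2 is a 12B-parameter open-weight MoE language model developed by JetBrains, with 2.5B active parameters. It focuses on software engineering tasks and is optimized for efficient inference on commodity GPUs.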