JetBrains/Mellum2-12B-A2.5B-Thinking

Hugging Face Models Trending 2026/05/26 09:12 Model

mixture-of-experts reasoning open-source chain-of-thought jetbrains huggingface llm

Summary

JetBrains releases Mellum2-12B-A2.5B-Thinking, an open-source Mixture-of-Experts reasoning model with 131k context length, trained with RLVR for explicit chain-of-thought reasoning.

Task: text-generation Tags: transformers, safetensors, mellum, text-generation, conversational, en, arxiv:2605.31268, license:apache-2.0, model-index, eval-results, endpoints_compatible, region:us

Cached: 2026/06/02 15:40

Source: https://huggingface.co/JetBrains/Mellum2-12B-A2.5B-Thinking

Use this model when you want explicit chain-of-thought before the final answer: complex debugging, multi-step planning, agentic workflows, and math- or reasoning-heavy tasks. For direct, low-latency answers without reasoning traces, use Instruct instead.

Mellum2 Thinking Highlights

Mellum 2 Thinking is a post-trained, reasoning-augmented assistant model from JetBrains.

The model uses a Mixture-of-Experts architecture with 64 experts and activates 8 experts per token. It uses a combination of sliding-window and full attention layers, with a context length of 131,072 tokens.
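To make the 8-of-64 routing concrete, here is a minimal sketch of generic top-k expert selection for a single token, in plain Python. The softmax-then-top-k gating and the renormalisation of the kept weights are assumptions borrowed from common MoE designs; the model card does not describe Mellum2's actual router, and `route_token` is a hypothetical name.

```python
import math
import random

NUM_EXPERTS = 64   # from the model card
TOP_K = 8          # experts activated per token, from the model card


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_ids, weights).

    Assumption: softmax over all experts, keep the k largest, renormalise
    so the kept weights sum to 1. Mellum2's real router may differ.
    """
    # Numerically stable softmax over all expert logits.
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k most probable experts, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    expert_ids = order[:top_k]

    # Renormalise the selected probabilities.
    kept = sum(probs[i] for i in expert_ids)
    weights = [probs[i] / kept for i in expert_ids]
    return expert_ids, weights


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
ids, weights = route_token(logits)
print("selected experts:", ids)
print("weights sum to:", round(sum(weights), 6))
```

Only the selected experts' feed-forward blocks would run for that token, which is why the active parameter count is much smaller than the total.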

It is produced from Mellum2-12B-A2.5B-Base by supervised fine-tuning (loss computed only on the final assistant turn) followed by reinforcement learning with verifiable rewards (RLVR) on a harder data mix that includes a long-form math subset. The model emits its reasoning inside <think>...</think> blocks before the final answer.
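When the raw text is consumed without a server-side reasoning parser, the reasoning has to be separated from the answer by hand. The sketch below assumes a single reasoning block that ends with `</think>` and precedes the answer; it also tolerates a missing opening tag, since some chat templates place `<think>` in the prompt rather than in the generated text. `split_reasoning` is a hypothetical helper, not part of any library.

```python
def split_reasoning(text):
    """Return (reasoning, answer) from raw model output.

    Assumption: at most one reasoning block, terminated by </think>.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat everything as the answer.
        return "", text.strip()
    # Drop the opening tag if the model emitted it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>1024 = 2**10, so yes.</think>\nYes, 1024 is a power of 2."
reasoning, answer = split_reasoning(raw)
print("Reasoning:", reasoning)
print("Answer:", answer)
```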

Mellum2 Model Family

This repository contains one checkpoint from the Mellum 2 family.

| Checkpoint | Description |
| --- | --- |
| Base Pretrain | Base checkpoint before long-context extension |
| Base | Final base model |
| Instruct SFT | Supervised instruction-tuned checkpoint |
| Thinking SFT | Supervised thinking checkpoint |
| Instruct | RL-tuned instruction model |
| Thinking | RL-tuned thinking model |

Model Overview

Mellum2 Thinking has the following features:

Number of Layers: 28
Hidden Size: 2304
Intermediate Size: 7168
MoE Intermediate Size: 896
Number of Experts: 64
Number of Activated Experts: 8
Number of Attention Heads (GQA): 32 for Q and 4 for KV
Context Length: 131,072
Sliding Window: 1,024
Vocabulary Size: 98,304
Precision: bfloat16
License: Apache 2.0
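A few ratios follow directly from the list above. The sketch below only does arithmetic on the published values; the class name `Mellum2ThinkingSpec` and its method names are hypothetical, and no figure outside the list (such as head dimension or the split between sliding-window and full-attention layers) is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mellum2ThinkingSpec:
    # Values copied from the model card's overview list.
    num_layers: int = 28
    hidden_size: int = 2304
    moe_intermediate_size: int = 896
    num_experts: int = 64
    num_active_experts: int = 8
    num_q_heads: int = 32
    num_kv_heads: int = 4
    context_length: int = 131_072
    sliding_window: int = 1_024

    def q_heads_per_kv_head(self):
        # GQA: how many query heads share one key/value head.
        return self.num_q_heads // self.num_kv_heads

    def active_expert_fraction(self):
        # Share of experts that run for any single token.
        return self.num_active_experts / self.num_experts

    def context_to_window_ratio(self):
        # How many sliding windows fit into the full context.
        return self.context_length // self.sliding_window


spec = Mellum2ThinkingSpec()
print(spec.q_heads_per_kv_head())      # 8
print(spec.active_expert_fraction())   # 0.125
print(spec.context_to_window_ratio())  # 128
```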

Serving with vLLM

# Without tool calling
vllm serve JetBrains/Mellum2-12B-A2.5B-Thinking \
  --max-model-len 131072 \
  --reasoning-parser qwen3

# With tool calling
vllm serve JetBrains/Mellum2-12B-A2.5B-Thinking \
  --max-model-len 131072 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
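Once the server runs with tool calling enabled, a client can pass tool definitions in the OpenAI chat-completions format. The sketch below only builds the request arguments so it runs offline; the `run_tests` tool, its parameters, and the helper name `build_tool_request` are hypothetical and chosen purely for illustration.

```python
import json


def build_tool_request(user_prompt):
    """Build keyword arguments for client.chat.completions.create()."""
    # Hypothetical tool, used only for illustration.
    run_tests_tool = {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the project's test suite and return the failures.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Test file or directory to run.",
                    },
                },
                "required": ["path"],
            },
        },
    }
    return {
        "model": "JetBrains/Mellum2-12B-A2.5B-Thinking",
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [run_tests_tool],
        # Let the model decide whether to call the tool.
        "tool_choice": "auto",
    }


request = build_tool_request("Why does tests/test_parser.py fail?")
print(json.dumps(request, indent=2))
# With a running server: OpenAI().chat.completions.create(**request)
```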

Quickstart

Text-Only Input

from openai import OpenAI
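# Reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment;
# point OPENAI_BASE_URL at the vLLM server, e.g. http://localhost:8000/v1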
client = OpenAI()

messages = [
    {"role": "user", "content": "Is 1024 a power of 2? Explain your reasoning."},
]

chat_response = client.chat.completions.create(
    model="JetBrains/Mellum2-12B-A2.5B-Thinking",
    messages=messages,
    max_tokens=81920,
    temperature=0.6,
    top_p=0.95,
    extra_body={
        "top_k": 20,
    },
)
print("Chat response:", chat_response)
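Printing the whole response object is noisy. With a reasoning parser enabled on the server, the reasoning is usually returned in a separate message field from the final answer. The sketch below assumes that field is called `reasoning_content` or `reasoning` (the name depends on the vLLM version, so both are tried); `reasoning_and_answer` is a hypothetical helper, and a stand-in object replaces the real response so the example runs without a server.

```python
from types import SimpleNamespace


def reasoning_and_answer(chat_response):
    """Return (reasoning, answer) from the first choice of a chat completion."""
    message = chat_response.choices[0].message
    # Assumption: the server attaches the parsed reasoning as an extra field.
    # Both spellings are tried because the name depends on the vLLM version.
    reasoning = (
        getattr(message, "reasoning_content", None)
        or getattr(message, "reasoning", None)
        or ""
    )
    return reasoning, message.content or ""


# Stand-in object with the same shape as a real response (offline demo).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                reasoning_content="1024 = 2**10.",
                content="Yes, 1024 is a power of 2.",
            )
        )
    ]
)
reasoning, answer = reasoning_and_answer(fake_response)
print("Reasoning:", reasoning)
print("Answer:", answer)
```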

Evaluation

Post-training evaluation for the thinking/reasoning variants. All values are percentages; higher is better except HarmBench, where lower is better. All values self-reported by JetBrains.

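| Category | Benchmark | Mellum2 Thinking SFT | Mellum2 Thinking | Qwen3.5 (4B) | Qwen3.5 (9B) | OLMo-3 (7B) | Ministral 3 (14B) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Coding | LiveCodeBench v6 | 75.1 | 69.9 | 59.4 | 68.3 | 59.8 | 42.7 |
| Tool Use | BFCL v4 | 38.8 | 45.6 | 42.9 | 42.7 | — | 35.9 |
| Tool Use | BFCL v3 | 60.5 | 69.4 | 73.9 | 68.5 | — | 52.2 |
| Math | AIME | 20.0 | 58.4 | 68.3 | 73.4 | 61.7 | 38.3 |
| Math | GSM-Plus | 62.6 | 87.0 | 89.3 | 90.7 | 88.1 | 86.5 |
| Knowledge | MMLU-Redux | 84.8 | 86.2 | 88.3 | 91.7 | 71.3 | 84.4 |
| Knowledge | GPQA Diamond | 39.9 | 57.6 | 76.8 | 81.3 | 29.3 | 46.0 |
| Conversational | IFEval | 69.1 | 76.5 | 87.1 | 89.8 | 84.7 | 59.7 |
| Conversational | JetBrains pairwise | 64.4 | 69.5 | 40.5 | 56.7 | 32.2 | 63.8 |
| Conversational | MixEval | 63.4 | 66.9 | 71.9 | 76.0 | 67.0 | 70.8 |
| Conversational | BS-Bench | 14.0 | 15.0 | 63.0 | 70.0 | 23.0 | 9.0 |
| Safety | HarmBench (↓) | 12.2 | 20.6 | 15.9 | 6.6 | 48.7 | 70.0 |
| Safety | XSTest | 90.8 | 89.6 | 96.8 | 97.6 | 93.2 | 96.8 |

Notes: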

AIME is the mean of AIME 2025 and AIME 2026 (30 questions each).
BFCL v4 is the macro-average of five subtasks: v1, v2, v3, web search, memory.
JetBrains pairwise is the win rate against Qwen2.5-7B-Instruct on an internal benchmark.
— indicates that the model lacks native tool calling (OLMo-3-7B-Thinking).
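The two aggregates described in the notes are plain unweighted means. The sketch below shows the arithmetic with placeholder inputs: the per-subtask and per-year numbers are invented for illustration and are NOT results from the model card, which reports only the aggregated values. `macro_average` is a hypothetical helper name.

```python
from statistics import mean


def macro_average(scores):
    """Unweighted mean of per-subtask scores, rounded to one decimal."""
    return round(mean(scores.values()), 1)


# Illustrative placeholder numbers, NOT results from the model card.
bfcl_v4_subtasks = {
    "v1": 80.0,
    "v2": 60.0,
    "v3": 50.0,
    "web_search": 20.0,
    "memory": 10.0,
}
aime_by_year = {"AIME 2025": 60.0, "AIME 2026": 50.0}

print("BFCL v4 (macro-average):", macro_average(bfcl_v4_subtasks))
print("AIME (mean of two years):", macro_average(aime_by_year))
```

Because both AIME sets have 30 questions, the mean of the two yearly percentages equals the accuracy over all 60 questions pooled together.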

For more details, see the Mellum2 Technical Report.

License

Released under the Apache 2.0 license.
