@mishig25: Open source is so back http://hf.co/mistralai/Mistral-Medium-3.5-128B…


Summary

Mistral AI releases Mistral Medium 3.5, an open-source 128B dense model with 256k context, multimodal input, configurable reasoning, and agentic capabilities.

Open source is so back 🔥 https://t.co/UiWsAxajcc https://t.co/KiQcx9MthB

Cached at: 06/16/26, 03:37 PM


mistralai/Mistral-Medium-3.5-128B · Hugging Face

Source: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B

Mistral Medium 3.5 is our first flagship merged model. It is a dense 128B model with a 256k context window, handling instruction-following, reasoning, and coding in a single set of weights. Mistral Medium 3.5 replaces its predecessor Mistral Medium 3.1 and Magistral in Le Chat. It also replaces Devstral 2 in our coding agent Vibe. Concretely, expect better performance on instruct, reasoning, and coding tasks from one unified model compared with our previously released models.

Reasoning effort is configurable per request, so the same model can answer a quick chat reply or work through a complex agentic run. We trained the vision encoder from scratch to handle variable image sizes and aspect ratios.

Find more information on our blog.

To speed up local inference using vLLM or SGLang, check out our released EAGLE model.

The Transformers config originally had an incorrect entry that caused long-context performance degradation. This has been fixed in this commit. GGUFs generated using the Transformers config prior to this commit are also affected. Please use the correct config for best performance.

Key Features

Mistral Medium 3.5 includes the following architectural choices:

  • Dense 128B parameters.
  • 256k context length.
  • Multimodal input: Accepts both text and image input, with text output.
  • Instruct and Reasoning functionalities with function calls (reasoning effort configurable per request).

Mistral Medium 3.5 offers the following capabilities:

  • Reasoning Mode: Toggle between fast instant reply mode and reasoning mode, boosting performance with test-time compute when requested.
  • Vision: Analyzes images and provides insights based on visual content, in addition to text.
  • Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
  • System Prompt: Strong adherence and support for system prompts.
  • Agentic: Best-in-class agentic capabilities with native function calling and JSON output.
  • Large Context Window: Supports a 256k context window.

We release this model under a **Modified MIT License**: an open-source license for both commercial and non-commercial use, with exceptions for companies with large revenue.

Recommended Settings

  • Reasoning Effort:
      - 'none' → Do not use reasoning.
      - 'high' → Use reasoning (recommended for complex prompts and agentic usage).
    Use reasoning_effort="high" for complex tasks and agentic coding.
  • Temperature: 0.7 for reasoning_effort="high". Use a temperature between 0.0 and 0.7 for reasoning_effort="none", depending on the task. Generally, lower values give answers that are more to the point, and higher values allow the model to be more creative. It is good practice to try different values to tune the model to your needs.
  • Top p: 0.95 for reasoning_effort="high". You can try different values, but staying close to it should give the best performance. Leave it at None (or 1.0) for reasoning_effort="none".

Benchmarks

Agentic Benchmarks

Mistral Medium 3.5 supersedes all our previous coding models, namely Devstral, across all benchmarks. It scores **91.4%** on τ³-Telecom and **77.6%** on SWE-Bench Verified. Due to its stronger agentic capabilities, Mistral Medium 3.5 replaces Devstral 2 in our coding agent, Vibe CLI.

[Figures: Mistral agentic benchmark; Mistral agentic benchmark on SWE-bench; Mistral agentic vs. competing models benchmark]

Instruction Following, Reasoning, and Coding Benchmarks

We compared Mistral Medium 3.5 with competing models on instruction following, reasoning (math), and coding benchmarks. Thanks to its unified capabilities, it achieves strong results across all of these tasks, and it now powers Le Chat.

[Figure: instruct, reasoning, and agentic benchmark]

Usage

You can find Mistral Medium 3.5 support on multiple libraries for inference and fine-tuning.

We thank all the contributors and maintainers who helped us make this happen.

Mistral-Vibe

Use Mistral Medium 3.5 with Mistral Vibe.

Install

Install the latest version:

uv pip install mistral-vibe --upgrade

API Usage

Mistral Medium 3.5 can be selected by starting vibe. If it is the first time you launch vibe, it will:

  • Create a default configuration file at ~/.vibe/config.toml.
  • Prompt you to enter your API key if it’s not already configured.
  • Save your API key to ~/.vibe/.env for future use.
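
To see whether that first-launch setup has already happened on a machine, a minimal sketch is to look for the two files named above. The helper name `vibe_setup_status` is hypothetical and is not part of Mistral Vibe; it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path


def vibe_setup_status(home: Path) -> dict:
    """Report which of the files created on first launch already exist."""
    vibe_dir = home / ".vibe"
    return {
        "config": (vibe_dir / "config.toml").is_file(),  # default configuration file
        "env": (vibe_dir / ".env").is_file(),            # saved API key
    }


print(vibe_setup_status(Path.home()))
```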

Now select mistral-medium-3.5 and start building!

Local server

If you want to use a local vLLM server instead of calling the Mistral API, do the following:

    1. Spin up a vLLM server as explained in Usage - vLLM.
    2. Add the model configuration in ~/.vibe/config.toml:
display_name = "Mistral Medium 3.5 (local vLLM)"
description = "Mistral Medium 3.5 mode using local vLLM"
safety = "neutral"

active_model = "mistral-medium-3.5" # Make sure this is the only active_model entry
[[providers]]
name = "vllm"
api_base = "http://<your-host-url>:8000/v1"
api_key_env_var = ""
backend = "generic"
api_style = "reasoning"

[[models]]
name = "mistralai/Mistral-Medium-3.5-128B"
provider = "vllm"
alias = "mistral-medium-3.5"
thinking = "high"
temperature = 0.7
auto_compact_threshold = 168000

[tools.bash]
default_timeout = 1200

Notes:

  • Make sure to overwrite <your-host-url> with your server’s URL.
  • Other inference backends are also supported. Please look at the Mistral Vibe repo for more info.

Then restart vibe and “tab-shift” to “mistral-medium-3.5” mode.

Give it a try on some agentic coding tasks and start building some cool stuff!

Inference

The model can be deployed with:

For optimal performance, we recommend using the Mistral AI API if local serving is subpar.

Make sure that frameworks relying on the Transformers configuration, including GGUF files, are up to date with the fixes introduced in this commit. Otherwise, you will experience subpar performance, especially in long-context sessions.

Fine-Tuning

Fine-tune the model via:

vLLM (Recommended)

We recommend using Mistral Medium 3.5 with the vLLM library for production-ready inference.

To speed up local inference using vLLM, check out our released EAGLE model.

Installation

Make sure to install vllm nightly:

uv pip install -U vllm \
   --torch-backend=auto \
   --extra-index-url https://wheels.vllm.ai/nightly

Doing so should automatically install mistral_common >= 1.11.1 and transformers >= 5.4.0.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"
python -c "import transformers; print(transformers.__version__)"
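
The two commands above only print the installed versions. As a sketch, the comparison against the stated minimums can be automated with the standard library. The helpers `as_tuple` and `meets_minimum` are hypothetical, and the parsing is deliberately simple: it ignores suffixes such as `dev0` or `rc1`, so a pre-release counts as equal to its base version.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions stated above.
MINIMUMS = {"mistral_common": "1.11.1", "transformers": "5.4.0"}


def as_tuple(v: str) -> tuple[int, ...]:
    """Turn '5.4.0' into (5, 4, 0), stopping at the first non-numeric piece."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


for package, minimum in MINIMUMS.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
    print(f"{package} {installed}: {status}")
```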

You can also use a ready-to-go docker image, also available on the docker hub.

Serve the Model

We recommend a server/client setup:

vllm serve mistralai/Mistral-Medium-3.5-128B --tensor-parallel-size 8 \
  --tool-call-parser mistral --enable-auto-tool-choice --reasoning-parser mistral --max_num_batched_tokens 16384 --max_num_seqs 128 \
  --gpu_memory_utilization 0.8
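
Loading the weights can take a while, so a client script may want to wait until the server answers before sending requests. The sketch below polls the OpenAI-compatible `/models` endpoint, the same one that `client.models.list()` queries in the examples in this section. The helper names, the retry count, and the delay are assumptions of this sketch.

```python
import json
import time
import urllib.request


def fetch_model_ids(base_url: str) -> list[str]:
    """Query the OpenAI-compatible /models endpoint and return the served model ids."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload["data"]]


def wait_for_server(base_url: str, attempts: int = 60, delay: float = 10.0,
                    fetch=fetch_model_ids) -> list[str]:
    """Poll until the server answers, then return the model ids it reports."""
    for _ in range(attempts):
        try:
            return fetch(base_url)
        except OSError:
            # Connection refused or timed out: the server is probably still starting.
            time.sleep(delay)
    raise TimeoutError(f"no answer from {base_url} after {attempts} attempts")
```

Calling `wait_for_server("http://localhost:8000/v1")` blocks until the server started above is reachable, or raises `TimeoutError` after the configured number of attempts.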

Ping the Server

Instruction Following

Mistral Medium 3.5 can follow your instructions to the letter.

from datetime import datetime, timedelta

from huggingface_hub import hf_hub_download
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

REASONING_EFFORT = "none" # Toggle reasoning with 'high'.

match REASONING_EFFORT:
    case "none":
        TEMP = 0.1
        TOP_P = None
    case "high":
        TEMP = 0.7
        TOP_P = 0.95
    case _:
        raise ValueError("Only REASONING_EFFORT in ['none', 'high'] are supported.")

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    reasoning_effort=REASONING_EFFORT,
    temperature=TEMP,
    top_p=TOP_P,
)

print("==============================================================")
print(f"Request with {REASONING_EFFORT=}, {TEMP=} and {TOP_P=}.")
print("==============================================================")
print("REASONING")
print("~~~~~~~~~")
print(response.choices[0].message.reasoning)
print("==============================================================")
print("CONTENT")
print("~~~~~~~")
print(response.choices[0].message.content)
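
To check programmatically whether a reply satisfies the prompt above, one possible sketch compares the first letters of the words against the alphabet. `is_alphabet_sentence` is a hypothetical helper; it treats hyphenated words and contractions as single words, which is an assumption about how the constraint should be read.

```python
import re
import string


def is_alphabet_sentence(text: str) -> bool:
    """True if the words start with 'a', 'b', ..., 'z' in order, one word per letter."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    initials = "".join(word[0].lower() for word in words)
    return initials == string.ascii_lowercase


print(is_alphabet_sentence("A big cat danced."))  # False: stops at 'd'
```

With the example above, `is_alphabet_sentence(response.choices[0].message.content)` would report whether the model followed the instruction.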

Tool Call

Let’s solve some equations with our simple Python calculator tool.

import json
from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

REASONING_EFFORT = "none" # Toggle reasoning with 'high'.

match REASONING_EFFORT:
    case "none":
        TEMP = 0.1
        TOP_P = None
    case "high":
        TEMP = 0.7
        TOP_P = 0.95
    case _:
        raise ValueError("Only REASONING_EFFORT in ['none', 'high'] are supported.")

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"

def my_calculator(expression: str) -> str:
    # Note: eval() executes arbitrary Python code; only use it with trusted input.
    return str(eval(expression))

tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto",
    reasoning_effort=REASONING_EFFORT,
    temperature=TEMP,
    top_p=TOP_P,
)

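tool_calls = response.choices[0].message.tool_calls

results = []
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = tool_call.function.arguments
    if function_name == "my_calculator":
        result = my_calculator(**json.loads(function_args))
    else:
        # Keep results aligned with tool_calls for tools not implemented here.
        result = f"Tool '{function_name}' is not implemented in this example."
    results.append(result)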

messages.append({"role": "assistant", "tool_calls": tool_calls})
for tool_call, result in zip(tool_calls, results):
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    )

response = client.chat.completions.create(
    model=model,
    messages=messages,
    reasoning_effort=REASONING_EFFORT,
    temperature=TEMP,
    top_p=TOP_P,
)

print("==============================================================")
print(f"Request with {REASONING_EFFORT=}, {TEMP=} and {TOP_P=}.")
print("==============================================================")
print("REASONING")
print("~~~~~~~~~")
print(response.choices[0].message.reasoning)
print("==============================================================")
print("CONTENT")
print("~~~~~~~")
print(response.choices[0].message.content)
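
The calculator in the example above passes the model's string to `eval()`, which is fine for a demo but would execute any Python expression. As a hedged alternative (not the tool used in the original example), the sketch below walks the parsed expression with the standard-library `ast` module and only allows numbers and basic arithmetic. The name `safe_calculator` is hypothetical, and exponentiation is left out on purpose to avoid very expensive computations.

```python
import ast
import operator

# Arithmetic operators the calculator is allowed to apply.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST):
    """Recursively evaluate a node, rejecting anything that is not plain arithmetic."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(expression: str) -> str:
    """Evaluate plain arithmetic without handing the string to eval()."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))


print(safe_calculator("2 + 3 * 4"))  # 14
```

It keeps the same signature as `my_calculator` (string in, string out), so it could be swapped into the tool-call loop without other changes.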

Vision Reasoning

Let’s see if Mistral Medium 3.5 knows when to pick a fight!

from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

REASONING_EFFORT = "high" # Remove reasoning with 'none'.

match REASONING_EFFORT:
    case "none":
        TEMP = 0.1
        TOP_P = None
    case "high":
        TEMP = 0.7
        TOP_P = 0.95
    case _:
        raise ValueError("Only REASONING_EFFORT in ['none', 'high'] are supported.")

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    reasoning_effort=REASONING_EFFORT,
    temperature=TEMP,
    top_p=TOP_P,
)

print("==============================================================")
print(f"Request with {REASONING_EFFORT=}, {TEMP=} and {TOP_P=}.")
print("==============================================================")
print("REASONING")
print("~~~~~~~~~")
print(response.choices[0].message.reasoning)
print("==============================================================")
print("CONTENT")
print("~~~~~~~")
print(response.choices[0].message.content)
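
The example above references the image by a public URL. For a local file, a common approach with OpenAI-compatible servers is to embed the bytes as a base64 data URL. Whether a given server accepts data URLs is an assumption here, so treat this as a sketch. The helper names `to_data_url` and `image_part` are hypothetical.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a message content part in the same shape as the image entry above."""
    return {"type": "image_url", "image_url": {"url": to_data_url(image_bytes, mime_type)}}


print(to_data_url(b"abc"))  # data:image/png;base64,YWJj
```

Reading a file with `Path("battle.png").read_bytes()` (a placeholder file name) and passing the result to `image_part` yields a dictionary that can replace the `image_url` entry in `messages`.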

SGLang

Serve Mistral Medium 3.5 with the SGLang library for production-ready inference.

To speed up local inference using SGLang, check out our released EAGLE model.

Installation

Day-zero support ships in dedicated docker tags:

docker pull lmsysorg/sglang:dev-mistral-medium-3.5         # H100 / H200 (Hopper, CUDA 12.9)
docker pull lmsysorg/sglang:dev-cu13-mistral-medium-3.5    # B200 / B300 (Blackwell, CUDA 13.0)
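
In a deployment script, picking between the two tags can be reduced to a lookup on the GPU name. The mapping below only restates the comments on the two `docker pull` lines above. The helper `sglang_image_for` is hypothetical, as is matching on substrings such as `H100` in the device name.

```python
SGLANG_IMAGES = {
    "hopper": "lmsysorg/sglang:dev-mistral-medium-3.5",          # H100 / H200, CUDA 12.9
    "blackwell": "lmsysorg/sglang:dev-cu13-mistral-medium-3.5",  # B200 / B300, CUDA 13.0
}
GPU_FAMILY = {"H100": "hopper", "H200": "hopper", "B200": "blackwell", "B300": "blackwell"}


def sglang_image_for(gpu_name: str) -> str:
    """Return the image listed above for a device name such as 'NVIDIA H100 80GB HBM3'."""
    upper = gpu_name.upper()
    for model, family in GPU_FAMILY.items():
        if model in upper:
            return SGLANG_IMAGES[family]
    raise ValueError(f"no dedicated image listed for {gpu_name!r}")


print(sglang_image_for("NVIDIA H100 80GB HBM3"))
```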

Or follow the SGLang installation guide. Requires transformers >= 5.4.0.

Serve the Model

python -m sglang.launch_server --model-path mistralai/Mistral-Medium-3.5-128B \
  --tp 8 --tool-call-parser mistral --reasoning-parser mistral

For the full deployment guide, benchmarks, and per-request examples (reasoning effort, tool calls, vision, streaming), see the SGLang cookbook entry for Mistral Medium 3.5.

Transformers

Installation

First install the Transformers framework to use Mistral Medium 3.5:

uv pip install transformers

Inference

Python Inference Snippet

```python
import torch
from transformers import AutoProcessor, Mistral3ForConditionalGeneration

REASONING_EFFORT = "high"  # Remove reasoning with 'none'.

match REASONING_EFFORT:
    case "none":
        TEMP = 0.1
        TOP_P = 1.0
    case "high":
        TEMP = 0.7
        TOP_P = 0.95
    case _:
        raise ValueError("Only REASONING_EFFORT in ['none', 'high'] are supported.")

model_id = "mistralai/Mistral-Medium-3.5-128B"

processor = AutoProcessor.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto"
)

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
    reasoning_effort=REASONING_EFFORT,
)
inputs = inputs.to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=TEMP,
    top_p=TOP_P,
)[0]

# Setting `skip_special_tokens=False` to visualize reasoning trace between [THINK] [/THINK] tags.
decoded_output = processor.decode(output[len(inputs["input_ids"][0]):], skip_special_tokens=False)
print(decoded_output)
```
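
Because the decoded text keeps the reasoning trace between the [THINK] and [/THINK] tags, a small post-processing step can separate it from the final answer. The helper `split_reasoning` below is a hypothetical sketch. It only handles a single reasoning block and leaves any other special tokens in the answer untouched.

```python
def split_reasoning(decoded: str) -> tuple[str, str]:
    """Split decoded text into (reasoning, answer) around the [THINK] ... [/THINK] tags."""
    start_tag, end_tag = "[THINK]", "[/THINK]"
    start = decoded.find(start_tag)
    end = decoded.find(end_tag)
    if start == -1 or end == -1 or end < start:
        # No complete reasoning block, for example with reasoning_effort="none".
        return "", decoded.strip()
    reasoning = decoded[start + len(start_tag):end].strip()
    answer = decoded[end + len(end_tag):].strip()
    return reasoning, answer


reasoning, answer = split_reasoning("[THINK]Weigh the options.[/THINK]Choose FIGHT.")
print(reasoning)  # Weigh the options.
print(answer)     # Choose FIGHT.
```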


## License

This model is licensed under a [Modified MIT License](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/blob/main/LICENSE).

*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*
