internlm/Intern-S2-Preview · Hugging Face


Summary

InternLM releases Intern-S2-Preview, a 35B scientific multimodal foundation model that achieves performance comparable to trillion-scale models on professional scientific tasks through task scaling and a full-chain training pipeline.


Cached at: 05/15/26, 11:00 AM


Source: https://huggingface.co/internlm/Intern-S2-Preview

👋 Join us on Discord and WeChat

Introduction

We introduce Intern-S2-Preview, an efficient 35B scientific multimodal foundation model. Beyond conventional parameter and data scaling, Intern-S2-Preview explores task scaling: increasing the difficulty, diversity, and coverage of scientific tasks to further unlock model capabilities.

By extending professional scientific tasks into a full-chain training pipeline from pre-training to reinforcement learning, Intern-S2-Preview achieves performance comparable to the trillion-scale Intern-S1-Pro on multiple core professional scientific tasks, while using only 35B parameters (obtained by continued pre-training from Qwen3.5). At the same time, it maintains strong general reasoning, multimodal understanding, and agent capabilities.

Features

  • **Scientific task scaling with full-chain training.** Intern-S2-Preview scales hundreds of professional scientific tasks from pre-training to RL, enabling strong performance across multiple specialized domains at only 35B parameters. It further strengthens spatial modeling for small-molecule structures and introduces real-valued prediction modules, making it the first open-source model with both material crystal structure generation capability and strong general capabilities.
  • **Enhanced agent capabilities for scientific workflows.** Intern-S2-Preview significantly improves agentic abilities over the previous generation, achieving strong results on multiple scientific agent benchmarks.
  • **Efficient RL reasoning with MTP and CoT compression.** During RL, Intern-S2-Preview adopts shared-weight MTP with KL loss to reduce the mismatch between training and inference behavior, substantially improving MTP accept rate and token generation speed. It also introduces CoT compression techniques to shorten responses while preserving strong reasoning capability, achieving improvements in both performance and efficiency.

Fig. 1: Reasoning efficiency on complex math benchmarks (accuracy vs. average response length). Intern-S2-Preview (red star) significantly outperforms the trillion-scale Intern-S1-Pro (red circle) and achieves higher accuracy with better token efficiency among medium-size models.

Performance

We evaluate Intern-S2-Preview on a range of benchmarks covering both general and scientific datasets, and compare it below with recent VLMs and LLMs.

[Figure: benchmark performance comparison]

Note: Underline marks the best performance among open-source models; bold marks the best performance among all models.

We use OpenCompass and VLMEvalKit to evaluate all models. For text reasoning benchmarks, Intern-S2-Preview is evaluated with a maximum inference length of 128K tokens, while for multimodal benchmarks, it is evaluated with a maximum inference length of 64K tokens.

Quick Start

Sampling Parameters

We recommend the following sampling hyperparameters for best results:

top_p = 0.95
top_k = 50
min_p = 0.0
temperature = 0.8
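
As a minimal sketch, the recommended values can be collected once and reused for every request. The helper name `sampling_kwargs` is hypothetical. The split below assumes an OpenAI-compatible server: `temperature` and `top_p` are standard Chat Completions arguments, while `top_k` and `min_p` are not part of the OpenAI schema and are assumed here to be accepted by the serving framework as extra request fields (check your framework's documentation before relying on this).

```python
# Recommended sampling values from the model card.
RECOMMENDED_SAMPLING = {
    "top_p": 0.95,
    "top_k": 50,
    "min_p": 0.0,
    "temperature": 0.8,
}

# Arguments that the OpenAI Chat Completions schema defines directly.
OPENAI_NATIVE = {"temperature", "top_p"}


def sampling_kwargs(overrides=None):
    """Return keyword arguments for client.chat.completions.create().

    Standard arguments stay at the top level; the rest go into extra_body,
    which the openai client forwards as additional JSON fields.
    Assumption: the server accepts top_k and min_p as extra fields.
    """
    params = {**RECOMMENDED_SAMPLING, **(overrides or {})}
    kwargs = {k: v for k, v in params.items() if k in OPENAI_NATIVE}
    extra = {k: v for k, v in params.items() if k not in OPENAI_NATIVE}
    if extra:
        kwargs["extra_body"] = extra
    return kwargs


if __name__ == "__main__":
    print(sampling_kwargs())
```

With an `openai` client, the result could then be passed as `client.chat.completions.create(model=..., messages=..., **sampling_kwargs())`.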

Serving

Intern-S2-Preview can be deployed using any of the following LLM inference frameworks:

  • LMDeploy
  • vLLM
  • SGLang

Detailed deployment examples for these frameworks are available in the Model Deployment Guide.

Advanced Usage

Tool Calling

Tool calling lets the model extend its capabilities by invoking external tools and APIs. The example below shows how to use it to fetch a weather forecast through an OpenAI-compatible API (served here by the LMDeploy API server). The two weather functions are stubs that return fixed values.

from openai import OpenAI
import json

def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, and the unit in a dict
    """
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }

def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Get temperature at a location and date.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        date: The date to get the temperature for, in the format "Year-Month-Day".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, the date and the unit in a dict
    """
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }

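def get_function_by_name(name):
    # Map the tool name returned by the model to the local Python function.
    if name == "get_current_temperature":
        return get_current_temperature
    if name == "get_temperature_date":
        return get_temperature_date
    raise ValueError(f"Unknown tool: {name}")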

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_current_temperature',
        'description': 'Get current temperature at a location.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
                },
                'unit': {
                    'type': 'string',
                    'enum': [
                        'celsius',
                        'fahrenheit'
                    ],
                    'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
                }
            },
            'required': [
                'location'
            ]
        }
    }
}, {
    'type': 'function',
    'function': {
        'name': 'get_temperature_date',
        'description': 'Get temperature at a location and date.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
                },
                'date': {
                    'type': 'string',
                    'description': 'The date to get the temperature for, in the format \'Year-Month-Day\'.'
                },
                'unit': {
                    'type': 'string',
                    'enum': [
                        'celsius',
                        'fahrenheit'
                    ],
                    'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
                }
            },
            'required': [
                'location',
                'date'
            ]
        }
    }
}]

messages = [
    {'role': 'user', 'content': 'Today is 2024-11-14, What\'s the temperature in San Francisco now? How about tomorrow?'}
]

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:23333/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    max_tokens=32768,
    temperature=0.8,
    top_p=0.95,
    extra_body=dict(spaces_between_special_tokens=False),
    tools=tools)
print(response.choices[0].message)
messages.append(response.choices[0].message)

# tool_calls is None when the model answers without calling a tool.
for tool_call in response.choices[0].message.tool_calls or []:
    tool_call_args = json.loads(tool_call.function.arguments)
    tool_call_result = get_function_by_name(tool_call.function.name)(**tool_call_args)
    tool_call_result = json.dumps(tool_call_result, ensure_ascii=False)
    messages.append({
        'role': 'tool',
        'name': tool_call.function.name,
        'content': tool_call_result,
        'tool_call_id': tool_call.id
    })

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.95,
    extra_body=dict(spaces_between_special_tokens=False),
    tools=tools)
print(response.choices[0].message)
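
The dispatch loop above can be factored into a small helper that reports failures back to the model instead of raising. This is a sketch, not part of the model card: `run_tool_calls` and `TOOL_REGISTRY` are hypothetical names, and it assumes each tool call exposes `id`, `function.name`, and `function.arguments` as in the OpenAI client's response objects. The `SimpleNamespace` object only stands in for a real tool call so the snippet runs without a server.

```python
import json
from types import SimpleNamespace


def get_current_temperature(location, unit="celsius"):
    # Stub tool, as in the example above: returns a fixed value.
    return {"temperature": 26.1, "location": location, "unit": unit}


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_current_temperature": get_current_temperature}


def run_tool_calls(tool_calls, registry):
    """Execute tool calls and return 'tool' role messages for the next turn."""
    messages = []
    for call in tool_calls or []:  # tool_calls is None when no tool is requested
        name = call.function.name
        func = registry.get(name)
        if func is None:
            result = {"error": f"unknown tool: {name}"}
        else:
            try:
                args = json.loads(call.function.arguments)
                result = func(**args)
            except (json.JSONDecodeError, TypeError) as exc:
                # Malformed JSON or arguments that do not match the signature.
                result = {"error": f"bad arguments for {name}: {exc}"}
        messages.append({
            "role": "tool",
            "name": name,
            "content": json.dumps(result, ensure_ascii=False),
            "tool_call_id": call.id,
        })
    return messages


if __name__ == "__main__":
    fake_call = SimpleNamespace(
        id="call_0",
        function=SimpleNamespace(
            name="get_current_temperature",
            arguments='{"location": "San Francisco, CA, USA"}',
        ),
    )
    print(run_tool_calls([fake_call], TOOL_REGISTRY))
```

Returning an error payload as the tool result gives the model a chance to correct its arguments on the next turn; whether that is preferable to failing fast depends on the application.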

Switching Between Thinking and Non-Thinking Modes

Intern-S2-Preview enables thinking mode by default, enhancing the model’s reasoning capabilities to generate higher-quality responses. This feature can be disabled by setting enable_thinking=False in tokenizer.apply_chat_template:

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # disable thinking mode (enabled by default)
)

When serving Intern-S2-Preview models, you can dynamically control the thinking mode by adjusting the enable_thinking parameter in your requests.

from openai import OpenAI
import json

messages = [
{
    'role': 'user',
    'content': 'who are you'
}, {
    'role': 'assistant',
    'content': 'I am an AI'
}, {
    'role': 'user',
    'content': 'AGI is?'
}]

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:23333/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.95,
    max_tokens=2048,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False}
    }
)
print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))

Note: We do not recommend disabling thinking mode for agentic tasks.
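
One way to make the toggle explicit in client code is a helper that builds the `extra_body` payload and keeps thinking on for agentic requests, following the note above. The function name `thinking_extra_body` and the `agentic` flag are hypothetical; only the `chat_template_kwargs` and `enable_thinking` keys come from the request example above.

```python
import warnings


def thinking_extra_body(enable_thinking=True, agentic=False):
    """Build the extra_body dict that controls thinking mode per request."""
    if agentic and not enable_thinking:
        # The model card advises against disabling thinking for agentic
        # tasks, so this sketch warns and keeps thinking enabled.
        warnings.warn("thinking mode kept on for agentic task")
        enable_thinking = True
    return {"chat_template_kwargs": {"enable_thinking": enable_thinking}}


if __name__ == "__main__":
    print(thinking_extra_body(enable_thinking=False))
    # prints {'chat_template_kwargs': {'enable_thinking': False}}
```

The returned dict would be passed as `extra_body=thinking_extra_body(...)` in `client.chat.completions.create`. Overriding the caller's choice is a design decision of this sketch; raising an error instead would be equally reasonable.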

Agent Integration

Intern-S2-Preview can be plugged into agent frameworks in two ways: connecting to a self-hosted deployment, or calling the official InternLM API. Below we cover both, with examples for agent frameworks (OpenClaw, Hermes, etc.) and for Claude Code.

1. Self-hosted Deployment (LMDeploy as an example)

First, serve the model with LMDeploy following the Model Deployment Guide. The example below assumes the server is running at http://0.0.0.0:23333.

Connecting Agent Frameworks

Most agent frameworks (OpenClaw, Hermes, etc.) accept an OpenAI-compatible endpoint. Point them at the LMDeploy server base URL http://0.0.0.0:23333/v1.

You can check the connection with the following command:

curl http://0.0.0.0:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "internlm/Intern-S2-Preview",
    "messages": [
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.8,
    "top_p": 0.95
  }'

Alternatively, configure your agent framework through environment variables:

export OPENAI_API_KEY=EMPTY
export OPENAI_BASE_URL=http://0.0.0.0:23333/v1
export OPENAI_MODEL=internlm/Intern-S2-Preview

Remember to launch LMDeploy with --tool-call-parser interns2-preview so tool calls are parsed correctly.
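
For scripts that read these variables themselves, a stdlib-only sketch of the same connection check might look as follows. `load_endpoint` and `build_chat_request` are hypothetical helper names, and the fallback values simply mirror the self-hosted example above. The request is built but not sent, so the snippet runs without a server.

```python
import json
import os
import urllib.request


def load_endpoint(env=None):
    """Read endpoint settings from the environment, falling back to the values above."""
    env = os.environ if env is None else env
    return {
        "api_key": env.get("OPENAI_API_KEY", "EMPTY"),
        "base_url": env.get("OPENAI_BASE_URL", "http://0.0.0.0:23333/v1").rstrip("/"),
        "model": env.get("OPENAI_MODEL", "internlm/Intern-S2-Preview"),
    }


def build_chat_request(cfg, prompt):
    """Build (but do not send) a POST to the chat completions endpoint."""
    payload = {
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.8,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        url=cfg["base_url"] + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + cfg["api_key"],
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request(load_endpoint(), "Hello")
    # Sending requires a running server:
    # with urllib.request.urlopen(request) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
    print(request.full_url)
```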

Connecting Claude Code

LMDeploy exposes an Anthropic-compatible /v1/messages endpoint that Claude Code can talk to directly. Add the following to ~/.claude/settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:23333",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "ANTHROPIC_MODEL": "internlm/Intern-S2-Preview",
    "ANTHROPIC_CUSTOM_MODEL_OPTION": "internlm/Intern-S2-Preview"
  }
}
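
If a settings file already exists, the `env` entries are better merged into it than written over it. A minimal sketch of such a merge follows; `merge_claude_env` is a hypothetical helper, the file layout (a top-level `env` object) is taken from the snippet above, and the `theme` key in the demo is an invented placeholder for whatever settings are already present.

```python
import json
import tempfile
from pathlib import Path

# Values from the settings snippet above.
INTERN_ENV = {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:23333",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "ANTHROPIC_MODEL": "internlm/Intern-S2-Preview",
    "ANTHROPIC_CUSTOM_MODEL_OPTION": "internlm/Intern-S2-Preview",
}


def merge_claude_env(settings_path, new_env):
    """Merge new_env into the 'env' object of a settings file, keeping other keys."""
    path = Path(settings_path).expanduser()
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings.setdefault("env", {}).update(new_env)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


if __name__ == "__main__":
    # Demonstrate on a temporary file rather than the real settings file.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "settings.json"
        demo.write_text('{"theme": "dark"}', encoding="utf-8")  # placeholder content
        print(merge_claude_env(demo, INTERN_ENV))
```

To apply it for real, the call would be `merge_claude_env("~/.claude/settings.json", INTERN_ENV)`; backing up the file first is advisable.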

For a full walkthrough (curl verification, model routing, troubleshooting), see LMDeploy × Claude Code.

2. Official Intern API

If you do not want to self-host, you can use the official Intern API. Register at internlm.intern-ai.org.cn and create an API token (sk-xxxxxxxx).

Connecting Agent Frameworks

The service is OpenAI-compatible, so any agent framework that accepts an OpenAI-compatible endpoint works. Set the base URL to https://chat.intern-ai.org.cn/api/v1 and the model name to intern-s2-preview in the CLI or config file.

You can check the connection with the following command:

curl https://chat.intern-ai.org.cn/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxx" \
  -d '{
    "model": "intern-s2-preview",
    "messages": [
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.8,
    "top_p": 0.95
  }'

Refer to the Intern API documentation for the current endpoint, available model names, rate limits, and advanced parameters.

Connecting Claude Code

Claude Code can route to the official Intern API by pointing ANTHROPIC_BASE_URL at the Intern Anthropic-compatible gateway:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://chat.staging.intern-ai.org.cn",
    "ANTHROPIC_AUTH_TOKEN": "your-api-token",
    "ANTHROPIC_MODEL": "intern-s2-preview",
    "ANTHROPIC_SMALL_FAST_MODEL": "intern-s2-preview"
  }
}

Then start Claude Code with the following command:

claude --model intern-s2-preview

For step-by-step setup, see Intern API × Claude Code Integration.
