unsloth/Qwen-AgentWorld-35B-A3B-GGUF

Hugging Face Models Trending 06/24/26, 11:45 PM Models

language-world-model agentic-simulation qwen gguf quantized huggingface open-source

Summary

Unsloth released a GGUF quantization of Qwen-AgentWorld-35B-A3B, a native language world model that simulates agentic environments across seven domains (MCP, Search, Terminal, SWE, Android, Web, OS) using long chain-of-thought reasoning and trained via CPT, SFT, and RL.

Task: text-generation Tags: transformers, gguf, qwen, unsloth, world-model, agent, environment-simulation, text-generation, dataset:Qwen/AgentWorldBench, arxiv:2606.24597, base_model:Qwen/Qwen-AgentWorld-35B-A3B, base_model:quantized:Qwen/Qwen-AgentWorld-35B-A3B, license:apache-2.0, endpoints_compatible, region:us, imatrix, conversational

Cached at: 06/28/26, 11:21 AM

Source: https://huggingface.co/unsloth/Qwen-AgentWorld-35B-A3B-GGUF

This repository contains the model weights and configuration files for Qwen-AgentWorld-35B-A3B, a native language world model trained for agentic environment simulation. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, etc.

Qwen-AgentWorld is the first language world model to cover seven agent interaction domains within a single model. It simulates agentic environments via long chain-of-thought reasoning, predicting the next environment state given an agent’s action and interaction history. It is trained through a three-stage pipeline: CPT injects environment knowledge, SFT activates next-state-prediction reasoning, and RL sharpens simulation fidelity. Qwen-AgentWorld is a native world model: environment modeling is the training objective from the CPT stage onward, not a post-hoc add-on.

Highlights

**Seven Unified Domains.** A single model covers MCP (tool calling), Search, Terminal, SWE (software engineering), Android, Web, and OS, spanning both text and GUI interaction environments.
**Native World Model.** Environment modeling from CPT onward, not post-hoc adaptation of a general-purpose LLM.
**Generalizable, Scalable & Controllable Simulator.** Zero-shot generalization to OOD environments (e.g., OpenClaw); controllable perturbations and fictional-world construction surpass real-environment training.
**Agent Foundation Model.** LWM RL warm-up on single-turn, non-agentic trajectories transfers to multi-turn, tool-calling agentic tasks across 7 benchmarks, including 3 entirely out-of-domain.

Model Overview

Type: Causal Language Model (Language World Model)
Base Model: Qwen3.5-35B-A3B-Base
Training Stage: Continual Pre-Training (CPT) → Supervised Fine-Tuning (SFT) → Reinforcement Learning (RL, GSPO)
Number of Parameters: 35B in total and 3B activated
Hidden Dimension: 2048
Token Embedding: 248320 (Padded)
Number of Layers: 40
Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
Gated DeltaNet:
  - Number of Linear Attention Heads: 32 for V and 16 for QK
  - Head Dimension: 128
Gated Attention:
  - Number of Attention Heads: 16 for Q and 2 for KV
  - Head Dimension: 256
  - Rotary Position Embedding Dimension: 64
Mixture of Experts:
  - Number of Experts: 256
  - Number of Activated Experts: 8 Routed + 1 Shared
  - Expert Intermediate Dimension: 512
Context Length: 262,144 tokens
Disclaimer: No outputs from external API services are included in the training pipeline.
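
The Hidden Layout entry above can be unrolled to check that it is consistent with the stated 40 layers. The snippet below is a minimal sketch; the label strings and constant names are illustrative and are not keys from the model's configuration file.

```python
# Unroll "10 x (3 x (Gated DeltaNet -> MoE) -> 1 x (Gated Attention -> MoE))"
# into a per-layer list. Labels are illustrative, not real config keys.
BLOCKS = 10
DELTANET_PER_BLOCK = 3
ATTENTION_PER_BLOCK = 1


def layer_layout():
    """Return the token-mixer type of every layer, in order."""
    layers = []
    for _ in range(BLOCKS):
        layers += ["gated_deltanet"] * DELTANET_PER_BLOCK
        layers += ["gated_attention"] * ATTENTION_PER_BLOCK
    return layers


layout = layer_layout()
print(len(layout))                      # 40, matching "Number of Layers"
print(layout.count("gated_deltanet"))   # 30
print(layout.count("gated_attention"))  # 10
print(layout[:4])

# Every layer is followed by an MoE block with 8 routed + 1 shared expert active.
ACTIVE_EXPERTS_PER_TOKEN = 8 + 1
print(ACTIVE_EXPERTS_PER_TOKEN)         # 9
```

Only one layer in four uses softmax-style attention with a key/value cache, which matters for memory use at long context lengths.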

Performance

AgentWorldBench (Open-Ended Evaluation)

Each score is the mean of the five rubric dimensions for that domain, normalized to a 0-100 scale.

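| Model | MCP | Search | Term. | SWE | Android | Web | OS | Overall |
|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | 70.10 | 37.26 | 53.69 | 66.29 | 60.00 | 51.80 | 68.58 | 58.25 |
| Claude Opus 4.8 | 54.93 | 35.14 | 59.18 | 64.10 | 61.50 | 54.66 | 66.62 | 56.59 |
| Claude Opus 4.6 | 69.90 | 29.30 | 57.51 | 64.55 | 61.74 | 51.42 | 70.20 | 57.80 |
| Gemini 3.1 Pro | 59.07 | 30.21 | 52.47 | 59.07 | 61.40 | 52.83 | 66.92 | 54.57 |
| Claude Sonnet 4.6 | 70.00 | 28.79 | 56.98 | 64.52 | 58.03 | 50.78 | 63.17 | 56.04 |
| DeepSeek-V4-Pro | 63.27 | 27.61 | 51.26 | 59.44 | 55.17 | 50.32 | 63.70 | 52.97 |
| GLM-5.1 | 67.60 | 22.46 | 47.32 | 52.07 | 59.10 | 51.50 | 59.13 | 51.31 |
| Kimi K2.6 | 65.23 | 27.48 | 52.54 | 58.77 | 58.93 | 50.20 | 60.80 | 53.42 |
| MiniMax-M2.7 | 55.82 | 27.30 | 41.62 | 37.44 | 52.40 | 50.52 | 57.73 | 46.12 |
| Qwen3.5-35B-A3B | 57.87 | 25.98 | 46.13 | 47.58 | 53.18 | 47.10 | 56.27 | 47.73 |
| Qwen3.5-397B-A17B | 68.31 | 30.81 | 55.30 | 64.44 | 54.90 | 48.55 | 60.85 | 54.74 |
| Qwen3.6-Plus | 55.28 | 21.94 | 50.58 | 59.08 | 57.65 | 50.78 | 60.33 | 50.81 |
| Qwen-AgentWorld-35B-A3B | 64.79 | 36.69 | 53.96 | 65.63 | 58.17 | 49.55 | 65.92 | 56.39 |
| Qwen-AgentWorld-397B-A17B | 68.24 | 37.82 | 57.73 | 68.49 | 60.20 | 50.98 | 67.89 | 58.71 |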
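
The card does not say how the Overall column is derived. As a sanity check, the sketch below compares it with the unweighted mean of the seven domain scores for three rows copied from the table above. For these rows the two agree to within rounding. This is an observation rather than a documented definition, and not every row in the table matches this closely.

```python
# Rows copied from the table above: (seven domain scores, reported Overall).
DOMAINS = ["MCP", "Search", "Term.", "SWE", "Android", "Web", "OS"]
ROWS = {
    "GPT-5.4": ([70.10, 37.26, 53.69, 66.29, 60.00, 51.80, 68.58], 58.25),
    "Claude Opus 4.6": ([69.90, 29.30, 57.51, 64.55, 61.74, 51.42, 70.20], 57.80),
    "Qwen-AgentWorld-35B-A3B": ([64.79, 36.69, 53.96, 65.63, 58.17, 49.55, 65.92], 56.39),
}


def domain_mean(scores):
    """Unweighted mean over the per-domain scores."""
    return sum(scores) / len(scores)


for name, (scores, reported) in ROWS.items():
    assert len(scores) == len(DOMAINS)
    print(f"{name}: mean={domain_mean(scores):.2f} reported={reported:.2f}")
```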

Quickstart

Deployment

Qwen-AgentWorld-35B-A3B can be served via APIs with popular inference frameworks. In the following, we show example commands to launch OpenAI-compatible API servers.

The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because Qwen-AgentWorld leverages extended context for multi-turn environment simulation, we advise maintaining a context length of at least 128K tokens.
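
To get a feel for how the context window affects memory, the key/value cache of the Gated Attention layers can be estimated from the Model Overview figures. The sketch below makes several assumptions that are not stated in the card: cache entries are stored in 16-bit precision, batch size is 1, and the Gated DeltaNet layers keep a fixed-size state that does not grow with context. It ignores weights, activations, and framework overhead, so treat the result as a rough lower bound on the context-dependent part only.

```python
# Figures from the Model Overview; everything else is an assumption (see above).
ATTENTION_LAYERS = 10   # one Gated Attention layer in each of the 10 blocks
KV_HEADS = 2
HEAD_DIM = 256
BYTES_PER_VALUE = 2     # assumed bf16/fp16 cache, no cache quantization


def kv_cache_bytes(context_tokens):
    """Approximate KV-cache size for one sequence of the given length."""
    # Factor 2 accounts for storing both keys and values.
    per_token = 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * ATTENTION_LAYERS
    return per_token * context_tokens


for ctx in (131_072, 262_144):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
# 131072 tokens -> 2.50 GiB
# 262144 tokens -> 5.00 GiB
```

Under these assumptions, halving the context window from 262,144 to 131,072 tokens frees about 2.5 GiB per concurrent sequence.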

SGLang

SGLang is a fast serving framework for large language models.

python -m sglang.launch_server \
    --model-path Qwen/Qwen-AgentWorld-35B-A3B \
    --port 8000 \
    --tp-size 4 \
    --context-length 262144 \
    --reasoning-parser qwen3

An OpenAI-compatible API will be available at http://localhost:8000/v1.

vLLM

vLLM is a high-throughput and memory-efficient inference engine for LLMs.

vllm serve Qwen/Qwen-AgentWorld-35B-A3B \
    --port 8000 \
    --tensor-parallel-size 4 \
    --max-model-len 262144 \
    --reasoning-parser qwen3 \
    --trust-remote-code

An OpenAI-compatible API will be available at http://localhost:8000/v1.
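
Before sending requests, it can help to confirm that the server has finished loading. The sketch below polls the model-listing route that OpenAI-compatible servers usually expose at `/v1/models`. The helper name `list_served_models` is hypothetical, and the default URL assumes the port used in the commands above.

```python
import json
import urllib.error
import urllib.request


def list_served_models(base_url="http://localhost:8000/v1", timeout=2.0):
    """Return the model ids reported by GET {base_url}/models, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body: treat as "not ready".
        return []
    return [entry.get("id") for entry in payload.get("data", [])]


print(list_served_models())
```

An empty list means the server is not reachable yet. Once it is up, the returned id is the value to pass as `model` in API requests.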

Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen-AgentWorld-35B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a language world model simulating a Linux terminal environment. "
                   "Given the user's command, predict the terminal output."
    },
    {
        "role": "user",
        "content": "Action: execute_bash\nCommand: ls -la /home/user/project/"
    }
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
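# temperature is only applied when sampling is enabled, so set do_sample=True;
# top_p and top_k follow the recommended sampling settings.
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)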
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)

Using via the Chat Completions API

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

# Terminal domain example
messages = [
    {
        "role": "system",
        "content": "You are a language world model simulating a Linux terminal environment. "
                   "Given the user's command, predict the terminal output."
    },
    {
        "role": "user",
        "content": "Action: execute_bash\nCommand: ls -la /home/user/project/"
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen-AgentWorld-35B-A3B",
    messages=messages,
    max_tokens=32768,
    temperature=0.6,
)
print(response.choices[0].message.content)

We provide domain-specific world model system prompt templates in prompts/ of the GitHub repository for all 7 domains. These serve as general-purpose system prompts when using Qwen-AgentWorld as an environment simulator. Each domain folder contains a system_prompt.txt (world model system prompt) and a judge_system_prompt.txt (evaluation prompt).
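
The following sketch shows one way to combine a domain system prompt with an interaction history for multi-turn simulation. Several details are assumptions rather than documented behavior: the domain folder name passed to `load_system_prompt`, the helper names themselves, and the layout in which earlier agent actions become user messages and earlier predicted observations become assistant messages (extrapolated from the single-turn example above). The templates in the repository may prescribe a different layout.

```python
from pathlib import Path


def load_system_prompt(prompts_dir, domain):
    """Read <prompts_dir>/<domain>/system_prompt.txt (folder name is an assumption)."""
    return Path(prompts_dir, domain, "system_prompt.txt").read_text(encoding="utf-8")


def build_messages(system_prompt, history, action):
    """Assemble chat messages from prior (action, observation) pairs and a new action."""
    messages = [{"role": "system", "content": system_prompt}]
    for past_action, observation in history:
        messages.append({"role": "user", "content": past_action})
        messages.append({"role": "assistant", "content": observation})
    messages.append({"role": "user", "content": action})
    return messages


# Inline prompt so the example runs without the repository checked out.
system_prompt = "You are a language world model simulating a Linux terminal environment."
history = [("Action: execute_bash\nCommand: pwd", "/home/user/project")]
messages = build_messages(system_prompt, history, "Action: execute_bash\nCommand: ls -la")
print([m["role"] for m in messages])
# ['system', 'user', 'assistant', 'user']
```

After each call, appending the new action and the predicted observation to `history` carries the simulated state forward to the next turn.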

Evaluate on AgentWorldBench

AgentWorldBench evaluates language world models by scoring each predicted environment observation on 5 dimensions: Format, Factuality, Consistency, Realism, and Quality.
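
As an illustration of how five dimension scores could be combined into a per-domain number on a 0-100 scale, here is a minimal sketch. The record layout, the `max_score` parameter, the rubric maximum of 5 used in the example, and the sample values are all hypothetical. The card does not specify the judge's raw scale or the exact normalization, so the repository's eval.py remains the authoritative implementation.

```python
from collections import defaultdict
from statistics import mean

DIMENSIONS = ("Format", "Factuality", "Consistency", "Realism", "Quality")


def domain_scores(records, max_score):
    """Average the five dimensions per sample, rescale to 0-100, then average per domain."""
    per_domain = defaultdict(list)
    for rec in records:
        sample_mean = mean(rec[dim] for dim in DIMENSIONS)
        per_domain[rec["domain"]].append(100.0 * sample_mean / max_score)
    return {domain: mean(values) for domain, values in per_domain.items()}


# Hypothetical judged samples; values are made up for illustration only.
records = [
    {"domain": "Terminal", "Format": 5, "Factuality": 4, "Consistency": 4, "Realism": 3, "Quality": 4},
    {"domain": "Terminal", "Format": 3, "Factuality": 3, "Consistency": 3, "Realism": 3, "Quality": 3},
    {"domain": "Web", "Format": 5, "Factuality": 5, "Consistency": 5, "Realism": 5, "Quality": 5},
]
print(domain_scores(records, max_score=5))
# {'Terminal': 70.0, 'Web': 100.0}
```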

Setup

# Clone the evaluation repository
git clone https://github.com/QwenLM/Qwen-AgentWorld.git
cd Qwen-AgentWorld

# Download the benchmark
huggingface-cli download Qwen/AgentWorldBench --repo-type dataset --local-dir ./AgentWorldBench

# Install dependencies
pip install openai

Run Evaluation

The evaluation follows a three-step pipeline:

cd eval

# Step 1: Run world model inference
python eval.py infer \
    --data-dir ../AgentWorldBench \
    --model-base-url http://localhost:8000/v1 \
    --model-name Qwen/Qwen-AgentWorld-35B-A3B \
    --output-dir ./results

# Step 2: Run LLM judge scoring
export OPENAI_API_KEY="your-api-key"
python eval.py judge \
    --predictions ./results/predictions.jsonl \
    --judge-base-url https://api.openai.com/v1 \
    --judge-model gpt-5.2-2025-12-11 \
    --output-dir ./results

# Step 3: Aggregate and display scores
python eval.py score --predictions ./results/judged.jsonl

Best Practices

Sampling Parameters: We recommend temperature=0.6, top_p=0.95, top_k=20 for world model inference. The model uses thinking mode by default (<think>...</think>) to reason about environment state transitions before producing the predicted observation.
Adequate Output Length: We recommend an output length of 32,768 tokens for most queries. For long, multi-step trajectories, you may increase the max output length to accommodate detailed environment observations.
Domain-Specific System Prompts: For optimal simulation fidelity, use the domain-specific system prompts provided in the prompts/ directory of the GitHub repository.
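
The recommendations above can be collected into two small helpers, sketched below. Both helper names are hypothetical. Passing `top_k` through `extra_body` is an assumption about the serving framework, since `top_k` is not a standard Chat Completions field. `split_thinking` is intended for raw generated text, for example output from Transformers. A server started with a reasoning parser may already return the reasoning separately.

```python
RECOMMENDED_SAMPLING = {"temperature": 0.6, "top_p": 0.95, "top_k": 20}


def chat_request_kwargs(model, messages, max_tokens=32768):
    """Build keyword arguments for client.chat.completions.create(...)."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": RECOMMENDED_SAMPLING["temperature"],
        "top_p": RECOMMENDED_SAMPLING["top_p"],
        # Assumption: the server accepts top_k as an extra, non-standard field.
        "extra_body": {"top_k": RECOMMENDED_SAMPLING["top_k"]},
    }


def split_thinking(text):
    """Split raw output into (reasoning, observation) at the closing think tag."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the observation.
        return "", text.strip()
    return head.replace("<think>", "", 1).strip(), tail.strip()


reasoning, observation = split_thinking("<think>\nThe directory is empty.\n</think>\n\ntotal 0")
print(observation)  # total 0
```

With an OpenAI client, the request then becomes `client.chat.completions.create(**chat_request_kwargs("Qwen/Qwen-AgentWorld-35B-A3B", messages))`.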

Citation

If you find our work helpful, please consider citing it.

@article{zuo2026qwen,
  title={Qwen-agentworld: language world models for general agents},
  author={Zuo, Yuxin and Xiao, Zikai and Sheng, Li and Huang, Fei and Tu, Jianhong and Liu, Yuxuan and Tang, Tianyi and Hu, Xiaomeng and Su, Yang and Lan, Qingfeng and others},
  journal={arXiv preprint arXiv:2606.24597},
  year={2026}
}