@UnslothAI: 4-bit Qwen3.6 MTP GGUF managed to search 70+ sites from a single prompt. Try this locally on 20GB RAM via Unsloth Studi…

X AI KOLs Timeline 05/19/26, 04:20 PM Tools

4-bit qwen gguf local-llm speculative-decoding unsloth

Summary

UnslothAI announces that its 4-bit Qwen3.6 MTP GGUF model can search over 70 websites from a single prompt, running locally on 20GB RAM via Unsloth Studio. The update adds automatic MTP and speculative decoding support.

4-bit Qwen3.6 MTP GGUF managed to search 70+ sites from a single prompt. Try this locally on 20GB RAM via Unsloth Studio. Unsloth now supports auto MTP + speculative decoding & auto-selects the best MTP settings for your device (Mac, CPU, GPU). GitHub: https://t.co/aZWYAtakBP https://t.co/DwGVNMCfRV

Original Article

View Cached Full Text

Cached at: 05/20/26, 02:24 AM

4-bit Qwen3.6 MTP GGUF managed to search 70+ sites from a single prompt.

Try this locally on 20GB RAM via Unsloth Studio.

Unsloth now supports auto MTP + speculative decoding & auto-selects the best MTP settings for your device (Mac, CPU, GPU).

GitHub: https://t.co/aZWYAtakBP https://t.co/DwGVNMCfRV

unslothai/unsloth

Source: https://github.com/unslothai/unsloth

Unsloth Studio lets you run and train models locally.

Features • Quickstart • Notebooks • Documentation

⚡ Get started

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Community:

⭐ Features

Unsloth Studio (Beta) lets you run and train text, audio, embedding, vision models on Windows, Linux and macOS.

Inference

Search + download + run models including GGUF, LoRA adapters, safetensors
Export models: Save or export models to GGUF, 16-bit safetensors and other formats.
Tool calling: Support for self-healing tool calling and web search
Code execution: lets LLMs test code in Claude artifacts and sandbox environments
API inference endpoint: Deploy and run local LLMs in Claude Code, Codex tools with Unsloth
Auto set inference settings and customize chat templates.
We work directly with teams behind gpt-oss, Qwen3, Llama 4, Mistral, Gemma 1-3, and Phi-4, where we’ve fixed bugs that improve model accuracy.
Upload images, audio, PDFs, code, DOCX and more file types to chat with.

Training

Train and RL 500+ models up to 2x faster with up to 70% less VRAM, with no accuracy loss.
Custom Triton and mathematical kernels. See some collabs we did with PyTorch and Hugging Face.
Data Recipes: Auto-create datasets from PDF, CSV, DOCX etc. Edit data in a visual-node workflow.
Reinforcement Learning (RL): The most efficient RL library, using 80% less VRAM for GRPO, FP8 etc.
Supports full fine-tuning, RL, pretraining, 4-bit, 16-bit and, FP8 training.
Observability: Monitor training live, track loss and GPU usage and customize graphs.
Multi-GPU training is supported, with major improvements coming soon.

📥 Install

Unsloth can be used in two ways: through Unsloth Studio, the web UI, or through Unsloth Core, the code-based version. Each has different requirements.

Unsloth Studio (web UI)

Unsloth Studio (Beta) works on Windows, Linux, WSL and macOS.

CPU: Supported for Chat and Data Recipes currently
NVIDIA: Training works on RTX 30/40/50, Blackwell, DGX Spark, Station and more
macOS: Currently supports chat and Data Recipes. MLX training is coming very soon
AMD: Chat + Data works. Train with Unsloth Core. Studio support is out soon.
Coming soon: Training support for Apple MLX, AMD, and Intel.
Multi-GPU: Available now, with a major upgrade on the way

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex

Launch

unsloth studio -p 8888

For cloud VMs or LAN access, add -H 0.0.0.0 to bind on all interfaces.

Update

To update, use the same install commands as above. Or run (does not work on Windows):

unsloth studio update

Docker

Use our Docker image unsloth/unsloth container. Run:

docker run -d -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 -p 8000:8000 -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth

Developer, Nightly, Uninstall

To see developer, nightly and uninstallation etc. instructions, see advanced installation.

Unsloth Core (code-based)

Linux, WSL:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv unsloth_env --python 3.13
source unsloth_env/bin/activate
uv pip install unsloth --torch-backend=auto

Windows:

winget install -e --id Python.Python.3.13
winget install --id=astral-sh.uv  -e
uv venv unsloth_env --python 3.13
.\unsloth_env\Scripts\activate
uv pip install unsloth --torch-backend=auto

For Windows, pip install unsloth works only if you have PyTorch installed. Read our Windows Guide. You can use the same Docker image as Unsloth Studio.

AMD, Intel:

For RTX 50x, B200, 6000 GPUs: uv pip install unsloth --torch-backend=auto. Read our guides for: Blackwell and DGX Spark.
To install Unsloth on AMD and Intel GPUs, follow our AMD Guide and Intel Guide.

📒 Free Notebooks

Train for free with our notebooks. You can use our new free Unsloth Studio notebook to run and train models for free in a web UI. Read our guide. Add dataset, run, then deploy your trained model.

Model	Free Notebooks	Performance	Memory use
Gemma 4 (E2B)	▶️ Start for free	1.5x faster	50% less
Qwen3.5 (4B)	▶️ Start for free	1.5x faster	60% less
gpt-oss (20B)	▶️ Start for free	2x faster	70% less
Qwen3.5 GSPO	▶️ Start for free	2x faster	70% less
gpt-oss (20B): GRPO	▶️ Start for free	2x faster	80% less
Qwen3: Advanced GRPO	▶️ Start for free	2x faster	70% less
embeddinggemma (300M)	▶️ Start for free	2x faster	20% less
Mistral Ministral 3 (3B)	▶️ Start for free	1.5x faster	60% less
Llama 3.1 (8B) Alpaca	▶️ Start for free	2x faster	70% less
Llama 3.2 Conversational	▶️ Start for free	2x faster	70% less
Orpheus-TTS (3B)	▶️ Start for free	1.5x faster	50% less

See all our notebooks for: Kaggle, GRPO, TTS, embedding & Vision
See all our models and all our notebooks
See detailed documentation for Unsloth here

🦥 Unsloth News

API inference endpoint: Deploy and run local LLMs in Claude Code, Codex tools. Guide
Qwen3.6: Qwen3.6-35B-A3B can now be trained and run in Unsloth Studio. Blog
Gemma 4: Run and train Google’s new models directly in Unsloth. Blog
Introducing Unsloth Studio: our new web UI for running and training LLMs. Blog
Qwen3.5 - 0.8B, 2B, 4B, 9B, 27B, 35-A3B, 112B-A10B are now supported. Guide + notebooks
Train MoE LLMs 12x faster with 35% less VRAM - DeepSeek, GLM, Qwen and gpt-oss. Blog
Embedding models: Unsloth now supports ~1.8-3.3x faster embedding fine-tuning. Blog • Notebooks
New 7x longer context RL vs. all other setups, via our new batching algorithms. Blog
New RoPE & MLP Triton Kernels & Padding Free + Packing: 3x faster training & 30% less VRAM. Blog
500K Context: Training a 20B model with >500K context is now possible on an 80GB GPU. Blog
FP8 & Vision RL: You can now do FP8 & VLM GRPO on consumer GPUs. FP8 Blog • Vision RL

📥 Advanced Installation

The below advanced instructions are for Unsloth Studio. For Unsloth Core advanced installation, view our docs.

Developer installs: macOS, Linux, WSL:

git clone https://github.com/unslothai/unsloth
cd unsloth
./install.sh --local
unsloth studio -p 8888

Then to update :

unsloth studio update

Developer installs: Windows PowerShell:

git clone https://github.com/unslothai/unsloth.git
cd unsloth
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 --local
unsloth studio -p 8888

Then to update :

unsloth studio update

Nightly: MacOS, Linux, WSL:

git clone https://github.com/unslothai/unsloth
cd unsloth
git checkout nightly
./install.sh --local
unsloth studio -p 8888

Then to launch every time:

unsloth studio -p 8888

Nightly: Windows:

Run in Windows Powershell:

git clone https://github.com/unslothai/unsloth.git
cd unsloth
git checkout nightly
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install.ps1 --local
unsloth studio -p 8888

Then to launch every time:

unsloth studio -p 8888

Uninstall

The recommended way to fully remove Unsloth Studio is the matching uninstall script for your OS. It stops any running servers, removes the install dir, the launcher data dir, the desktop shortcut, and any platform-specific entries (macOS .app bundle + Launch Services on Mac; Start Menu, HKCU\Software\Unsloth registry key and user PATH entries on Windows):

MacOS, WSL, Linux: curl -fsSL https://unsloth.ai/uninstall.sh | sh
Windows (PowerShell): irm https://unsloth.ai/uninstall.ps1 | iex

If you only want to drop the install dir and keep the launcher/shortcut for a later reinstall, you can instead run rm -rf ~/.unsloth/studio (Mac/Linux/WSL) or Remove-Item -Recurse -Force "$HOME\.unsloth\studio" (Windows). The model cache at ~/.cache/huggingface is not touched by any of these.

For more info, see our docs.

Deleting model files

You can delete old model files either from the bin icon in model search or by removing the relevant cached model folder from the default Hugging Face cache directory. By default, HF uses:

MacOS, Linux, WSL: ~/.cache/huggingface/hub/
Windows: %USERPROFILE%\.cache\huggingface\hub\

💚 Community and Links

Type	Links
Discord	Join Discord server
r/unsloth Reddit	Join Reddit community
📚 Documentation & Wiki	Read Our Docs
Twitter (aka X)	Follow us on X
🔮 Our Models	Unsloth Catalog
✍️ Blog	Read our Blogs

Citation

You can cite the Unsloth repo as follows:

@software{unsloth,
  author = {Daniel Han, Michael Han and Unsloth team},
  title = {Unsloth},
  url = {https://github.com/unslothai/unsloth},
  year = {2023}
}

If you trained a model with 🦥Unsloth, you can use this cool sticker!

License

Unsloth uses a dual-licensing model of Apache 2.0 and AGPL-3.0. The core Unsloth package remains licensed under Apache 2.0, while certain optional components, such as the Unsloth Studio UI are licensed under the open-source license AGPL-3.0.

This structure helps support ongoing Unsloth development while keeping the project open source and enabling the broader ecosystem to continue growing.

Thank You to

The llama.cpp library that lets users run and save models with Unsloth
The Hugging Face team and their libraries: transformers and TRL
The Pytorch and Torch AO team for their contributions
NVIDIA for their NeMo DataDesigner library and their contributions
And of course for every single person who has contributed or has used Unsloth!

unslothai/unsloth

Unsloth Studio lets you run and train models locally.

⚡ Get started

macOS, Linux, WSL:

Windows:

Community:

⭐ Features

Inference

Training

📥 Install

Unsloth Studio (web UI)

macOS, Linux, WSL:

Windows:

Launch

Update

Docker

Developer, Nightly, Uninstall

Unsloth Core (code-based)

Linux, WSL:

Windows:

AMD, Intel:

📒 Free Notebooks

🦥 Unsloth News

📥 Advanced Installation

Developer installs: macOS, Linux, WSL:

Developer installs: Windows PowerShell:

Nightly: MacOS, Linux, WSL:

Nightly: Windows:

Uninstall

Deleting model files

💚 Community and Links

Citation

License

Thank You to

Similar Articles

@TeksEdge: Unsloth released the fastest Qwen3.6-27B MTP GGUF I've tested. Time to upgrade. Compared to the previous GGUF, Q4/Q6 XL…

@Italianclownz: Tested MTP, TriAttention, TurboQuant on @UnslothAI @Alibaba_Qwen Qwen 3.6 35B A3B MTP MXFP4_MoE on @huggingface @no_stp…

MTP on Unsloth

unsloth/Qwen3.6-27B-MTP-GGUF

havenoammo/Qwen3.6-27B-MTP-UD-GGUF

Submit Feedback