unsloth

#unsloth

Unsloth on Apple Silicon- Pre-announcement announcement

Reddit r/LocalLLaMA ↗ · 2026-06-04

Unsloth, a popular LLM fine-tuning library, announces upcoming support for Apple Silicon devices, expanding its optimization capabilities beyond NVIDIA GPUs.

@UnslothAI: We made a guide on using MCP with local LLMs. Connect Qwen3.6 and Gemma 4 for controlled access to tools, files, APIs, …

X AI KOLs Timeline ↗ · 2026-06-01 Cached

A step-by-step guide on using MCP servers with local LLMs like Qwen3.6 and Gemma 4 via Unsloth and llama.cpp, enabling private automated workflows with tools, files, and APIs.

@neural_avb: Next video is on training tiny (<1B) models for preference tuning. Plus how to generate preference datasets with local …

X AI KOLs Timeline ↗ · 2026-05-26 Cached

Announces an upcoming video on training tiny models for preference tuning, covering reward models, RLHF, DPO, ORPO with Unsloth and TRL.

@UnslothAI: 4-bit Qwen3.6 MTP GGUF managed to search 70+ sites from a single prompt. Try this locally on 20GB RAM via Unsloth Studi…

X AI KOLs Timeline ↗ · 2026-05-19 Cached

UnslothAI announces that its 4-bit Qwen3.6 MTP GGUF model can search over 70 websites from a single prompt, running locally on 20GB RAM via Unsloth Studio. The update adds automatic MTP and speculative decoding support.

@populartourist: Unsloth Qwen3.6 27B Q6_K doing over 100 t/s with MTP on RTX 5090. That's coming up from 45-50 t/s without MTP. That's i…

X AI KOLs Timeline ↗ · 2026-05-16 Cached

Unsloth Qwen3.6 27B Q6_K achieves over 100 tokens per second with MTP on RTX 5090, up from 45-50 t/s without MTP.

Who is your favourite quant publisher and why?

Reddit r/LocalLLaMA ↗ · 2026-05-13

A user shares their preference for Unsloth quantized models due to fast releases and low perplexity, compares them with Apex MoE quants, and asks the community for their favorite quant publisher.

@_lewtun: You can now have an AI researcher running on your laptop 24/7 for free! Running Qwen3-35B-A3B with llama.cpp and a 4-bi…

X AI KOLs Timeline ↗ · 2026-05-13 Cached

The post highlights that Qwen3-35B-A3B can run locally on a laptop for free using llama.cpp and an Unsloth 4-bit quantization.

@billtheinvestor: You can now fine-tune Google's Gemma 4 for free directly in your browser. Simply open the Unsloth Colab notebook, select your model and dataset, and click start. The barrier to customizing models has dropped to zero.

X AI KOLs Timeline ↗ · 2026-05-12 Cached

The tweet announces that users can now fine-tune Google's Gemma 4 model for free in the browser using the Unsloth Colab notebook, significantly lowering the barrier to entry for model customization.

@TeksEdge: Unsloth released the fastest Qwen3.6-27B MTP GGUF I've tested. Time to upgrade. Compared to the previous GGUF, Q4/Q6 XL…

X AI KOLs Timeline ↗ · 2026-05-12

Unsloth has released an optimized GGUF version of the Qwen3.6-27B MTP model, achieving significantly faster inference speeds (up to 114 tok/s on an RTX 5090) compared to previous quantizations.

@Italianclownz: Tested MTP, TriAttention, TurboQuant on @UnslothAI @Alibaba_Qwen Qwen 3.6 35B A3B MTP MXFP4_MoE on @huggingface @no_stp…

X AI KOLs Following ↗ · 2026-05-12 Cached

A user benchmarks MTP, TriAttention, and TurboQuant optimizations on Qwen 3.6 35B using Unsloth on consumer hardware, finding TurboQuant to be the most effective.

@port_dev: https://x.com/port_dev/status/2054259445732110408

X AI KOLs Timeline ↗ · 2026-05-12 Cached

The article provides a detailed tutorial on setting up a local coding agent using Qwen3.6-27B via Unsloth Studio and the Pi coding harness. It highlights the benefits of using GGUF quantized models for efficient inference on consumer hardware like Apple Silicon Macs.

MTP on Unsloth

Reddit r/LocalLLaMA ↗ · 2026-05-11

Unsloth releases GGUF-quantized versions of Qwen3.6 models with Multi Token Prediction (MTP) support.

unsloth/Qwen3.6-35B-A3B-MTP-GGUF

Hugging Face Models Trending ↗ · 2026-05-11 Cached

This article announces the release of the Qwen3.6-35B-A3B model weights on Hugging Face, optimized by Unsloth with Multi-Token Prediction (MTP) for faster generation via llama.cpp. It highlights improvements in agentic coding capabilities, tool calling, and reasoning context preservation.

unsloth/Qwen3.6-27B-MTP-GGUF

Hugging Face Models Trending ↗ · 2026-05-11 Cached

Unsloth has released GGUF weights for the Qwen3.6-27B model, featuring Multi-Token Prediction (MTP) for faster generation and enhanced agentic coding capabilities.

@Suryanshti777: NVIDIA just revealed the hidden tricks they’re using to make LLM fine-tuning dramatically faster. Not new GPUs. Not big…

X AI KOLs Timeline ↗ · 2026-05-07

NVIDIA and Unsloth have published a technical guide detailing three low-level optimizations that can accelerate LLM fine-tuning by up to 25%, including packed-sequence caching, double-buffered checkpointing, and optimized MoE routing. The guide provides deep systems-level explanations and benchmarks aimed at ML engineers and developers.

havenoammo/Qwen3.6-27B-MTP-UD-GGUF

Hugging Face Models Trending ↗ · 2026-05-06 Cached

This Hugging Face repository provides GGUF files for Qwen3.6-27B with Multi-Token Prediction (MTP) layers grafted onto Unsloth UD XL quantizations. It includes instructions for building llama.cpp with MTP support to enable speculative decoding.

Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash-GGUF

Hugging Face Models Trending ↗ · 2026-04-29 Cached

This entry describes Qwen3.5-9B-DeepSeek-V4-Flash, a distilled AI model that transfers reasoning capabilities from DeepSeek-V4 into a smaller 9B parameter space for efficient inference.

Qwen 3.6 is actually useful for vibe-coding, and way cheaper than Claude

Reddit r/LocalLLaMA ↗ · 2026-04-23

A user reports that running Qwen 3.6 27B/35B locally with llama-server cut the cost of an 8-hour vibe-coding session from $142 in Claude Code API fees to under $4, giving a 30-day payback on a $4,500 dual-RTX 3090 rig.
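
The payback figure in this summary can be sanity-checked with a few lines of arithmetic. The sketch below uses only the three dollar amounts quoted in the summary; treating "<$4" as exactly $4 (an upper bound) and counting one 8-hour session per day are assumptions made here, not details from the post.

```python
# Figures quoted in the summary. The local cost is given as "<$4",
# so 4.0 is an assumed upper bound rather than a measured value.
api_cost_per_session = 142.0
local_cost_per_session = 4.0
rig_cost = 4500.0

# Money saved each time a session runs locally instead of via the API.
savings_per_session = api_cost_per_session - local_cost_per_session

# Number of sessions needed for the savings to cover the hardware.
sessions_to_break_even = rig_cost / savings_per_session

print(f"Savings per session: ${savings_per_session:.0f}")
print(f"Sessions to break even: {sessions_to_break_even:.1f}")
```

Under these assumptions the rig pays for itself after about 33 sessions, which is consistent with the quoted 30-day payback if a session is run roughly once a day.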

unsloth/Qwen3.6-27B-GGUF

Hugging Face Models Trending ↗ · 2026-04-22 Cached

Unsloth releases a GGUF quantized version of the Qwen3.6-27B model, featuring improved agentic coding capabilities, tool calling, and support for Unsloth Studio.

Kimi K2.6 Unsloth GGUF is out

Reddit r/LocalLLaMA ↗ · 2026-04-21

Unsloth has released a GGUF-quantized version of the Kimi K2.6 model, enabling efficient local inference.
