unsloth

#unsloth

Unsloth GLM-5.2 – How to Run Locally

Hacker News Top · yesterday

A guide to running Z.ai's open model GLM-5.2 locally with Unsloth Dynamic GGUFs. The model has 744B total parameters (40B active) and a 1M context window; the 2-bit quantized version cuts the memory footprint to 239GB, enabling local inference on 256GB Macs.

Good results fine tuning a local LLM like Qwen 3:0.6B to categorize questions

Hacker News Top · 2d ago

A developer fine-tunes a small Qwen 3 0.6B model using the Unsloth framework to categorize household questions, achieving good results with only 850 training examples.

@SlimTradeyBaby: Drop your GPU below and I’ll tell you exactly what model and config to run on it. JOKES. No need. Qwen 3.6 27b @Unsloth…

X AI KOLs Timeline · 4d ago

A tweet promoting the Qwen 3.6 27b model and recommending UnslothAI for running it on any GPU.

@10xmylife: Unsloth has successfully deployed the 2-bit version of GLM-5.2 on a 256GB Mac

X AI KOLs Following · 5d ago

Unsloth compressed the GLM-5.2 model to 238GB with 2-bit quantization, so it can run locally on a 256GB Mac while retaining about 82% of its accuracy.

@UnslothAI: GLM-5.2 can now be run locally! The 2-bit model retains ~82% accuracy after we shrunk it from 1.51TB to 238GB (-84% siz…

X AI KOLs Timeline · 6d ago

UnslothAI announces that GLM-5.2, Z.ai's strongest open model with 744B parameters, can now be run locally via Dynamic GGUF quantization, which cuts its size by ~84% to 239GB while retaining ~82% accuracy. It fits on 256GB Macs and supports long-context, reasoning, and agentic tasks.

@aisearchio: GLM 5.2 GGUF is already here! 8-bit is ~half the size of the full model. Smaller versions coming soon https://huggingfa…

X AI KOLs Timeline · 6d ago

A GGUF-quantized GLM 5.2 model has been released; the 8-bit version is roughly half the size of the full model, and smaller versions are coming soon.

@Sentdex: SITUATION DETECTED: Unsloth quants for GLM 5.2 are landing.

X AI KOLs Following · 2026-06-17

Unsloth quantizations for the GLM 5.2 model are being released.

@h100envy: Daniel Han wrote Unsloth, the reason half of open-source can fine-tune a model on one GPU instead of a cluster. He didn…

X AI KOLs Timeline · 2026-06-17

Daniel Han built Unsloth, a tool that rewrites GPU kernels to make fine-tuning 2-3 times faster on a single GPU, enabling many open-source users to train models without a cluster.

unsloth/Kimi-K2.7-Code-GGUF

Hugging Face Models Trending · 2026-06-12

Unsloth releases GGUF quantizations of Kimi K2.7 Code, a 1 trillion parameter MoE coding model built on Kimi K2.6 with improved token efficiency and agentic coding capabilities.

Unsloth Minimax M3 GGUF

Reddit r/LocalLLaMA · 2026-06-12

Unsloth is uploading a GGUF quantized version of the MiniMax M3 model to Hugging Face.

unsloth/MiniMax-M3-GGUF

Hugging Face Models Trending · 2026-06-12

Unsloth releases a GGUF quantized version of the MiniMax-M3 multimodal model, enabling image-text-to-text tasks with support for Transformers, llama.cpp, vLLM, and other inference engines.

@Freerunnering: This actually makes Gemma 4 26B-4A usable for a coding agent @ 72tk/s on my MacBook Pro M1 Max. This video is realtime,…

X AI KOLs Timeline · 2026-06-12

Unsloth AI announces that Gemma 4 runs 2x faster with MTP GGUFs, making it feasible for local coding agents on hardware like a MacBook Pro M1 Max at 72 tokens/s.

@VincentLogic: A 4.66 GB model actually runs at the level of a McKinsey consultant locally? Unsloth's latest 2-bit Gemma 4 12B is truly explosive. This isn't just chat – it directly transforms into a 'Super Agent' working autonomously: autonomously searching online citing 15+ sources, deeply distinguishing…

X AI KOLs Timeline · 2026-06-12

Unsloth releases a 2-bit quantized Gemma 4 12B model that is only 4.66GB and runs locally. The tweet claims agent-style capabilities such as autonomous online search and in-depth analysis, which it likens to the work of a McKinsey consultant.

@neural_avb: Lurking the Reasoning Training docs rn. Time to write a verifiers env and Unsloth/TRL that shit! Video soon if it all g…

X AI KOLs Timeline · 2026-06-11

The author is implementing reasoning training with verifiers using Unsloth and TRL, reports progress on generating GRPO-like rollouts locally with an SLM and a tiny RM, and promises a video soon.

unsloth/diffusiongemma-26B-A4B-it-GGUF

Hugging Face Models Trending · 2026-06-10

Unsloth releases GGUF quantizations of Google DeepMind's DiffusionGemma (26B-A4B), a new block-diffusion architecture for faster text generation, ready for llama.cpp.

Unsloth Gemma 4 QAT MTP assistant models now available

Reddit r/LocalLLaMA · 2026-06-09

Unsloth released Gemma 4 QAT MTP assistant models as GGUF files on Hugging Face, available in q8_0 and larger quantization formats.

Qwen3.6-35B-A3B tool calling benchmark: ByteShape vs. Unsloth GGUFs, KV cache quants & long context performance

Reddit r/LocalLLaMA · 2026-06-08

A detailed benchmark comparing ByteShape and Unsloth quantizations of Qwen3.6-35B-A3B on tool calling performance, KV cache quantization effects, and long context degradation using llama.cpp and tool-eval-bench.

Does it make sense to use alternative quantizations of QAT models? [D]

Reddit r/MachineLearning · 2026-06-06

A discussion of whether it makes sense to apply alternative quantization methods to quantization-aware trained (QAT) models such as Gemma-4, questioning whether Unsloth's benchmarks, which show performance closer to the QAT fine-tunes, point to a real benefit or are counterproductive.

Unsloth just dropped MTP GGUF weights for Gemma 4!

Reddit r/LocalLLaMA · 2026-06-05

Unsloth has released Multi-Token Prediction (MTP) GGUF weights for Gemma 4 models (31B, 26B-A4B, 12B) in Q8, F16, and BF16 precisions, available on Hugging Face.

unsloth/gemma-4-12B-it-qat-GGUF

Hugging Face Models Trending · 2026-06-05

Unsloth releases GGUF quantized versions of Google DeepMind's Gemma 4 models, optimized with Quantization-Aware Training (QAT) to reduce memory requirements while preserving quality, supporting multiple formats and sizes for diverse deployment.
