gguf

#gguf

Tmax-27b - a Qwen3.6-27b terminal agent for small GPUs trained with DPPO (RL)

Reddit r/LocalLLaMA ↗ · 3h ago

Ai2 released Tmax-27B, a terminal-agent LLM trained with DPPO (RL) on top of Qwen3.6-27B. The post's author provides importance-matrix-calibrated GGUF quantizations that stay competitive on agentic benchmarks even at very low bit-widths, along with a grafted MTP draft head for speculative decoding.

#gguf

UPDATE: Qwen-27B-IQ4_KS and Qwen-27B-IQ_KS_KT for ik_llama.cpp, especially for NVIDIA with 16GB VRAM

Reddit r/LocalLLaMA ↗ · 4h ago

New GGUF quantizations of Qwen3.6-27B optimized for 16GB VRAM NVIDIA GPUs, including an experimental Trellis variant, with perplexity benchmarks.

#gguf

MiniMax-M3-EAGLE3-GGUF - Llama.cpp compatible MiniMax M3 EAGLE draft model!

Reddit r/LocalLLaMA ↗ · 18h ago

A GGUF conversion of MiniMax M3's EAGLE draft model for llama.cpp is now available, enabling speculative decoding speedups on compatible hardware.
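Why a draft model speeds up decoding can be sketched with the simplified analysis common in the speculative-decoding literature: if the draft proposes `gamma` tokens and the target accepts each one with an assumed independent probability `alpha`, each target forward pass yields on average (1 - alpha^(gamma+1)) / (1 - alpha) tokens. This is an idealized sketch, not how EAGLE's tree-based drafting or llama.cpp's implementation works, and it ignores the draft model's own cost; `alpha`, `gamma`, and the function name are hypothetical.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    alpha: per-token probability that the target accepts a drafted token,
           assumed independent across positions (a simplification).
    gamma: number of tokens the draft proposes per step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # All drafted tokens accepted, plus one token sampled by the target.
        return float(gamma + 1)
    # Geometric series 1 + alpha + ... + alpha**gamma.
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


# Hypothetical acceptance rates; real values depend on the model pair and prompt.
for alpha in (0.6, 0.8):
    print(alpha, round(expected_tokens_per_pass(alpha, gamma=4), 2))
```

With `gamma=4`, an acceptance rate of 0.6 gives about 2.31 tokens per target pass and 0.8 gives about 3.36, which is why the real-world speedup depends heavily on how closely the draft matches the target.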

#gguf

Unsloth GLM-5.2 – How to Run Locally

Hacker News Top ↗ · yesterday

A guide to running Z.ai's open model GLM-5.2 locally with Unsloth Dynamic GGUFs. The model has 744B total parameters (40B active) and a 1M-token context window; the 2-bit quantization reduces its memory footprint to 239GB, enough for local inference on Macs with 256GB of memory.

#gguf

@KyleHessling1: Morning y'all! We've released Qwopus 3.6 27B-Coder-Compat with some compatibility fixes for various harnesses! This ver…

X AI KOLs Timeline ↗ · yesterday

Qwopus 3.6 27B-Coder-Compat is a new GGUF release with compatibility fixes for various harnesses that reduce looping and improve thinking stability. It can generate complete HTML games and is suited to local deployment.

#gguf

Qwen 3.6 27b Abliterated (apostate)

Reddit r/LocalLLaMA ↗ · 2d ago

A user released Apostate, an abliterated version of Qwen 3.6 27B that lowers the refusal rate from 92% to 7.6% with minimal capability loss (KL divergence of 0.120).
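KL divergence measures how far the modified model's output distribution drifts from the original's; lower means less change. Abliteration write-ups often compare next-token probabilities on harmless prompts, but the exact prompts, token positions, and log base behind the 0.120 figure are not given here, so this sketch only shows the quantity itself, in nats (an assumption), on made-up distributions.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with P(x) = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # P assigns mass where Q assigns none
        total += pi * math.log(pi / qi)
    return total


# Made-up next-token distributions: original model vs. modified model.
original = [0.70, 0.20, 0.10]
modified = [0.60, 0.25, 0.15]
print(round(kl_divergence(original, modified), 4))
```

Note that KL divergence is asymmetric: KL(original || modified) generally differs from KL(modified || original), so the direction used matters when comparing reported numbers.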

#gguf

@antirez: First kinda working implementation of GLM 5.2 in DwarfStar. Will take some time to be good enough, but it is a promisin…

X AI KOLs Following ↗ · 2d ago

Antirez reports a first, partially working implementation of GLM 5.2 in DwarfStar, running a 433 GB GGUF file on an M3 Ultra with 512GB of RAM; it will need further work before it is good enough.

#gguf

Why is AutoRound being slept on so hard?

Reddit r/LocalLLaMA ↗ · 2d ago

A user asks why AutoRound, a quantization tool that retains accuracy better at low bit-widths and can export directly to GGUF, is overlooked despite outperforming standard AWQ and RTN (round-to-nearest) quantization, especially on complex models like Qwen3.6 27B.

#gguf

empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF

Hugging Face Models Trending ↗ · 4d ago

Empero AI released Qwythos-9B-Claude-Mythos-5-1M-GGUF, a 9B-parameter reasoning model fine-tuned on 500M+ tokens of Claude Mythos/Fable chain-of-thought traces. It reports significant gains over Qwen3.5-9B and supports a 1M-token context via YaRN RoPE scaling. The GGUF quantizations enable local inference on llama.cpp and compatible runtimes.

#gguf

yuxinlu1/gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUF

Hugging Face Models Trending ↗ · 4d ago

A fine-tuned version of Gemma-4-12B, optimized for local coding and agentic tasks, achieving ~3.5x improvement over the base model on the tau2-bench telecom benchmark.

#gguf

Calibrating 2-bit GGUFs (<10GB) for agentic coding tasks

Reddit r/LocalLLaMA ↗ · 5d ago

This post introduces calibrated 2-bit GGUF quantizations of the Qwopus3.6-27B-Coder model for agentic coding tasks and shows that the IQ2_M quant (9.74 GiB) achieves a 63% pass rate on the SWE-rebench benchmark, comparable to a Q5_K_M quant at half the size.
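As a rough check on what "2-bit" means for file size, the average storage cost per weight is the file size in bits divided by the parameter count. A minimal sketch, assuming about 27e9 parameters (inferred from the "27B" in the name; the true count differs somewhat); the function name is hypothetical:

```python
GIB = 2 ** 30  # bytes per GiB


def bits_per_weight(file_size_gib: float, n_params: float) -> float:
    """Average storage bits per parameter implied by a file size in GiB."""
    return file_size_gib * GIB * 8 / n_params


# Assumption: ~27e9 parameters, inferred from the "27B" in the model name.
print(f"IQ2_M: {bits_per_weight(9.74, 27e9):.2f} bits/weight")
```

This works out to roughly 3.1 bits per weight, above the nominal 2-bit label, likely because some tensors are kept at higher precision and the parameter count used here is only approximate.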

#gguf

@MiaAI_lab: I fine-tuned Gemma 4 12B with Fable-5 style reasoning and assistant traces and released it as Gemmable 4 12b. **Availab…

X AI KOLs Timeline ↗ · 5d ago

Mia-AiLab released Gemmable 4 12B, a fine-tuned version of Google's Gemma 4 12B model using Fable-5 style reasoning and assistant traces, available in GGUF and MLX formats for local inference.

#gguf

@UnslothAI: GLM-5.2 can now be run locally! The 2-bit model retains ~82% accuracy after we shrunk it from 1.51TB to 238GB (-84% siz…

X AI KOLs Timeline ↗ · 5d ago

Unsloth announces that GLM-5.2, Z.ai's strongest open model with 744B parameters, can now run locally: dynamic GGUF quantization shrinks it by ~84% to 239GB while retaining ~82% accuracy. The quantized model fits on 256GB Macs and supports long-context, reasoning, and agentic tasks.
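The quoted figures can be cross-checked with simple arithmetic. A sketch assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), the 744B total parameter count, and the 238GB figure from the tweet; the helper names are hypothetical:

```python
TB, GB = 1e12, 1e9    # assumption: decimal units, as written in the announcement
TOTAL_PARAMS = 744e9  # total parameters; all of them must be stored


def size_reduction(original_bytes: float, quantized_bytes: float) -> float:
    """Fractional size reduction, e.g. 0.84 for a file that is 84% smaller."""
    return 1 - quantized_bytes / original_bytes


def effective_bpw(size_bytes: float, n_params: float) -> float:
    """Average bits of storage per parameter."""
    return size_bytes * 8 / n_params


full, two_bit = 1.51 * TB, 238 * GB
print(f"reduction:  {size_reduction(full, two_bit):.1%}")
print(f"full model: {effective_bpw(full, TOTAL_PARAMS):.1f} bits/weight")
print(f"2-bit GGUF: {effective_bpw(two_bit, TOTAL_PARAMS):.2f} bits/weight")
```

This gives a reduction of about 84.2%, about 16.2 bits per weight for the full model (consistent with 16-bit weights), and about 2.56 bits per weight for the "2-bit" GGUF, presumably because the dynamic quantization keeps some layers above 2 bits. Only 40B parameters are active per token, but all 744B still have to be held in memory, which is why total size decides whether the model fits on a 256GB Mac.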

#gguf

@aisearchio: GLM 5.2 GGUF is already here! 8-bit is ~half the size of the full model. Smaller versions coming soon https://huggingfa…

X AI KOLs Timeline ↗ · 6d ago

A GGUF quantization of GLM 5.2 has been released; the 8-bit version is roughly half the size of the full model, and smaller versions are coming soon.
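Why 8-bit lands at roughly half the size of 16-bit: assuming the upload uses llama.cpp's Q8_0 type (the tweet only says "8-bit"), weights are stored in blocks of 32 int8 values that share one fp16 scale, chosen by symmetric round-to-nearest. A minimal pure-Python sketch of that scheme and its storage cost; the function names are hypothetical and the scale is kept in float64 rather than fp16:

```python
BLOCK_SIZE = 32  # weights per Q8_0 block


def quantize_q8_0_block(xs: list[float]) -> tuple[float, list[int]]:
    """Symmetric round-to-nearest int8 quantization of one block, Q8_0 style."""
    if len(xs) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(xs)}")
    amax = max(abs(x) for x in xs)
    d = amax / 127  # per-block scale (stored as fp16 in the real format)
    inv = 1 / d if d else 0.0
    # Python's round() rounds halves to even, unlike C's roundf; fine for a sketch.
    return d, [round(x * inv) for x in xs]


def dequantize_block(d: float, qs: list[int]) -> list[float]:
    """Reconstruct approximate weights from the scale and int8 values."""
    return [d * q for q in qs]


# Storage: 32 one-byte quants plus one 2-byte fp16 scale per block.
bpw = (BLOCK_SIZE * 1 + 2) * 8 / BLOCK_SIZE
print(bpw, bpw / 16)  # bits per weight, and size relative to 16-bit weights
```

Each block costs 34 bytes for 32 weights, or 8.5 bits per weight, so a Q8_0 file is about 53% of a 16-bit model, in line with "~half the size".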

#gguf

PSA: unsloth/GLM-5.2-GGUF is uploading

Reddit r/LocalLLaMA ↗ · 6d ago

unsloth is uploading GGUF versions of GLM-5.2 to Hugging Face, providing ready-to-use model files for inference engines such as llama.cpp, vLLM, and SGLang.

#gguf

@DJLougen: Quants here https://huggingface.co/GestaltLabs/Ornstein-3.5-9B-V1.5-GGUF…

X AI KOLs Timeline ↗ · 6d ago

GestaltLabs releases Ornstein-3.5-9B-V1.5 GGUF quantizations, a reasoning-focused fine-tune of Qwen 3.5 9B with an MTP head and vision projector for multimodal use.

#gguf

@Ali_TongyiLab: We are pleased to highlight an excellent community model from developer : Qwen3.6-27B-MTP-pi-reasoning-GGUF. Built on o…

X AI KOLs Timeline ↗ · 6d ago

Alibaba's Tongyi Lab highlights a community model, Qwen3.6-27B-MTP-pi-reasoning-GGUF, built on Qwen3.6-27B and optimized for automated programming and debugging workflows in local coding agents.

#gguf

bartowski/command-a-plus-05-2026-GGUF · Hugging Face

Reddit r/LocalLLaMA ↗ · 2026-06-16

GGUF quantized versions of Cohere's command-a-plus-05-2026 model, optimized for llama.cpp and available in various quantization levels for local inference.

#gguf

@WaleedAhmad1a10: Check out the Qwen 3.5 27B MoQ GGUFs :

X AI KOLs Following ↗ · 2026-06-16

A Hugging Face repository (kaitchup/Qwen3.6-27B-GGUF-MoQ) provides GGUF quantized weights for the Qwen3.6-27B MoQ model, enabling local inference with tools like llama.cpp and Ollama.

#gguf

Nex-N2 Pro is the real deal

Reddit r/LocalLLaMA ↗ · 2026-06-16

The writer shares their experience with Nex-N2 Pro, which they initially mistook for Rio-3.5, and finds that it performs exceptionally well on coding benchmarks without hallucinating, rivaling GPT-5.x on their Mac setup.
