quantized

#quantized

@BrianRoemmele: BOOM! Meet the open source Cambrian Explosion of repulsion of Anthropic! Meet Qwythos 9B, a Qwen3.5 based GGUF that's b…

X AI KOLs Timeline · 2d ago

Qwythos 9B is a new open-source, uncensored reasoning model based on Qwen3.5. It offers GGUF quantizations, a 1-million-token context window, vision, and function calling, and is claimed to significantly outperform the base model.

#quantized

huihui-ai/Huihui-GLM-5.2-abliterated-GGUF

Hugging Face Models Trending · 2d ago

A quantized GGUF version of the abliterated GLM-5.2 model has been released on Hugging Face, enabling local inference with tools such as Transformers, llama.cpp, and vLLM.

#quantized

@support_huihui: New GGUF: huihui-ai/Huihui-Qwythos-9B-Claude-Mythos-5-1M-abliterated-GGUF This is an uncensored version of empero-ai/Qw…

X AI KOLs Timeline · 5d ago

A new uncensored, GGUF-quantized version of the Qwythos-9B-Claude-Mythos-5-1M model, created using abliteration, has been released on Hugging Face.

#quantized

unsloth/Qwen-AgentWorld-35B-A3B-GGUF

Hugging Face Models Trending · 5d ago

Unsloth released a GGUF quantization of Qwen-AgentWorld-35B-A3B, a native language world model that uses long chain-of-thought reasoning to simulate agentic environments across seven domains (MCP, Search, Terminal, SWE, Android, Web, OS). The model was trained with continued pretraining (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL).

#quantized

@antirez: Based on what I'm saying with GLM 5.2 implementation inside DwarfStar, there is 90% of probability I'll merge the branc…

X AI KOLs Following · 6d ago

Antirez says there is a 90% probability he will merge a branch implementing GLM 5.2 in DwarfStar. He suggests it could become the best model for a 512 GB Mac Studio and could potentially run distributed across 128 GB MacBooks with 2-bit quantization.

#quantized

nvidia/GLM-5.2-NVFP4

Hugging Face Models Trending · 2026-06-22

NVIDIA released GLM-5.2-NVFP4, a version of ZAI's GLM-5.2 MoE language model quantized to NVFP4 with NVIDIA Model Optimizer and optimized for inference on NVIDIA Blackwell GPUs.

#quantized

PSA: unsloth/GLM-5.2-GGUF is uploading

Reddit r/LocalLLaMA · 2026-06-17

Unsloth has uploaded a GGUF version of GLM-5.2 to Hugging Face, providing ready-to-use model files for inference engines such as llama.cpp, vLLM, and SGLang.

#quantized

@DJLougen: Quants here https://huggingface.co/GestaltLabs/Ornstein-3.5-9B-V1.5-GGUF…

X AI KOLs Timeline · 2026-06-17

GestaltLabs releases GGUF quantizations of Ornstein-3.5-9B-V1.5, a reasoning-focused fine-tune of Qwen 3.5 9B with a multi-token prediction (MTP) head and a vision projector for multimodal use.

#quantized

@WaleedAhmad1a10: Check out the Qwen 3.5 27B MoQ GGUFs :

X AI KOLs Following · 2026-06-16

A Hugging Face repository (kaitchup/Qwen3.6-27B-GGUF-MoQ) provides GGUF quantized weights for the Qwen3.6-27B MoQ model, enabling local inference with tools like llama.cpp and Ollama.

#quantized

Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF

Hugging Face Models Trending · 2026-06-11

A GGUF-quantized version of the Qwopus3.6-27B-Coder-MTP model has been released on Hugging Face. It is optimized for local inference and compatible with Transformers, vLLM, SGLang, and Unsloth Studio.

#quantized

Holo3.1: Fast & Local Computer Use Agents

Hugging Face Blog · 2026-06-02

Holo3.1 is an updated computer-use model family that improves robustness across web, desktop, and mobile environments, introduces quantized checkpoints for local execution, and adds native support for function-calling protocols.

#quantized

nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face

Reddit r/LocalLLaMA · 2026-05-30

NVIDIA releases Qwen3.6-35B-A3B-NVFP4, a version of Alibaba's mixture-of-experts multimodal language model quantized to NVFP4 with NVIDIA Model Optimizer for deployment on NVIDIA GPUs.

#quantized

@dealignai: Qwen3.6-27b and 35b MXFP4 MXFP8 CRACK is out now with MTP. Enjoy uncensored speediness! 35b mxfp4: https://huggingface.…

X AI KOLs Timeline · 2026-05-24

DealignAI releases CRACK-abliterated versions of the Qwen3.6-27B and 35B models, quantized to MXFP4 and MXFP8. The releases preserve MTP (multi-token prediction) for faster speculative decoding on Apple Silicon.

#quantized

@DeepTechTR: Qwen 3.6 27B is incredibly fast with 16 GB VRAM! The impact of Pure Quant The era of the 27B model that runs seamlessly…

X AI KOLs Timeline · 2026-05-24

According to the post, Qwen 3.6 27B runs fast with 16 GB of VRAM thanks to "Pure Quant", reaching 40 tokens/s with MTP and supporting 64k-token contexts, which makes local inference practical on consumer GPUs such as the RTX 4060 Ti.

#quantized

@coffeecup2020: TurboQuant - Qwopus3.6-27B-v2-TQ3_4S.gguf Confirmed with gpqa test this is something great. https://huggingface.co/YTan…

X AI KOLs Timeline · 2026-05-23

Qwopus3.6-27B-v2-TQ3_4S.gguf is a TurboQuant GGUF quantization of the Qwopus3.6-27B-v2 model, shared on Hugging Face with credits to Jackrong and KyleHessling; the poster says GPQA testing confirms its quality.

#quantized

@Ex0byt: And... Ladies and Gentlemen: Qwen3.6-27B-PRISM-PRO-DQ - enjoy!

X AI KOLs Timeline · 2026-05-19

Qwen3.6-27B-PRISM-PRO-DQ is released: a dynamically quantized GGUF version of Qwen3.6-27B with bias/propaganda removal. It preserves the native MTP draft head and vision tower, enabling lossless speculative decoding for faster inference.

#quantized

CohereLabs/command-a-plus-05-2026-w4a4

Hugging Face Models Trending · 2026-05-18

CohereLabs releases Command A+, an open-source model with 25B active parameters, optimized for agentic, multilingual, and reasoning tasks, with vision support and an Apache 2.0 license. This repository is the W4A4 (4-bit weight, 4-bit activation) variant.

#quantized

DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF

Hugging Face Models Trending · 2026-05-01

DavidAU releases a custom 40B-parameter model based on Qwen 3.6, expanded and fine-tuned on Claude 4.6 Opus distillation and Deckard datasets. The model is uncensored and ships with GGUF quantizations optimized for improved precision.

#quantized

@outsource_: NEW GLM+ QWEN 18B RUNS ON CONSUMER GPU IT BEATS 35B MoE AT HALF THE VRAM @KyleHessling1 just dropped the healed Qwopus-…

X AI KOLs Timeline · 2026-04-20

Qwopus-GLM-18B-GGUF, a new 18B merged model released as GGUF quantizations, is claimed to beat a 35B MoE model while using half the VRAM, and it runs on consumer GPUs.

#quantized

@rohanpaul_ai: Gemma 4 (specifically its edge-optimized E2B and E4B variants) running fully offline on an iPhone via apps like Locally…

X AI KOLs Following · 2026-04-19

Google’s Gemma 4 E2B/E4B quantized variants now run fully offline on iPhone via apps like Locally AI, leveraging the Apple Neural Engine for on-device inference.
