quantized

#quantized

Holo3.1: Fast & Local Computer Use Agents

Hugging Face Blog ↗ · 2026-06-02 Cached

Holo3.1 is an updated computer-use model family that improves robustness across web, desktop, and mobile environments, introduces quantized checkpoints for local execution, and adds native support for function-calling protocols.


nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face

Reddit r/LocalLLaMA ↗ · 2026-05-30 Cached

NVIDIA releases Qwen3.6-35B-A3B-NVFP4, a quantized version of Alibaba's mixture-of-experts multimodal language model, optimized for deployment on NVIDIA GPUs using Model Optimizer.


@dealignai: Qwen3.6-27b and 35b MXFP4 MXFP8 CRACK is out now with MTP. Enjoy uncensored speediness! 35b mxfp4: https://huggingface.…

X AI KOLs Timeline ↗ · 2026-05-24 Cached

DealignAI releases CRACK-abliterated and MXFP4/MXFP8 quantized versions of Qwen3.6-27B and 35B models, preserving MTP for faster speculative decoding on Apple Silicon.


@DeepTechTR: Qwen 3.6 27B is incredibly fast with 16 GB VRAM! The impact of Pure Quant The era of the 27B model that runs seamlessly…

X AI KOLs Timeline ↗ · 2026-05-24 Cached

Qwen 3.6 27B reportedly runs fast on 16 GB of VRAM thanks to 'Pure Quant', reaching 40 tokens/s with MTP and supporting 64k contexts, which enables local AI on consumer GPUs such as the RTX 4060 Ti.


@coffeecup2020: TurboQuant - Qwopus3.6-27B-v2-TQ3_4S.gguf Confirmed with gpqa test this is something great. https://huggingface.co/YTan…

X AI KOLs Timeline ↗ · 2026-05-23 Cached

Qwopus3.6-27B-v2-TQ3_4S.gguf is a TurboQuant GGUF quantization of the Qwopus3.6-27B-v2 model, reported as confirmed with a GPQA test and shared on Hugging Face, with credits to Jackrong and KyleHessling.


@Ex0byt: And... Ladies and Gentlemen: Qwen3.6-27B-PRISM-PRO-DQ - enjoy!

X AI KOLs Timeline ↗ · 2026-05-19 Cached

@Ex0byt releases Qwen3.6-27B-PRISM-PRO-DQ, a dynamically quantized GGUF version of Qwen3.6-27B with bias/propaganda removal. It preserves the native MTP draft head and vision tower, enabling lossless speculative decoding for faster inference.


CohereLabs/command-a-plus-05-2026-w4a4

Hugging Face Models Trending ↗ · 2026-05-18 Cached

CohereLabs releases Command A+, an open-source model with 25B active parameters, optimized for agentic, multilingual, and reasoning tasks, with vision support and an Apache 2.0 license.


@outsource_: NEW GLM+ QWEN 18B RUNS ON CONSUMER GPU IT BEATS 35B MoE AT HALF THE VRAM @KyleHessling1 just dropped the healed Qwopus-…

X AI KOLs Timeline ↗ · 2026-04-20 Cached

A new 18B merged quantized model, Qwopus-GLM-18B-GGUF, is claimed to outperform 35B MoE models while using half the VRAM and running on consumer GPUs.


@rohanpaul_ai: Gemma 4 (specifically its edge-optimized E2B and E4B variants) running fully offline on an iPhone via apps like Locally…

X AI KOLs Following ↗ · 2026-04-19 Cached

Google’s Gemma 4 E2B/E4B quantized variants now run fully offline on iPhone via apps like Locally AI, leveraging the Apple Neural Engine for on-device inference.


Jiunsong/supergemma4-26b-uncensored-gguf-v2

Hugging Face Models Trending ↗ · 2026-04-11 Cached

SuperGemma4-26B-Uncensored-Fast GGUF v2 is a quantized, locally-runnable variant of Google's Gemma-4-26B model optimized for Apple Silicon, offering faster inference speeds and less-censored chat behavior while maintaining practical performance on general tasks.


Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2

Hugging Face Models Trending ↗ · 2026-04-10 Cached

SuperGemma4-26B-Uncensored-MLX-4bit-v2 is a fine-tuned and quantized variant of Google's Gemma 4 26B optimized for Apple Silicon, offering improved performance on code, reasoning, and tool-use tasks while maintaining faster inference speeds compared to the stock baseline.
