4-bit

#4-bit

@superalesha: Don't dare bury RTX 3090 until you read this! @UnslothAI shipped two new 4-bit quants of qwen3.6-35b this week. i spent…

X AI KOLs Timeline ↗ · 2026-07-11 Cached

A benchmark comparison of nvfp4, nvfp4-fast, and AWQ 4-bit quantizations of Qwen3.6-35B on RTX 3090s shows similar performance across all three, with the MTP head trick boosting throughput by 41%.

#4-bit

4-bit GLM-5.2 (753B MoE) on 4× DGX Spark: 70.8% on Terminal-Bench 2.1 vs 81.0% for the full model

Reddit r/LocalLLaMA ↗ · 2026-07-08

A 4-bit quantized version of GLM-5.2 (753B MoE) running on four DGX Spark machines scores 70.8% on Terminal-Bench 2.1, compared to 81.0% for the full model, a gap of 10.2 percentage points.

#4-bit

FourTune: Towards Fully 4-Bit Efficient Post-Training for Diffusion Models

arXiv cs.LG ↗ · 2026-07-08 Cached

FourTune proposes a fully 4-bit quantization framework (W4A4G4) for efficient post-training of diffusion models, using a triple-branch hybrid pipeline and custom fused kernels to reduce memory by 2.25× and increase throughput by 2.27× on 12B FLUX.1-dev without quality loss.

#4-bit

@no_stp_on_snek: Pretty dang good. ran some tests myself. it's a pretty good model IMO:

X AI KOLs Following ↗ · 2026-07-02 Cached

NVIDIA's 4-bit quantized Qwen3.6-27B model (NVFP4) is reported to be near-lossless, maintaining full-size quality at a quarter the size.

#4-bit

@no_stp_on_snek: ok folks you know the drill.. verdict up front: NVIDIA's 4-bit Qwen3.6-27B (NVFP4) is near-lossless. on my own held-out…

X AI KOLs Following ↗ · 2026-07-02 Cached

NVIDIA's 4-bit quantized Qwen3.6-27B (NVFP4) is found to be near-lossless compared to the full bf16 model, with behavioral differences being minor and random rather than systematic, making it a practical drop-in replacement.

#4-bit

Deepseek V4 Flash 2, 3 and 4 bits GGUFs

Reddit r/LocalLLaMA ↗ · 2026-07-01 Cached

GGUF quantizations of DeepSeek V4 Flash at 2-bit, 3-bit, and 4-bit precision are available on Hugging Face for local inference with tools such as llama.cpp and Ollama.

#4-bit

Guide to the TD4 4-bit DIY CPU

Hacker News Top ↗ · 2026-06-18 Cached

A detailed guide on building and understanding the TD4 4-bit DIY CPU kit from Aliexpress, covering soldering, schematics, and operation principles.

#4-bit

Alpie Core 32B, 4 bit any real agent workflow tests or just vendor benchmarks?

Reddit r/AI_Agents ↗ · 2026-06-11

The post questions the validity of vendor benchmarks for Alpie Core 32B, a 4-bit reasoning and coding model optimized for low VRAM and agent workflows, noting that the benchmarks have not been independently replicated.

#4-bit

@UnslothAI: 4-bit Qwen3.6 MTP GGUF managed to search 70+ sites from a single prompt. Try this locally on 20GB RAM via Unsloth Studi…

X AI KOLs Timeline ↗ · 2026-05-19 Cached

UnslothAI announces that its 4-bit Qwen3.6 MTP GGUF model can search over 70 websites from a single prompt, running locally on 20GB RAM via Unsloth Studio. The update adds automatic MTP and speculative decoding support.

#4-bit

Introducing cyankiwi AWQ 4-bit Quantization — 26.05 update

Reddit r/LocalLLaMA ↗ · 2026-05-14

Cyankiwi introduced an updated version of their AWQ 4-bit quantization method that jointly optimizes scales and quantization ranges, achieving lower KL divergence than existing methods on Llama-3 models.
