@0xSero: Just added 2 new model compressions: Hy3-FP8 & NVFP4 I recommend trying this model it's very strong and fits on 256gb o…

X AI KOLs Following 05/10/26, 12:02 AM Models

model-quantization huggingface tensor-compression nvfp4 tencent gpu-optimization

Summary

0xSero has released new FP8 and NVFP4 quantized versions of the Tencent Hy3-preview model, enabling it to run on 256GB VRAM with full context.

Original Article

Cached at: 05/10/26, 08:23 AM

Just added 2 new model compressions:

Hy3-FP8 & NVFP4

I recommend trying this model it’s very strong and fits on 256gb of vram with full context

https://t.co/UQI63BCFiJ

0xSero/Hy3-preview-NVFP4 · Hugging Face

Source: https://huggingface.co/0xSero/Hy3-preview-NVFP4

Hy3-preview NVFP4A16

This is a checkpoint-only NVFP4A16 quantization of tencent/Hy3-preview, produced with llmcompressor.entrypoints.model_free.model_free_ptq.

Base model: tencent/Hy3-preview
Quantization scheme: NVFP4A16
Ignored modules/patterns: lm_head, model.embed_tokens, re:.*router.gate$, re:.*expert_bias$
Source snapshot: recorded in QUANTIZATION_MANIFEST.json
License: inherits Tencent Hy Community License Agreement from the base model; original LICENSE is included.
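
The ignore list above mixes exact module names with entries prefixed by re:, which are regular expressions. A minimal sketch of how such a list could be applied to decide which tensors stay unquantized is shown here. The matching rule (exact equality for plain names, re.match for re: entries) is an assumption about how the quantization tooling interprets the list, and the example module names are hypothetical, not taken from the Hy3 checkpoint.

```python
import re

# Ignore list copied from the model card. Entries starting with "re:" are
# treated as regular expressions; everything else is an exact name.
IGNORE = ["lm_head", "model.embed_tokens", r"re:.*router.gate$", r"re:.*expert_bias$"]


def is_ignored(name: str, patterns=IGNORE) -> bool:
    """Return True if the named module should be left unquantized."""
    for pattern in patterns:
        if pattern.startswith("re:"):
            # Assumed semantics: match the regex from the start of the name.
            if re.match(pattern[3:], name):
                return True
        elif name == pattern:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical module names, for illustration only.
    for name in [
        "lm_head",
        "model.embed_tokens",
        "model.layers.3.mlp.router.gate",
        "model.layers.3.mlp.expert_bias",
        "model.layers.3.mlp.experts.0.gate_proj",
    ]:
        print(f"{name}: {'skip' if is_ignored(name) else 'quantize'}")
```

Under these assumptions, only the last name in the example list would be quantized; the router gate, expert bias, embedding, and lm_head entries are all skipped.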

Notes

This release quantizes safetensors weights without importing the custom HYV3 model class. Router gates, expert bias tensors, embeddings, and lm_head are preserved unquantized for compatibility/conservatism.
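
For readers who want to sanity-check a memory claim such as "fits on 256GB of VRAM", the sketch below estimates weight storage for an NVFP4 weight-only (A16) checkpoint. It assumes the commonly described NVFP4 layout: 4-bit values with one 8-bit scale per block of 16 weights, which works out to 4.5 bits per quantized weight. Unquantized tensors are assumed to be stored in 16 bits. Per-tensor scales, KV cache, and activations are not counted. The parameter counts in the example are placeholders, not Hy3-preview's actual sizes.

```python
def nvfp4_bits_per_weight(block_size: int = 16, scale_bits: int = 8) -> float:
    """Effective bits per weight: a 4-bit value plus a shared per-block scale."""
    return 4 + scale_bits / block_size


def nvfp4_weight_bytes(
    n_quantized: float,
    n_unquantized: float = 0,
    unquantized_bits: int = 16,
) -> float:
    """Approximate bytes needed to store the weights (no KV cache, no activations)."""
    total_bits = n_quantized * nvfp4_bits_per_weight() + n_unquantized * unquantized_bits
    return total_bits / 8


if __name__ == "__main__":
    # Placeholder sizes for illustration only, NOT the real Hy3-preview counts.
    quantized_params = 100e9
    unquantized_params = 2e9  # e.g. embeddings, lm_head, router gates
    gb = nvfp4_weight_bytes(quantized_params, unquantized_params) / 1e9
    print(f"approx. weight storage: {gb:.2f} GB")
```

Whatever remains of the VRAM budget after the weights is what is available for the KV cache at full context, so this is a lower bound on the memory needed rather than a complete estimate.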

Similar Articles

NVFP4 kv cache quantization on sm120 will make 32GB VRAM systems very capable

@0xSero: Best models for your hardware this week. 8-12GB - https://huggingface.co/LiquidAI/LFM2.5-8B-A1B… incredible model, so f…
A curated weekly roundup of the best AI models for different hardware configurations, from 8GB to 768GB VRAM, highlighting performance and benchmarks.

@0xSero: Best models for your hardware - 4gb to 12gb vram - VibeThinker-3B - smokes everything remotely close to its weight clas…

500k context on 48gb VRAM!! - 21tok/s (coding)

@bstnxbt: DFlash v0.1.4 : custom Metal verify kernels for quantized Qwen3 hybrid models, plus significant peak memory reduction a…