A technical discussion validates TurboQuant performance data on NVIDIA H100 GPUs with FP8 Tensor Cores, with further results promised from testing on non-H100 hardware.
A community member repaired dead neurons in the Qwen3.6-35B-A3B MoE model by copying weights from healthy neighboring neurons, releasing fixed GGUF and FP8 safetensors versions.
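The repair described above can be sketched as follows. This is a minimal illustration of the idea, not the actual fix (which operated on the model's released weight files): a "dead" neuron is modeled as a weight row that is all zeros or contains NaNs, and it is patched by copying the nearest healthy row. The function name and dead-neuron criterion are assumptions for the sketch.

```python
def repair_dead_rows(weights):
    """Replace 'dead' weight rows (all zeros, or containing NaN) with a copy
    of the nearest healthy row -- a toy stand-in for copying weights from
    healthy neighboring neurons. Assumes at least one healthy row exists."""
    def is_dead(row):
        return all(v == 0.0 for v in row) or any(v != v for v in row)  # v != v catches NaN

    healthy = [i for i, row in enumerate(weights) if not is_dead(row)]
    repaired = []
    for i, row in enumerate(weights):
        if is_dead(row):
            donor = min(healthy, key=lambda j: abs(j - i))  # nearest healthy neighbor
            repaired.append(list(weights[donor]))
        else:
            repaired.append(list(row))
    return repaired

# Example: row 1 is dead (all zeros) and gets row 0's weights copied in.
w = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
print(repair_dead_rows(w))  # [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
```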
Alibaba releases Qwen3.6-35B-A3B-FP8, an open-weight quantized variant of Qwen3.6 with 35B total parameters and 3B activated per token via MoE, featuring improved agentic coding capabilities and thinking preservation for iterative development.
DeepSeek releases DeepGEMM, a high-performance CUDA kernel library for LLM computation primitives, including FP8/FP4/BF16 GEMMs, fused MoE with overlapped communication, and MQA scoring. Kernels are compiled at runtime via JIT, so no CUDA compilation is required at installation time. The library achieves up to 1550 TFLOPS on H800 and matches or exceeds expert-tuned libraries across various matrix shapes.
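The numeric recipe behind FP8 GEMMs like those above can be illustrated in plain Python: inputs are rounded to an 8-bit float format (E4M3, with 4 exponent and 3 mantissa bits, saturating at ±448), while the dot-product accumulation stays in higher precision. This sketch only simulates the rounding behavior; it is not DeepGEMM's API, and the saturation/subnormal handling is simplified.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to a nearby FP8 E4M3 value (simplified: subnormals ignored,
    magnitudes saturate at 448, the E4M3 maximum)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = min(abs(x), 448.0)            # saturate at the E4M3 max finite value
    m, e = math.frexp(x)              # x = m * 2**e with m in [0.5, 1)
    m = round(m * 16) / 16            # keep 1 implicit + 3 mantissa bits
    return sign * math.ldexp(m, e)

def fp8_gemm(a, b):
    """Matmul with both operands quantized to simulated E4M3 and the
    accumulation done in full precision, mirroring FP8 tensor-core GEMMs."""
    qa = [[quantize_e4m3(v) for v in row] for row in a]
    qb = [[quantize_e4m3(v) for v in row] for row in b]
    rows, inner, cols = len(qa), len(qb), len(qb[0])
    return [[sum(qa[i][t] * qb[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

print(quantize_e4m3(1.1))    # 1.125 -- only 3 mantissa bits survive
print(quantize_e4m3(1000.0)) # 448.0 -- saturates at the E4M3 maximum
```

Real libraries pair this per-element rounding with per-tile scale factors so that tensors whose values exceed the E4M3 range can still be represented; that bookkeeping is omitted here for brevity.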