A technical discussion validates TurboQuant performance data on NVIDIA H100 GPUs with FP8 Tensor Cores, with further results promised from testing on non-H100 hardware.
A community member repaired dead neurons in the Qwen3.6-35B-A3B MoE model by copying weights from healthy neighboring neurons, releasing fixed GGUF and FP8 safetensors versions.
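The repair described above can be sketched as follows. This is a minimal illustration of the idea, not the actual fix (which operated on the model's released weight files): a "dead" neuron is modeled as a weight row that is all zeros or contains NaNs, and it is patched by copying the nearest healthy row. The function name and dead-neuron criterion are assumptions for the sketch.

```python
def repair_dead_rows(weights):
    """Replace 'dead' weight rows (all zeros, or containing NaN) with a copy
    of the nearest healthy row -- a toy stand-in for copying weights from
    healthy neighboring neurons. Assumes at least one healthy row exists."""
    def is_dead(row):
        return all(v == 0.0 for v in row) or any(v != v for v in row)  # v != v catches NaN

    healthy = [i for i, row in enumerate(weights) if not is_dead(row)]
    repaired = []
    for i, row in enumerate(weights):
        if is_dead(row):
            donor = min(healthy, key=lambda j: abs(j - i))  # nearest healthy neighbor
            repaired.append(list(weights[donor]))
        else:
            repaired.append(list(row))
    return repaired

# Example: row 1 is dead (all zeros) and gets row 0's weights copied in.
w = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
print(repair_dead_rows(w))  # [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
```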
Alibaba releases Qwen3.6-35B-A3B-FP8, an open-weight quantized variant of Qwen3.6 with 35B total parameters and 3B activated per token via MoE, featuring improved agentic coding capabilities and thinking preservation for iterative development.
DeepSeek releases DeepGEMM, a high-performance CUDA kernel library for LLM computation primitives, including FP8/FP4/BF16 GEMMs, fused MoE with overlapped communication, and MQA scoring. Kernels are compiled at runtime via JIT, so no CUDA compilation is required at installation time. The library achieves up to 1550 TFLOPS on H800 and matches or exceeds expert-tuned libraries across various matrix shapes.
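The numeric recipe behind FP8 GEMMs like those above can be illustrated in plain Python: inputs are rounded to an 8-bit float format (E4M3, with 4 exponent and 3 mantissa bits, saturating at ±448), while the dot-product accumulation stays in higher precision. This sketch only simulates the rounding behavior; it is not DeepGEMM's API, and the saturation/subnormal handling is simplified.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to a nearby FP8 E4M3 value (simplified: subnormals ignored,
    magnitudes saturate at 448, the E4M3 maximum)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = min(abs(x), 448.0)            # saturate at the E4M3 max finite value
    m, e = math.frexp(x)              # x = m * 2**e with m in [0.5, 1)
    m = round(m * 16) / 16            # keep 1 implicit + 3 mantissa bits
    return sign * math.ldexp(m, e)

def fp8_gemm(a, b):
    """Matmul with both operands quantized to simulated E4M3 and the
    accumulation done in full precision, mirroring FP8 tensor-core GEMMs."""
    qa = [[quantize_e4m3(v) for v in row] for row in a]
    qb = [[quantize_e4m3(v) for v in row] for row in b]
    rows, inner, cols = len(qa), len(qb), len(qb[0])
    return [[sum(qa[i][t] * qb[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

print(quantize_e4m3(1.1))    # 1.125 -- only 3 mantissa bits survive
print(quantize_e4m3(1000.0)) # 448.0 -- saturates at the E4M3 maximum
```

Real libraries pair this per-element rounding with per-tile scale factors so that tensors whose values exceed the E4M3 range can still be represented; that bookkeeping is omitted here for brevity.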