gpu-decoding

#gpu-decoding

PivCo-Huffman “merge” operations

Lobsters Hottest · 6h ago

This blog post analyzes the PivCo-Huffman paper, which introduces “merge” operations for parallel Huffman decoding, enabling efficient vectorized and GPU-friendly decoding without interleaving overhead.
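For context, here is a minimal sketch of the classic bit-serial Huffman decoder that parallel schemes have to improve on. It is *not* the PivCo-Huffman method. The code table `CODE` and the helpers `encode` and `decode_sequential` are hypothetical, chosen only to show the serial dependency: the next codeword cannot be located until the current one has been decoded.

```python
from typing import Dict, List

# Hypothetical prefix-free code table (illustrative, not from the paper): symbol -> bits.
CODE: Dict[str, str] = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(text: str, code: Dict[str, str]) -> str:
    """Concatenate the codeword of each symbol into one bitstring."""
    return "".join(code[ch] for ch in text)


def decode_sequential(bits: str, code: Dict[str, str]) -> List[str]:
    """Bit-serial Huffman decoding.

    Where codeword i+1 starts depends on the length of codeword i, so this
    loop cannot simply be split across SIMD lanes or GPU threads.
    """
    lookup = {v: k for k, v in code.items()}
    out: List[str] = []
    cur = ""
    for bit in bits:
        cur += bit
        if cur in lookup:  # prefix-free code: the first match is a full codeword
            out.append(lookup[cur])
            cur = ""
    if cur:
        raise ValueError("trailing bits do not form a complete codeword")
    return out
```

Variable codeword lengths are what make this loop hard to parallelize: a thread handed an arbitrary bit offset does not know whether it sits on a codeword boundary.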


#gpu-decoding

@_avichawla: Anthropic. Google. Meta. Everyone's using an idea from the 1990s to run LLM inference 2-3x faster. In the 1990s, CPU de…

X AI KOLs Timeline · 2026-05-26

Speculative decoding, which the post traces to 1990s CPU branch prediction, is used by Anthropic, Google, and Meta to speed up LLM inference 2-3x. A small draft model guesses several upcoming tokens, and the large target model verifies all of them in one parallel forward pass, so the GPU does useful work instead of sitting idle during token-by-token decoding.
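A minimal sketch of the greedy variant of speculative decoding. The "models" are hypothetical toy functions that map a token prefix to a next token, and `speculative_decode`, `greedy_decode`, `toy_target`, `toy_draft` and the draft length `k` are illustrative names, not any vendor's API. Production systems sample tokens and use a probabilistic accept/reject rule. This greedy version only shows the propose, verify, accept loop.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token (fully sequential)."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Draft guesses k tokens, target checks them; output equals greedy_decode(target)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        guess: List[int] = []
        for _ in range(k):
            guess.append(draft(out + guess))
        # 2. The target predicts the token after every prefix of the guess.
        #    On a GPU this is one batched forward pass; here we simply loop.
        checks = [target(out + guess[:i]) for i in range(k + 1)]
        # 3. Accept the longest prefix on which draft and target agree.
        n_ok = 0
        while n_ok < k and guess[n_ok] == checks[n_ok]:
            n_ok += 1
        # 4. Keep the accepted tokens plus the target's own token at the first
        #    mismatch (or a "bonus" token when all k guesses were accepted).
        out.extend(guess[:n_ok])
        out.append(checks[n_ok])
    return out[:goal]


# Hypothetical toy models: the target sums the last two tokens mod 10, and the
# draft agrees with it except at every fifth position.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + ctx[-2]) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if len(ctx) % 5 == 0 else toy_target(ctx)
```

Every kept token is exactly what the target would have produced from the same prefix, so `speculative_decode(toy_target, toy_draft, [1, 1], 20)` matches `greedy_decode(toy_target, [1, 1], 20)`. The speedup in a real system comes from step 2 running as one batched pass over up to k+1 positions instead of k+1 sequential passes.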

