cpu-optimization

#cpu-optimization

@venkat_systems: Inference is not just GPU/Accelerator problem. Unoptimized cpu work in hot path can drastically affect performance. v0.…

X AI KOLs Timeline ↗ · 2026-06-19 Cached

Venkat explains that unoptimized CPU work in the hot path can severely degrade inference performance, and introduces his PR to Mooncake, which adds a memory arena for lock-free, allocation-free operations and benefits the vLLM and SGL projects.
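
To make the idea of a lock-free, allocation-free arena concrete, here is a minimal sketch of a bump allocator over a preallocated buffer. It is not the code from the PR: the names `arena_t`, `arena_init`, `arena_alloc`, and `arena_reset` are hypothetical, and the design (a single atomic offset advanced with compare-and-swap) is an assumption about how such an arena could work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical fixed-capacity bump arena; illustrative only, not Mooncake's API. */
typedef struct {
    unsigned char *base;   /* caller-provided backing buffer */
    size_t capacity;       /* size of the buffer in bytes */
    atomic_size_t offset;  /* next free byte, advanced atomically */
} arena_t;

static void arena_init(arena_t *a, void *buf, size_t capacity) {
    a->base = buf;
    a->capacity = capacity;
    atomic_init(&a->offset, 0);
}

/* Returns `size` bytes at an offset aligned to `align` (a power of two),
 * or NULL when the arena is full. Alignment is relative to the start of the
 * buffer, so the buffer itself is assumed to be suitably aligned.
 * No lock and no call to malloc happens on this path. */
static void *arena_alloc(arena_t *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t old = atomic_load_explicit(&a->offset, memory_order_relaxed);
    for (;;) {
        size_t start = (old + align - 1) & ~(align - 1);
        if (start > a->capacity || size > a->capacity - start) {
            return NULL;
        }
        /* On failure, `old` is refreshed with the current offset and we retry. */
        if (atomic_compare_exchange_weak_explicit(&a->offset, &old, start + size,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            return a->base + start;
        }
    }
}

/* Releases everything at once; the caller must ensure no thread still uses
 * memory handed out earlier. */
static void arena_reset(arena_t *a) {
    atomic_store_explicit(&a->offset, 0, memory_order_relaxed);
}
```

The trade-off in this sketch is that individual allocations cannot be freed; memory is reclaimed only by resetting the whole arena, which suits per-request scratch space in a hot path.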

#cpu-optimization

What it takes to transpose a matrix

Hacker News Top ↗ · 2026-05-24 Cached

An in-depth technical blog post explaining how to efficiently transpose matrices using SIMD instructions on modern x86_64 CPUs, focusing on AVX2 intrinsics like _mm256_shuffle_epi8.
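
For orientation, the sketch below shows a scalar, cache-blocked transpose of a row-major matrix, the kind of baseline that SIMD kernels are usually compared against. It is not the post's AVX2 implementation and uses no intrinsics; the function name `transpose_blocked` and the tile size of 8 are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Tile edge length; 8 is an arbitrary illustrative choice, not a tuned value. */
enum { TILE = 8 };

/* Writes the transpose of `src` (rows x cols, row-major) into `dst`
 * (cols x rows, row-major). `src` and `dst` must not overlap.
 * Working tile by tile keeps both the reads and the writes within a small
 * region of memory at a time. */
static void transpose_blocked(const float *src, float *dst,
                              size_t rows, size_t cols) {
    for (size_t i0 = 0; i0 < rows; i0 += TILE) {
        size_t i1 = (rows - i0 < TILE) ? rows : i0 + TILE;
        for (size_t j0 = 0; j0 < cols; j0 += TILE) {
            size_t j1 = (cols - j0 < TILE) ? cols : j0 + TILE;
            for (size_t i = i0; i < i1; ++i) {
                for (size_t j = j0; j < j1; ++j) {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

A SIMD version would typically replace the two innermost loops with shuffle and permute instructions operating on whole tiles held in registers.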

#cpu-optimization

ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) by pl752 · Pull Request #21636 · ggml-org/llama.cpp

Reddit r/LocalLLaMA ↗ · 2026-04-21 Cached

Pull request adds optimized x86 and generic CPU q1_0 dot-product kernels to ggml-cpu, improving quantized LLM inference speed.

#cpu-optimization

High-performance 2D graphics rendering on the CPU using sparse strips

Lobsters Hottest ↗ · 2026-04-19

Research on optimizing 2D graphics rendering on CPUs using sparse strip techniques to improve performance and reduce memory overhead.
