kernel-tuning

#kernel-tuning

Optimizing Models to Be Fast at Codegen (8 minute read)

TLDR AI · 6d ago

Morph LLC describes three techniques for speeding up open models such as Qwen and DeepSeek on coding-agent workloads: training a speculator on coding output, auto-searching kernels on cheap GPUs, and writing a custom interconnect. It reports up to a 3x speedup from speculative decoding and 97-162 tok/s on a $7K GPU.

#kernel-tuning

@PyTorch: Autotuning is the backbone of Helion, PyTorch's DSL for performance portable ML kernels. Currently Helion searches util…

X AI KOLs Following · 2026-06-18

This blog post explores LLM-guided autotuning as a way to accelerate kernel configuration search in PyTorch's Helion DSL, in place of the slower Likelihood-Free Bayesian Optimization approach.
