Tag
Morph LLC describes three key techniques—training a speculator on coding output, auto-searching kernels on cheap GPUs, and writing a custom interconnect—to dramatically speed up open models like Qwen and DeepSeek for coding agent workloads, achieving up to 3x speculative decoding speedup and 97-162 tok/s on a $7K GPU.
This blog explores using LLM-guided autotuning to accelerate kernel configuration search in PyTorch's Helion DSL, replacing the slower Likelihood-Free Bayesian Optimization approach.