Blackwell and PDL performance increase
Summary
llama.cpp now supports Nvidia's Programmatic Dependent Launch (PDL) on Blackwell GPUs, giving a 5-10% speedup in token generation. The feature is not enabled by default and has to be turned on with a flag at build time.
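PDL is a CUDA feature (compute capability 9.0 and newer, so Hopper and Blackwell) that lets a kernel launched on the same stream start before the previous kernel has fully finished. Only the part of the second kernel after `cudaGridDependencySynchronize()` waits for the first kernel's results. The sketch below is a minimal illustration of that mechanism, not llama.cpp's implementation. The kernels `scale_kernel` and `add_one_kernel` and the helper `run_pair_with_pdl` are hypothetical. The CUDA runtime calls (`cudaLaunchKernelEx`, the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute, `cudaTriggerProgrammaticLaunchCompletion`, `cudaGridDependencySynchronize`) are the ones described in the CUDA C++ Programming Guide. It assumes CUDA 12.x and a build for sm_90 or newer. One plausible reason the gain appears in token generation is that each generated token runs many small back-to-back kernels, so hiding launch gaps adds up. This is an assumption, not something the source states.

```cuda
#include <cuda_runtime.h>

// Hypothetical primary kernel: y[i] = 2 * x[i].
__global__ void scale_kernel(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = 2.0f * x[i];
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    // Let the dependent grid begin launching early. This does NOT make y
    // visible to it; the dependent kernel must still wait before reading y.
    cudaTriggerProgrammaticLaunchCompletion();
#endif
}

// Hypothetical secondary kernel: z[i] = y[i] + 1.
__global__ void add_one_kernel(const float* y, float* z, int n) {
    // Prologue that does not read y. With PDL this part may overlap the
    // tail of scale_kernel (in real kernels: index math, smem setup, ...).
    int i = blockIdx.x * blockDim.x + threadIdx.x;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    // Block until scale_kernel has completed and its writes are visible.
    cudaGridDependencySynchronize();
#endif
    if (i < n) z[i] = y[i] + 1.0f;
}

// Hypothetical helper: launch scale_kernel normally, then add_one_kernel
// with programmatic stream serialization (PDL) enabled on the same stream.
cudaError_t run_pair_with_pdl(const float* d_x, float* d_y, float* d_z,
                              int n, cudaStream_t stream) {
    const int block = 256;
    const int grid = (n + block - 1) / block;

    scale_kernel<<<grid, block, 0, stream>>>(d_x, d_y, n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Opt the second launch into PDL.
    cudaLaunchAttribute attr[1];
    attr[0].id = cudaLaunchAttributeProgrammaticStreamSerialization;
    attr[0].val.programmaticStreamSerializationAllowed = 1;

    cudaLaunchConfig_t cfg = {};
    cfg.gridDim = dim3(grid);
    cfg.blockDim = dim3(block);
    cfg.dynamicSmemBytes = 0;
    cfg.stream = stream;
    cfg.attrs = attr;
    cfg.numAttrs = 1;

    return cudaLaunchKernelEx(&cfg, add_one_kernel, d_y, d_z, n);
}
```

To use it, call `run_pair_with_pdl` on a stream created with `cudaStreamCreate`, then synchronize that stream before copying `d_z` back to the host. Without the launch attribute, `add_one_kernel` would only start after `scale_kernel` finished. With the attribute, only the code after `cudaGridDependencySynchronize()` has to wait. The `__CUDA_ARCH__ >= 900` guards keep the device-side PDL calls out of builds for older architectures.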
Similar Articles
Build 9254 fixes my TG regression and adds PDL for NVIDIA GPUs
Build 9254 of llama.cpp fixes a token generation regression and adds Programmatic Dependent Launch (PDL) support for NVIDIA GPUs, yielding up to a 10% speedup in token generation on newer hardware.
NCCL-Free Tensor Parallelism on Dual Blackwell PCIe llama.cpp b9095 released!
llama.cpp build b9095 introduces NCCL-free tensor parallelism for dual Blackwell PCIe GPUs, enabling efficient multi-GPU inference without the NCCL library.
Benchmarking vLLM vs SGLang vs llama.cpp on a mixed Blackwell/Ada cluster
This article benchmarks vLLM, SGLang, and llama.cpp on a mixed Blackwell/Ada GPU cluster for long-context prefill. It finds that vLLM significantly outperforms the others on heterogeneous setups, while SGLang crashes on the Ada cards because of FP4 support limitations.
Blackwell LLM Toolkit - NVFP4 Config +Wheels + Benchmarks for Blackwell GPUs via TensorRT-LLM - 270 tk/s Nemotron 3 Omni
A developer toolkit providing configurations, wheels, and benchmarks for running large language models with NVFP4 precision on Nvidia Blackwell GPUs using TensorRT-LLM.
@populartourist: llama.cpp release b9235 added some new toys for boosting inference. Benchmarked Qwen3.6 27B on an RTX 5090 with llama.c…
llama.cpp release b9235 introduces speculative n-gram tuning, achieving up to roughly 7x higher throughput on Qwen3.6 27B on an RTX 5090, with the k4v96 configuration showing the best sustained performance in the 10k- and 70k-token tests.