NCCL-Free Tensor Parallelism on Dual Blackwell PCIe: llama.cpp b9095 released!
Summary
llama.cpp build b9095 introduces NCCL-free tensor parallelism for dual Blackwell PCIe GPUs, splitting a model's weights across both cards for multi-GPU inference without depending on NVIDIA's NCCL communication library.
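The summary does not spell out the exact invocation, but llama.cpp's long-standing split flags give a rough idea of what a dual-GPU tensor-parallel launch looks like; the sketch below is an illustrative example with a placeholder model path and an equal split, not the b9095 recipe itself.

```bash
# Sketch: serve one model split across two Blackwell PCIe GPUs using
# llama.cpp's row split mode, which splits individual weight tensors
# across devices. Model path and split ratio are placeholders.
CUDA_VISIBLE_DEVICES=0,1 ./llama-server \
  -m models/model-q4_k_m.gguf \
  -ngl 99 \
  --split-mode row \
  --tensor-split 1,1
```

The default --split-mode layer instead assigns whole layers to each GPU; row mode is the one that behaves like tensor parallelism and leans on inter-GPU traffic, which is exactly where avoiding NCCL matters on PCIe-only boxes.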
Similar Articles
Blackwell LLM Toolkit - NVFP4 Config + Wheels + Benchmarks for Blackwell GPUs via TensorRT-LLM - 270 tk/s Nemotron 3 Omni
A developer toolkit providing configurations, wheels, and benchmarks for running large language models with NVFP4 precision on NVIDIA Blackwell GPUs using TensorRT-LLM.
RTX Pro 4500 Blackwell - Qwen 3.6 27B?
A developer shares local inference benchmarks and systemd configurations for running the Qwen3.6-27B model on an NVIDIA RTX Pro 4500 Blackwell GPU using llama.cpp. The post requests optimization tips for throughput and explores potential use cases for larger models.
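Since the post centers on systemd configurations, here is a minimal sketch of what such a unit might look like; all paths, the model file, and the flags are assumptions for illustration, not the author's actual setup.

```bash
# Illustrative only: install a systemd unit that keeps llama-server running.
# Paths, model, and flags are assumed, not taken from the post.
sudo tee /etc/systemd/system/llama-server.service >/dev/null <<'EOF'
[Unit]
Description=llama.cpp server (Qwen3.6-27B)
After=network-online.target

[Service]
ExecStart=/opt/llama.cpp/build/bin/llama-server \
    -m /opt/models/qwen3.6-27b-q4_k_m.gguf \
    -ngl 99 --host 127.0.0.1 --port 8080
Restart=on-failure
User=llama

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now llama-server
```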
@zcbenz: We have achieved a milestone in MLX that all tests are passing in CUDA backend now.
MLX has reached a milestone where all tests pass on the CUDA backend, indicating improved compatibility with NVIDIA GPUs.
@binsquares: omg, GPU acceleration on smolvm works way better than I thought. can run llama.cpp inside the smol machine with close t…
User @binsquares reports that GPU acceleration on smolvm achieves nearly 90% of host performance when running llama.cpp via the Vulkan backend.
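For reference, the Vulkan backend mentioned here is enabled at build time with a standard upstream flag; the steps below are the stock llama.cpp build, with the VM GPU-passthrough setup out of scope.

```bash
# Build llama.cpp with its Vulkan backend (requires the Vulkan SDK/headers).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Run fully offloaded to the Vulkan device (model path is a placeholder).
./build/bin/llama-cli -m models/model-q4_k_m.gguf -ngl 99 -p "Hello"
```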
@pupposandro: https://x.com/pupposandro/status/2054241934164492328
The linked post announces support for DFlash and PFlash speculative decoding in llama.cpp on AMD Strix Halo iGPUs, demonstrating significant inference speedups under ROCm.
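DFlash and PFlash specifics aside, speculative decoding in llama.cpp pairs a large target model with a small draft model that proposes tokens for the target to verify in a single pass. The invocation below is a generic sketch with placeholder model paths; it does not reproduce the post's DFlash/PFlash flags.

```bash
# Generic speculative-decoding sketch: the draft model (-md) proposes up to
# --draft-max tokens per step, which the target model (-m) verifies.
# Model paths are placeholders; for an AMD/ROCm build like the Strix Halo
# setup, configure with: cmake -B build -DGGML_HIP=ON
./llama-server \
  -m  models/target-q4_k_m.gguf \
  -md models/draft-q8_0.gguf \
  --draft-max 16 \
  -ngl 99 -ngld 99
```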