Tensor split mode: CUDA error on latest llama.cpp with Qwen-3.6-27b

Reddit r/LocalLLaMA 06/03/26, 12:38 PM News

llama-cpp cuda-error tensor-split-mode multi-gpu qwen-3-6-27b ubuntu docker

Summary

A user reports a CUDA error (an NCCL failure during the all-reduce step at model warm-up) when using tensor split mode with the latest llama.cpp and the Qwen-3.6-27b model on dual RTX 3090s, running Ubuntu Server 24.04 and Docker.

Hi guys, I am running into issues when loading the Unsloth UD-Q8_K_XL quant and wanted to check whether anyone has run into this. I updated my config to also use `--split-mode tensor`, since I see that the tensor split mode fixes have been merged into llama.cpp, but I am not sure whether I also need to update my drivers/CUDA to get it working.

I am running dual 3090s on Ubuntu Server 24.04:

`NVIDIA-SMI 580.159.03 Driver Version: 580.159.03 CUDA Version: 13.0`

This is my config, running in Docker with the latest llama.cpp image:

`-c 32768`
`--flash-attn on`
`--n-gpu-layers 999`
`--split-mode tensor`
`--parallel 1`
`--tensor-split 1,1`
`--jinja`
`--temp 0.6`
`--top-p 0.95`
`--min-p 0.01`
`--top-k 20`
`--presence-penalty 0.0`
`--spec-type draft-mtp`
`--spec-draft-n-max 2`
`--no-mmap`
`-np 1`

This is the error I get when starting up:

`0.01.790.389 I common_init_result: fitting params to device memory ...`
`0.01.790.389 I common_init_result: (for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on)`
`0.01.790.459 W common_fit_params: failed to fit params to free device memory: llama_params_fit is not implemented for SPLIT_MODE_TENSOR, abort`
`0.12.433.663 W llama_context: n_ctx_seq (32768) < n_ctx_train (262144) -- the full capacity of the model will not be utilized`
`0.12.604.320 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)`
`/app/ggml/src/ggml-cuda/ggml-cuda.cu:103: CUDA error`
`0.13.277.104 E CUDA error: unhandled system error (run with NCCL_DEBUG=INFO for details)`
`0.13.277.108 E current device: 0, in function ggml_backend_cuda_comm_allreduce_nccl at /app/ggml/src/ggml-cuda/ggml-cuda.cu:1217`
`0.13.277.108 E ncclGroupEnd()`
`...`
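The error message itself suggests rerunning with `NCCL_DEBUG=INFO` to get more detail. As a minimal sketch of how that could be done from Docker, the Python snippet below assembles a `docker run` command that passes the environment variable into the container and reuses the flags from the post. The image tag, the host model directory, and the model file name are assumptions for illustration and do not come from the post. The `-np 1` flag is left out because it is the short form of `--parallel 1`, which is already in the list.

```python
import shlex

# Flags copied from the post. "-np 1" is omitted because it is the short
# form of "--parallel 1", which is already present.
SERVER_FLAGS = [
    "-c", "32768",
    "--flash-attn", "on",
    "--n-gpu-layers", "999",
    "--split-mode", "tensor",
    "--parallel", "1",
    "--tensor-split", "1,1",
    "--jinja",
    "--temp", "0.6",
    "--top-p", "0.95",
    "--min-p", "0.01",
    "--top-k", "20",
    "--presence-penalty", "0.0",
    "--spec-type", "draft-mtp",
    "--spec-draft-n-max", "2",
    "--no-mmap",
]


def build_debug_command(image, models_dir, model_file, flags, nccl_debug="INFO"):
    """Return a docker argv list that reruns the server with NCCL logging on."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                   # expose both GPUs to the container
        "-e", f"NCCL_DEBUG={nccl_debug}",  # verbosity requested by the error text
        "-v", f"{models_dir}:/models",     # mount the host model directory
        image,
        "-m", f"/models/{model_file}",
        *flags,
    ]


if __name__ == "__main__":
    cmd = build_debug_command(
        image="ghcr.io/ggml-org/llama.cpp:server-cuda",  # assumed image tag
        models_dir="/srv/models",                        # hypothetical host path
        model_file="model.gguf",                         # hypothetical file name
        flags=SERVER_FLAGS,
    )
    # Print a copy-pasteable, shell-quoted command line.
    print(shlex.join(cmd))
```

Running the script only prints the command; it does not start the container. The NCCL lines that appear before the `ncclGroupEnd()` failure in the resulting log are the part worth attaching to a bug report or a follow-up comment.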


