ik-llama

Comparing dual-GPU inference speed between llama.cpp row/tensor split and ik_llama graph split

Reddit r/LocalLLaMA ↗ · 3d ago

A user benchmarks dual-GPU inference on two RTX 3080 20GB cards, running a Qwen3.6-27B GGUF model under llama.cpp (row/tensor split) and ik_llama (graph split), and compares token generation and prompt processing speeds.
