inference-tuning


llama.cpp - how to free up even more space on your GPU

Reddit r/LocalLLaMA · yesterday

A thread sharing practical tips for freeing up GPU memory in llama.cpp, such as offloading the multimodal projector (mmproj) to the CPU and adjusting KV cache types. It also discusses parameters such as --cache-type-k, --cache-type-v, and --spec-draft-n-max.
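To give a rough sense of why the KV cache type matters, here is a minimal Python sketch that estimates KV cache size for a few cache types. It is not taken from the thread. The model shape (layers, KV heads, head dimension, context length) is hypothetical, and the estimate ignores padding, sliding-window attention, and other model-specific details, so treat the numbers as ballpark figures only.

```python
# Approximate bytes per element for some llama.cpp KV cache types.
# f16 is 2 bytes; q8_0 packs 32 values into 34 bytes; q4_0 packs 32 into 18.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx,
                   type_k="f16", type_v="f16"):
    """Estimate KV cache size: one K and one V tensor per layer."""
    elements = n_layers * n_kv_heads * head_dim * n_ctx
    k_bytes = elements * BYTES_PER_ELEMENT[type_k]
    v_bytes = elements * BYTES_PER_ELEMENT[type_v]
    return int(k_bytes + v_bytes)


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(**shape, type_k=cache_type, type_v=cache_type)
        print(f"{cache_type}: {size / 2**30:.2f} GiB")
```

Under these assumptions, moving both K and V from f16 to q8_0 roughly halves the cache, and q4_0 shrinks it further, at some cost in accuracy.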


