inference-tuning


llama.cpp - how to free up even more space on your GPU

Reddit r/LocalLLaMA · yesterday

A thread of practical tips for freeing up GPU memory in llama.cpp, such as offloading the multimodal projector (mmproj) to the CPU and adjusting the KV cache types. It also discusses parameters such as --cache-type-k, --cache-type-v, and --spec-draft-n-max.
