llamacpp

Tag

Cards List
#llamacpp

RTX Pro 4500 Blackwell - Qwen 3.6 27B?

Reddit r/LocalLLaMA · 14h ago

A developer shares local inference benchmarks and systemd configurations for running the Qwen3.6-27B model on an NVIDIA RTX Pro 4500 Blackwell GPU using llama.cpp. The post requests optimization tips for throughput and explores potential use cases for larger models.
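The post's actual unit file is not reproduced here; as a rough sketch, a systemd service wrapping llama-server might look like the following (the paths, port, and flag values are assumptions, not taken from the post):

```ini
# Hypothetical unit file: /etc/systemd/system/llama-server.service
[Unit]
Description=llama.cpp server for Qwen3.6-27B
After=network.target

[Service]
# Model path, -ngl (GPU layer offload count), and port are illustrative values
ExecStart=/usr/local/bin/llama-server -m /models/qwen3.6-27b-q8_0.gguf -ngl 99 --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```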


@ivanfioravanti: llamacpp is gonna get MTP support soon!

X AI KOLs Following · yesterday

llama.cpp will soon support Multi-Token Prediction (MTP), which should improve inference efficiency.


Qwen-3.6-27B, llamacpp, speculative decoding - appreciation post

Reddit r/LocalLLaMA · 2026-04-23

A Reddit user demonstrates llama.cpp speculative decoding boosting Qwen-3.6-27B generation speed from 13.6 to 136.75 t/s, sharing the exact commands and hardware setup.
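The post's exact commands are not reproduced here. In general, llama.cpp speculative decoding pairs a large target model with a small draft model; a hedged invocation sketch (model filenames and the draft-token cap are assumptions, and flag names should be checked against your build's --help):

```sh
# Target model (-m) plus a small draft model (-md); filenames are hypothetical.
# --draft-max caps how many tokens the draft model proposes per step.
llama-server \
  -m  qwen3.6-27b-q8_0.gguf \
  -md qwen3.6-0.6b-q8_0.gguf \
  -ngl 99 --draft-max 16
```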


Qwen3.6-27B Uncensored Aggressive is out with K_P quants!

Reddit r/LocalLLaMA · 2026-04-22

Community release of Qwen3.6-27B stripped of safety refusals and packaged in optimized K_P GGUF quants for llama.cpp and LM Studio.


What speed is everyone getting on Qwen3.6 27b?

Reddit r/LocalLLaMA · 2026-04-22

A user benchmarks Qwen3.6-27B-Q8_0 at ~13 tokens/sec across three mixed GPUs with 128k context via llama.cpp and asks whether that performance is typical.
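The thread's launch command is not given; for reference, a long-context multi-GPU llama.cpp launch generally combines -c (context size) with a tensor split across devices. A sketch with assumed filename and split ratios (flag names per current llama.cpp builds; verify with --help):

```sh
# 128k context spread across three GPUs; filename and split ratios are hypothetical.
# -c sets context length in tokens, -ts (--tensor-split) divides weights
# across GPUs, -ngl offloads layers to the GPUs.
llama-server \
  -m qwen3.6-27b-q8_0.gguf \
  -c 131072 -ngl 99 \
  -ts 1,1,1
```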


ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) by pl752 · Pull Request #21636 · ggml-org/llama.cpp

Reddit r/LocalLLaMA · 2026-04-21

Pull request adds optimized x86 and generic CPU q1_0 dot-product kernels to ggml-cpu, improving quantized LLM inference speed.
