@steeve: aaaaaand we're faster (i know i know)

Summary

Steeve Morin reports that after 5 days of work his implementation runs at 64 tok/s against llama.cpp's 70 tok/s, which puts it within 10% (about 8.6% slower), with more work to do.

aaaaaand we're faster (i know i know) https://t.co/Yt4QUg6esp

Cached at: 06/08/26, 05:24 PM

Steeve Morin (@steeve): After 5 days of work, we are now within 10% of llama.cpp (64 tok/s vs 70 tok/s). More work to do, but momentum is great.
