@steeve: aaaaaand we're faster (i know i know)

X AI KOLs Following 06/08/26, 11:32 AM News

Summary

Steeve Morin says his implementation is now faster than llama.cpp. An earlier post of his reported that, after 5 days of work, it was within 10% of llama.cpp's speed (64 tok/s vs 70 tok/s), with more work to do.

Original Article

Cached at: 06/08/26, 05:24 PM

aaaaaand we’re faster (i know i know) https://t.co/Yt4QUg6esp

Steeve Morin (@steeve): After 5 days of work, we are now within 10% of llama.cpp (64 tok/s vs 70 tok/s) More work to do but momentum is great.

Similar Articles

@steeve: Progress: 26 tok/s (llama 3.1 3b) .@tenstorrent claims 33 tok/s so we’re not far off
Steeve Morin reports running Llama 3.1 3B on Tenstorrent hardware via ZML, achieving 26 tok/s, close to Tenstorrent's claimed 33 tok/s.

@leopardracer: THIS AMERICAN DEVELOPER SPENT WEEKS DEBUGGING TIMEOUT ERRORS IN OLLAMA. THEN HE LOOKED UNDER THE HOOD LM Studio is just…

@binsquares: omg, GPU acceleration on smolvm works way better than I thought. can run llama.cpp inside the smol machine with close t…

Dual GPU llama.cpp speedup

@pupposandro: 2.5x faster than llama.cpp on Strix Halo. We just shipped DFlash + PFlash for the AMD Ryzen AI MAX+ 395 iGPU (gfx1151, …