exllamav3

#exllamav3

I ported EXL3 to run well on Apple Silicon - PonyExl3

Reddit r/LocalLLaMA ↗ · 2026-06-15

A port of the EXL3 LLM codec to Apple Silicon via Metal, reporting high prefill and generation speeds on an M5 Max (e.g., ~600 tok/s prefill and 17-80 tok/s generation, depending on the model).


#exllamav3

@0xSero: Here's everything you need to know about inference and hosting LLMs. Have you ever seen: - vllm - sglang - llama.cpp - …

X AI KOLs Timeline ↗ · 2026-04-20

An overview of popular open-source inference engines including vLLM, SGLang, llama.cpp, and ExLlamaV3 for hosting and running large language models.


