intel-arc

sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) by masonmilby · Pull Request #21845 · ggml-org/llama.cpp

Reddit r/LocalLLaMA · 2026-06-05

A llama.cpp pull request ports multi-column MMVQ from the CUDA backend to the SYCL backend, reporting a speculative decoding speedup of roughly 45% on Intel Arc GPUs.

@TeksEdge: Solved! Qwen3.6-27B-FP8 is now running on Intel Arc Pro B70! LocalMaxxing shows a working 4× Arc Pro B70 32GB run at ~5…

X AI KOLs Following · 2026-05-15

The Qwen3.6-27B-FP8 model now runs on Intel Arc Pro B70 GPUs at ~50 tok/s following a vLLM bug fix, a significant milestone for local AI inference on Intel GPUs.

Nvidia RTX 3090 vs Intel Arc Pro B70 llama.cpp Benchmarks

Reddit r/LocalLLaMA · 2026-04-23

A community benchmark shows the Intel Arc Pro B70 averaging ~71% slower prompt processing and ~54% slower token generation than the RTX 3090 under llama.cpp, with the SYCL backend sometimes beating Vulkan on the same card.
