sycl

#sycl

sycl : port multi-column MMVQ from CUDA backend (~45% speculative decoding speedup on Intel Arc) by masonmilby · Pull Request #21845 · ggml-org/llama.cpp

Reddit r/LocalLLaMA ↗ · 2026-06-05

A pull request for llama.cpp ports multi-column MMVQ from the CUDA backend to the SYCL backend, yielding an approximately 45% speedup in speculative decoding on Intel Arc GPUs.

#sycl

Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

Reddit r/LocalLLaMA ↗ · 2026-06-02

This article describes how to use the SYCL backend with llama.cpp to achieve over 60 tokens per second on the Qwen 3.6-35B-A3B model using an Intel Arc Pro B70 GPU, with the entire model and KV cache in VRAM.

#sycl

Intel Arc Pro B70 llama.cpp benchmarks posted

Reddit r/LocalLLaMA ↗ · 2026-06-02

Benchmark results for the Intel Arc Pro B70 GPU running llama.cpp with the SYCL backend show 63 tokens per second on Qwen models.

#sycl

Nvidia RTX 3090 vs Intel Arc Pro B70 llama.cpp Benchmarks

Reddit r/LocalLLaMA ↗ · 2026-04-23

A community benchmark shows the Intel Arc Pro B70 averaging ~71% slower prompt processing and ~54% slower token generation than the RTX 3090 under llama.cpp, with the SYCL backend sometimes beating Vulkan on the same card.
