Intel Arc Pro B70 llama.cpp benchmarks posted

Reddit r/LocalLLaMA 06/02/26, 06:28 AM News

intel arc-pro-b70 llama-cpp sycl benchmarks qwen hardware

Summary

Benchmark results for the Intel Arc Pro B70 GPU running llama.cpp with the SYCL backend report 63 tokens per second on Qwen models.

[https://www.reddit.com/r/LocalLLM/comments/1tuf6l1/intel_arc_pro_b70_llamacpp_sycl_63_ts_on_qwen/](https://www.reddit.com/r/LocalLLM/comments/1tuf6l1/intel_arc_pro_b70_llamacpp_sycl_63_ts_on_qwen/)

Original Article

Similar Articles

Nvidia RTX 3090 vs Intel Arc Pro B70 llama.cpp Benchmarks

Reddit r/LocalLLaMA

A community benchmark shows the Intel Arc Pro B70 averaging ~71% slower prompt processing and ~54% slower token generation than the RTX 3090 under llama.cpp, with the SYCL backend sometimes beating Vulkan on the same card.

Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

Reddit r/LocalLLaMA

This article describes how to use the SYCL backend with llama.cpp to achieve over 60 tokens per second on the Qwen 3.6-35B-A3B model using an Intel Arc Pro B70 GPU, with the entire model and KV cache in VRAM.

@TeksEdge: Solved! Qwen3.6-27B-FP8 is now running on Intel Arc Pro B70! LocalMaxxing shows a working 4× Arc Pro B70 32GB run at ~5…

X AI KOLs Following

The Qwen3.6-27B-FP8 model is now running on Intel Arc Pro B70 GPUs at ~50 tok/s with a vLLM bug fix, marking a significant milestone for Intel GPU local AI inference.

Intel LLM-Scaler vllm-0.14.0-b8.2 released with official Arc Pro B70 support

Reddit r/artificial

Intel’s LLM-Scaler vllm-0.14.0-b8.2 adds official support for the Arc Pro B70 GPU, enabling Docker-based large-model inference on Battlemage hardware.

PSA: Test your "threads" argument in llama.cpp (+80% performance in my case)

Reddit r/LocalLLaMA

A user benchmarks thread count for hybrid CPU-GPU inference with Gemma 4 in llama.cpp, finding an 80% performance uplift from using 16 threads instead of 6 on a hybrid-core CPU, and shares the optimal command configuration.
