arc-gpu

#arc-gpu

Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

Reddit r/LocalLLaMA ↗ · 2026-06-02

This post describes how to use the SYCL backend with llama.cpp to achieve over 60 tokens per second on the Qwen 3.6-35B-A3B model using an Intel Arc Pro B70 GPU, with the entire model and KV cache in VRAM.


#arc-gpu

Intel LLM-Scaler vllm-0.14.0-b8.2 released with official Arc Pro B70 support

Reddit r/artificial ↗ · 2026-04-22

Intel’s LLM-Scaler vllm-0.14.0-b8.2 adds official support for the Arc Pro B70 GPU, enabling Docker-based large-model inference on Battlemage hardware.


#arc-gpu

About to build a 6× Arc B70 LLM rig, want to talk to someone experienced first

Reddit r/LocalLLaMA ↗ · 2026-04-20

A user seeks experienced guidance on building a 6× Intel Arc B70 LLM inference rig, particularly for Llama models and vLLM deployment, offering compensation for consultation.

