Intel LLM-Scaler vllm-0.14.0-b8.2 released with official Arc Pro B70 support
Summary
Intel’s LLM-Scaler vllm-0.14.0-b8.2 adds official support for the Arc Pro B70 GPU, enabling Docker-based large-model inference on Battlemage hardware.
Cached at: 04/22/26, 03:13 PM
Source: https://www.phoronix.com/news/Intel-LLM-Scaler-vllm-0.14-b8.2

As part of Intel’s LLM-Scaler initiative for AI inferencing on Intel Arc hardware, out today is the vllm-0.14.0-b8.2 update, which adds official support for the Arc Pro B70 graphics card.
Intel LLM-Scaler provides a Docker-based approach for deploying large language models on Intel Arc hardware, with a particular focus on the latest-generation Battlemage graphics hardware, including multi-GPU configurations as part of the Project Battlematrix initiative that began last year.
With today’s update to the LLM-Scaler stack with vLLM, Intel has updated the platform image to intel/llm-scaler-platform:26.18.8.2. The only other listed change is official support for the Intel Arc Pro B70 GPU, the BMG-G31 graphics card that recently debuted with 32GB of VRAM and a sub-$1000 price point.
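For readers who want to fetch that platform image, here is a minimal Python sketch that assembles the matching `docker pull` command. The image name and tag come from the release as quoted above. The helper names `pull_command` and `pull_image` are hypothetical, and the sketch assumes the tag is pullable from Docker Hub under exactly that name.

```python
import shutil
import subprocess

# Image name and tag as quoted in the release; assumed to be pullable as-is.
IMAGE = "intel/llm-scaler-platform"
TAG = "26.18.8.2"


def pull_command(image: str = IMAGE, tag: str = TAG) -> list[str]:
    """Return the argv list for pulling the given image tag with the Docker CLI."""
    return ["docker", "pull", f"{image}:{tag}"]


def pull_image(dry_run: bool = True) -> list[str]:
    """Print the pull command; only execute it when dry_run is False."""
    cmd = pull_command()
    print(" ".join(cmd))
    if not dry_run:
        # Fail early with a clear message if the Docker CLI is not installed.
        if shutil.which("docker") is None:
            raise RuntimeError("docker CLI not found on PATH")
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `pull_image()` only prints the command, while `pull_image(dry_run=False)` actually runs it and requires a working Docker installation.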
The Intel Arc Pro B70 continues running well in tests at Phoronix, and I will have more benchmarks out soon.
The new release is tagged on GitHub and is also available via Docker Hub. Since the release announcement notification, though, they have dropped the mention of the highlights and the Arc Pro B70 support, presumably due to some release process snafu.