Intel LLM-Scaler vllm-0.14.0-b8.2 released with official Arc Pro B70 support

Reddit r/artificial 04/22/26, 02:00 PM Tools

Summary

Intel’s LLM-Scaler vllm-0.14.0-b8.2 adds official support for the Arc Pro B70 GPU, enabling Docker-based large-model inference on Battlemage hardware.

Original Article

Cached at: 04/22/26, 03:13 PM

Intel LLM-Scaler vllm-0.14.0-b8.2 Released With Official Arc Pro B70 Support

Source: https://www.phoronix.com/news/Intel-LLM-Scaler-vllm-0.14-b8.2

As part of Intel’s LLM-Scaler initiative for AI inferencing on Intel Arc hardware, out today is their vllm-0.14.0-b8.2 update, which adds official support for the Arc Pro B70 graphics card.

Intel LLM-Scaler provides a Docker-ized approach for deploying large language models on Intel Arc hardware, with a particular focus on latest-generation Battlemage graphics hardware, including multi-GPU configurations as part of the Project Battlematrix initiative that began last year.

With today’s update to their LLM-Scaler stack with vLLM, they have updated the platform image to intel/llm-scaler-platform:26.18.8.2. The only other listed change is official support for the Intel Arc Pro B70 GPU, the BMG-G31 graphics card that recently debuted with 32GB of VRAM and a sub-$1000 price point.
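As a rough illustration of working with the image tag named above, here is a minimal Python sketch that splits the reference into repository and tag and builds a `docker pull` command line. Only the image reference comes from the article; the helper names `split_image_ref` and `docker_pull_command` are hypothetical, and the sketch deliberately stops at printing the command rather than assuming anything about how the container is meant to be started.

```python
import shlex

# Image reference quoted in the article; everything else here is illustrative.
PLATFORM_IMAGE = "intel/llm-scaler-platform:26.18.8.2"


def split_image_ref(ref: str) -> tuple[str, str]:
    """Split 'repo:tag' into (repo, tag). Hypothetical helper for this sketch."""
    repo, sep, tag = ref.rpartition(":")
    # Reject references with no tag, or where the last ':' belongs to a
    # registry host:port prefix rather than a tag.
    if not sep or "/" in tag:
        raise ValueError(f"no tag in image reference: {ref!r}")
    return repo, tag


def docker_pull_command(ref: str) -> list[str]:
    """Build the argv for `docker pull <ref>` without running it."""
    split_image_ref(ref)  # raises ValueError if no tag is pinned
    return ["docker", "pull", ref]


if __name__ == "__main__":
    repo, tag = split_image_ref(PLATFORM_IMAGE)
    print(f"repository={repo} tag={tag}")
    print(shlex.join(docker_pull_command(PLATFORM_IMAGE)))
```

Pinning the full tag rather than relying on a default keeps a deployment tied to the exact platform image this release refers to.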

[Image: Intel Arc Pro B70]

The Intel Arc Pro B70 continues running well in tests at Phoronix and I will have some more benchmarks out soon.

The new release is tagged on GitHub and also available via Docker Hub. Since the release announcement notification went out, though, they’ve dropped the mention of the highlights and the Arc Pro B70 support, presumably due to some release-process snafu.

Similar Articles

Intel Arc Pro B70 llama.cpp benchmarks posted
Benchmark results for Intel Arc Pro B70 GPU running llama.cpp with SYCL on Qwen models show 63 tokens per second performance.

@TeksEdge: Solved! Qwen3.6-27B-FP8 is now running on Intel Arc Pro B70! LocalMaxxing shows a working 4× Arc Pro B70 32GB run at ~5…

@tom_doerr: Runs 70B LLMs on single 4GB GPU https://github.com/lyogavin/airllm

vllm-project/vllm v0.19.1

@LottoLabs: A very cool model for the GPU poor bros Trained on an ungodly amount of tokens for a 8b a1b model Gonna be super fast e…