heterogeneous-hardware

Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption

arXiv cs.AI · 2026-05-20

This paper presents an empirical study of scheduling multiple LLMs on shared heterogeneous hardware, focusing on the performance implications of CPU-GPU offloading and preemption. It finds that offloading degrades decode performance non-linearly, especially for smaller models, and that preemption overhead is dominated by reloading model state. These findings offer design guidance for future multi-model schedulers.

Forge-UGC: FX optimization and register-graph engine for universal graph compiler

Hugging Face Daily Papers · 2026-04-14

Forge-UGC is a four-phase universal graph compiler that speeds up transformer deployment on NPUs, cutting compilation time by 6.9-9.2×, inference latency by 18-36%, and energy by 30-41% versus OpenVINO/ONNX Runtime.
