The article announces support for DFlash and PFlash speculative decoding in llama.cpp on AMD Strix Halo iGPUs, demonstrating significant inference speedups using ROCm.
User @binsquares reports that GPU acceleration on smolvm achieves nearly 90% of host performance when running llama.cpp via the Vulkan backend.
vLLM v0.20.1 is a patch release of the popular open-source LLM inference and serving library, maintaining its focus on high-throughput serving and efficient memory management.
easyaligner is an open-source forced alignment library with GPU acceleration and flexible text normalization that works with all wav2vec2 models on the Hugging Face Hub. It addresses practical workflows like handling partial transcripts, irrelevant speech segments, and long audio without chunking, while preserving the original text formatting.
Adobe announced a new Color Mode in Premiere Pro, optimized with NVIDIA GPU acceleration for 32-bit color grading, showcased at NAB 2026.
OpenAI open-sources mujoco-py, a high-performance Python library for robotic simulation using the MuJoCo engine, featuring headless GPU rendering with a ~40x speedup over CPU-based rendering and VR interaction support.