Just open-sourced FastVLA

Reddit r/LocalLLaMA Models

Summary

FastVLA, an open-source Vision-Language-Action (VLA) model, now runs robot control at 5 Hz on a single NVIDIA L4 GPU.

Got 5 Hz robotics control working on an L4. Thread with benchmarks and repo here: [https://x.com/bouajila_h10330/status/2046909096205463562?s=20](https://x.com/bouajila_h10330/status/2046909096205463562?s=20)
Original Article

Similar Articles

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

Hugging Face Daily Papers

HiVLA introduces a hierarchical vision-language-action framework that decouples semantic planning from motor control, using a diffusion transformer (DiT) action expert for improved robotic manipulation. The system pairs a VLM planner, which handles task decomposition and visual grounding, with a specialized DiT action expert that uses cascaded cross-attention; it outperforms end-to-end baselines, particularly on long-horizon tasks and fine-grained manipulation.

vllm-project/vllm v0.19.1

GitHub Releases Watchlist

vLLM v0.19.1 release. vLLM is a fast, easy-to-use open-source library for LLM inference and serving with state-of-the-art throughput, supporting 200+ model architectures and hardware ranging from NVIDIA and AMD GPUs to CPUs.

Gemma 4 VLA Demo on Jetson Orin Nano Super

Hugging Face Blog

NVIDIA and Hugging Face publish a hands-on demo showing Gemma 4 running as a vision-language-action model entirely on the Jetson Orin Nano Super, with local speech-to-text/text-to-speech and webcam input.