Just open-sourced FastVLA

Reddit r/LocalLLaMA Models

Summary

FastVLA, an open-source Vision-Language-Action (VLA) model, now runs robot control at 5 Hz on a single NVIDIA L4 GPU.

Got 5 Hz robotics control working on an L4. Thread with benchmarks and repo here: [https://x.com/bouajila_h10330/status/2046909096205463562?s=20](https://x.com/bouajila_h10330/status/2046909096205463562?s=20)
Original Article

Similar Articles

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

Hugging Face Daily Papers

HiVLA introduces a hierarchical vision-language-action framework that decouples semantic planning from motor control, using a diffusion transformer (DiT) action expert for improved robotic manipulation. The system pairs a VLM planner, which handles task decomposition and visual grounding, with a specialized DiT action expert that uses cascaded cross-attention; it outperforms end-to-end baselines, particularly on long-horizon tasks and fine-grained manipulation.

vllm-project/vllm v0.19.1

GitHub Releases Watchlist

vLLM v0.19.1 release. vLLM is a fast, easy-to-use open-source library for LLM inference and serving with state-of-the-art throughput, supporting 200+ model architectures and hardware ranging from NVIDIA and AMD GPUs to CPUs.

Gemma 4 VLA Demo on Jetson Orin Nano Super

Hugging Face Blog

NVIDIA and Hugging Face publish a hands-on demo showing Gemma 4 running as a vision-language-action model entirely on the Jetson Orin Nano Super, with local speech-to-text/text-to-speech and webcam input.