Just open-sourced FastVLA
Summary
FastVLA, an open-source Vision-Language-Action model, now runs robot control at 5 Hz on an NVIDIA L4 GPU.
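For context, 5 Hz leaves a budget of 1 / 5 = 200 ms per control step for the policy's forward pass plus any pre- and post-processing. A minimal fixed-rate loop sketch under that assumption (the `policy` and `robot` objects are hypothetical placeholders, not FastVLA's actual API):

```python
import time

CONTROL_HZ = 5
PERIOD = 1.0 / CONTROL_HZ  # 200 ms budget per control step

def control_loop(policy, robot):
    """Run a fixed-rate control loop; skip the sleep if inference overruns the budget."""
    while True:
        t0 = time.monotonic()
        obs = robot.get_observation()   # camera frames + proprioception (hypothetical)
        action = policy.predict(obs)    # must finish well under the 200 ms budget
        robot.apply_action(action)
        elapsed = time.monotonic() - t0
        time.sleep(max(0.0, PERIOD - elapsed))
```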
Similar Articles
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
HiVLA introduces a hierarchical vision-language-action framework for robotic manipulation that decouples semantic planning from motor control. A VLM planner handles task decomposition and visual grounding, while a specialized diffusion transformer (DiT) action expert with cascaded cross-attention generates the low-level actions; the system outperforms end-to-end baselines, particularly on long-horizon tasks and fine-grained manipulation.
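The paper's architecture details are not reproduced here; as a rough illustration of what a "cascaded cross-attention" action block could look like, the PyTorch sketch below lets action tokens attend first to the planner's plan tokens and then to visual tokens. All module names, shapes, and the ordering are assumptions for illustration, not HiVLA's code:

```python
import torch
import torch.nn as nn

class CascadedCrossAttentionBlock(nn.Module):
    """Illustrative DiT-style action block: self-attention over action tokens,
    then cascaded cross-attention to plan tokens and then to visual tokens."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.plan_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vis_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(4))
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, actions, plan_tokens, visual_tokens):
        x = actions
        h = self.norms[0](x)
        x = x + self.self_attn(h, h, h)[0]                                       # actions attend to themselves
        x = x + self.plan_attn(self.norms[1](x), plan_tokens, plan_tokens)[0]    # then to the plan
        x = x + self.vis_attn(self.norms[2](x), visual_tokens, visual_tokens)[0] # then to vision
        return x + self.mlp(self.norms[3](x))

# Toy shapes: batch of 2, 16 action tokens, 8 plan tokens, 256 visual tokens.
block = CascadedCrossAttentionBlock()
out = block(torch.randn(2, 16, 512), torch.randn(2, 8, 512), torch.randn(2, 256, 512))
```

Conditioning on the plan before the visual features is one plausible reading of "cascaded"; a real DiT action expert would also condition on the diffusion timestep (e.g. via AdaLN), which is omitted here for brevity.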
Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine-Tuning, and On-Device Optimizations
NXP and Hugging Face demonstrate techniques for deploying Vision-Language-Action (VLA) models on embedded robotic platforms, covering dataset recording best practices, VLA fine-tuning, and on-device optimizations for the i.MX 95 processor, including quantization and asynchronous inference scheduling.
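The asynchronous scheduling idea (start computing the next action chunk while the robot is still executing the current one) can be sketched with a background thread. The `vla_infer` and `robot` objects and the single-slot queues below are placeholder choices, not NXP's or Hugging Face's implementation:

```python
import queue
import threading

def inference_worker(vla_infer, obs_queue, action_queue):
    """Continuously turn the latest observation into the next action chunk."""
    while True:
        obs = obs_queue.get()
        if obs is None:                  # shutdown sentinel
            break
        action_queue.put(vla_infer(obs))

def run(robot, vla_infer, steps=100):
    obs_queue, action_queue = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    worker = threading.Thread(target=inference_worker,
                              args=(vla_infer, obs_queue, action_queue), daemon=True)
    worker.start()
    obs_queue.put(robot.get_observation())      # prime the pipeline
    for _ in range(steps):
        chunk = action_queue.get()              # next chunk, computed in the background
        obs_queue.put(robot.get_observation())  # kick off inference for the following chunk
        for action in chunk:                    # execute while the worker computes ahead
            robot.apply_action(action)
    obs_queue.put(None)
```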
vllm-project/vllm v0.19.1
vLLM v0.19.1 release: a fast and easy-to-use open-source library for LLM inference and serving with state-of-the-art throughput, supporting 200+ model architectures and diverse hardware, including NVIDIA and AMD GPUs as well as CPUs.
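For reference, vLLM's offline-inference API takes only a few lines; the model id below is just an example of any supported Hugging Face model:

```python
from vllm import LLM, SamplingParams

# Load a model and generate with vLLM's offline inference API.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")   # example model id; any supported HF model works
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["Explain what a VLA model is in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```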
Gemma 4 VLA Demo on Jetson Orin Nano Super
NVIDIA and Hugging Face publish a hands-on demo showing Gemma 4 running as a vision-language-action model entirely on the Jetson Orin Nano Super, using local STT/TTS and webcam input.
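At a high level, such an on-device demo is a capture-transcribe-act loop gluing the components together. The helper callables below (`transcribe`, `vla_step`, `act`, `speak`) are hypothetical stand-ins for whichever local STT, VLA, and TTS runtimes the demo actually uses:

```python
import cv2  # webcam capture

def demo_loop(transcribe, vla_step, act, speak, camera_index=0):
    """Voice command + webcam frame in, robot action + spoken confirmation out."""
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            command = transcribe()             # local STT: blocks until a command is heard
            ok, frame = cap.read()             # grab the current webcam frame (BGR array)
            if not ok:
                break
            action = vla_step(frame, command)  # VLA maps (image, instruction) -> action
            act(action)                        # send the action to the robot
            speak(f"Done: {command}")          # local TTS confirmation
    finally:
        cap.release()
```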
@zhijianliu_: Reasoning VLAs can think. They just can't think fast. Until now. Introducing FlashDrive 716 ms → 159 ms on RTX PRO 6000…
FlashDrive reduces reasoning vision-language-action model inference latency from 716 ms to 159 ms on an RTX PRO 6000 (up to 5.7× faster) with zero accuracy loss, enabling real-time autonomous applications.
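Latency figures like these are typically collected by timing a warmed-up forward pass with explicit GPU synchronization. A generic measurement sketch (not FlashDrive's benchmarking code), where `model_step` is any zero-argument callable that runs one inference:

```python
import time
import torch

def measure_latency_ms(model_step, warmup=10, iters=50):
    """Median wall-clock latency of one inference call, in milliseconds."""
    for _ in range(warmup):          # warmup: compilation, caches, clock stabilization
        model_step()
    timings = []
    for _ in range(iters):
        torch.cuda.synchronize()     # make sure prior GPU work has finished
        t0 = time.perf_counter()
        model_step()
        torch.cuda.synchronize()     # wait for this call's kernels to complete
        timings.append((time.perf_counter() - t0) * 1000)
    timings.sort()
    return timings[len(timings) // 2]
```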