vla

#vla

@svlevine: If you want a robot to do something well, you need to know how to talk to it. If you don't, you can learn, with Semanti…

X AI KOLs Following · 21h ago

This paper presents Semantic Action RL, which uses reinforcement learning over Vision-Language-Action (VLA) prompts to enable robots to learn new tasks quickly in the real world.

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

arXiv cs.AI · 2026-06-24

Introduces Neuro-Symbolic Drive, a framework that uses rule-grounded reasoning traces from classical planners to fine-tune a driving VLA (Qwen3.5-4B), achieving significant reductions in average displacement error and miss rate compared to standard CoT reasoning.

PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

arXiv cs.AI · 2026-06-12

This paper introduces PersonaDrive, a pipeline that conditions a vision-language-action (VLA) driving agent on retrieved demonstrations from a style-instructed human driving dataset, enabling style-diverse non-ego agents for closed-loop simulation and improving driving scores on Bench2Drive.

@seclink: 5. Open-Source Acceleration of Robot World Models - NVIDIA Cosmos 3 + Isaac GR00T: Physical AI Foundation Models - AGIBOT Genie Sim 3.0: The First Fully Open-Source Robot Simulation Platform (Complete Open Source of Code, Data, and Assets) - VLA (Vision-…

X AI KOLs Following · 2026-06-08

Open-source releases are accelerating robot world models and simulation platforms: NVIDIA launched the Cosmos 3 and Isaac GR00T physical AI foundation models, AGIBOT released Genie Sim 3.0, a fully open-source simulation platform, and VLA models are becoming mainstream for manipulation policies. Together, these lower the barrier to entry for robotics.

AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding

Hugging Face Daily Papers · 2026-06-04

AffordanceVLA introduces a unified framework using structured affordance forecasting as an intermediate representation to improve perception-action mapping in robotic manipulation, leveraging vision-language models and a Mixture-of-Transformer architecture.

Robot foundation models keep hiding behind fine-tuning numbers. Wall-OSS-0.5 is trying a different approach

Reddit r/artificial · 2026-05-31

X Square Robot releases Wall-OSS-0.5, a 4B open-source VLA robot foundation model evaluated on a 17-task real-robot zero-shot suite without task-specific fine-tuning, aiming to directly measure pretraining capability.

Open-weights VLA hits 80%+ task progress on 4 of 17 real-robot tasks with zero fine-tuning. Demo reel attached

Reddit r/singularity · 2026-05-31

Release of Wall-OSS-0.5, an open-weights vision-language-action model that achieves over 80% task progress on 4 of 17 real-robot tasks with zero fine-tuning, including on a deformable rope task not seen during pretraining. The model preserves general vision-language ability while improving embodied grounding.

FrameSkip: Learning from Fewer but More Informative Frames in VLA Training

Hugging Face Daily Papers · 2026-05-13

FrameSkip is a data-layer frame selection method that improves Vision-Language-Action (VLA) policy training by prioritizing high-importance frames based on action variation and visual-coherence metrics, achieving a macro-average success rate of 76.15% across three benchmarks while using only 20% of unique frames.
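
The idea of keeping only high-importance frames can be illustrated with a minimal sketch. This is not the FrameSkip implementation: the function name `select_frames`, the L1 action-difference score, and the `keep_ratio` parameter are assumptions made for illustration, and the visual-coherence metric mentioned in the summary is omitted entirely.

```python
def select_frames(actions, keep_ratio=0.2):
    """Return sorted indices of the frames with the largest action change.

    `actions` is a list of per-frame action vectors (lists of floats).
    Hypothetical scoring: L1 distance to the previous frame's action.
    """
    n = len(actions)
    if n == 0:
        return []
    # The first frame has no predecessor, so it gets a score of 0.
    scores = [0.0]
    for t in range(1, n):
        diff = sum(abs(a - b) for a, b in zip(actions[t], actions[t - 1]))
        scores.append(diff)
    # Keep at least one frame, otherwise roughly keep_ratio of them.
    k = max(1, round(n * keep_ratio))
    top = sorted(range(n), key=lambda t: scores[t], reverse=True)[:k]
    return sorted(top)


# Ten frames of a 1-D action; the action changes at frames 3 and 7.
demo_actions = [[0], [0], [0], [5], [5], [5], [5], [1], [1], [1]]
print(select_frames(demo_actions, keep_ratio=0.2))
```

With a 20% budget on ten frames, the sketch keeps the two frames where the action changes and drops the static ones, which is the intuition behind training on fewer but more informative frames.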

@dotey: https://x.com/dotey/status/2053351712149135385

X AI KOLs Timeline · 2026-05-10

NVIDIA's Jim Fan spoke at Sequoia AI Ascent 2026, declaring the VLA architecture obsolete and proposing World Action Models (WAM) as a new paradigm for robotics. He introduced key technologies including DreamZero, EgoScale, and the neural simulator Dream Dojo.

MolmoAct2: Action Reasoning Models for Real-world Deployment

Papers with Code Trending · 2026-05-04

Allen AI releases MolmoAct2, an open-weight Vision-Language-Action model designed for real-world robotic deployment, featuring new datasets, an open action tokenizer, and adaptive reasoning to reduce latency.

Gemma 4 VLA Demo on Jetson Orin Nano Super

Hugging Face Blog · 2026-04-22

NVIDIA and Hugging Face publish a hands-on demo showing Gemma 4 running as a vision-language-action model entirely on the Jetson Orin Nano Super, using local STT/TTS and webcam input.

@zhijianliu_: Reasoning VLAs can think. They just can't think fast. Until now. Introducing FlashDrive 716 ms → 159 ms on RTX PRO 6000…

X AI KOLs Timeline · 2026-04-19

FlashDrive reduces reasoning vision-language-action model inference latency from 716 ms to 159 ms on RTX PRO 6000—up to 5.7× faster—with zero accuracy loss, enabling real-time autonomous applications.
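
As a quick arithmetic check, the two latencies quoted above imply a speedup of about 4.5×, so the "up to 5.7×" figure presumably refers to a different configuration that this summary does not state. The helper names below are hypothetical and only reproduce the arithmetic.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old latency to the new latency."""
    return before_ms / after_ms


def max_rate_hz(latency_ms: float) -> float:
    """Upper bound on decisions per second if inference runs back to back."""
    return 1000.0 / latency_ms


print(f"speedup: {speedup(716, 159):.2f}x")
print(f"max rate before: {max_rate_hz(716):.1f} Hz")
print(f"max rate after: {max_rate_hz(159):.1f} Hz")
```

Under the back-to-back assumption, 159 ms per inference allows roughly six decisions per second, compared with fewer than two at 716 ms.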

LeRobot v0.5.0: Scaling Every Dimension

Hugging Face Blog · 2026-03-09

LeRobot v0.5.0 is a major release featuring support for Unitree G1 humanoid robots, new policy architectures (Pi0-FAST VLAs, Real-Time Chunking), streaming video encoding for 3x faster training, and EnvHub for loading simulation environments from Hugging Face Hub.
