Open-weights VLA hits 80%+ task progress on 4 of 17 real-robot tasks with zero fine-tuning. Demo reel attached

Reddit r/singularity Models

Summary

Wall-OSS-0.5 is a newly released open-weights vision-language-action model. It reportedly exceeds 80% task progress on 4 of 17 real-robot tasks with zero fine-tuning, including a deformable rope task that was not seen during pretraining. By the authors' evaluation, the model preserves general vision-language ability while improving embodied grounding.

Sharing this because it is an embodied AI release that tries to make the pretrained checkpoint itself measurable, instead of only showing results after task-specific tuning.

The video is a reel from Wall-OSS-0.5, a vision-language-action model released with open-source resources. Every clip in the reel has the same "Autonomous w/o Fine-Tuning" watermark in the corner. The robot does things like opening a pot lid and dropping fruit inside, covering blocks with a cloth, sorting items by color, putting drinks into specific containers in a specified order, shredding paper, and placing a cup to the right of a calculator. According to the release, these clips come from the pretrained checkpoint rather than from task-specific fine-tuning.

What is interesting compared with the usual humanoid demo cycle is the evaluation framing. They report 4 of 17 real-robot tasks above 80 percent task progress zero-shot, including a deformable rope tightening task that was not in the pretraining set. They also show task progress rising across pretraining checkpoints, with held-out tasks tracking seen tasks. That is the kind of curve people keep asking for in embodied AI, even if it is still early.

The other part I found notable is that the model seems to preserve general image/language ability while improving embodied grounding, at least by their evaluation. That matters because a lot of robot policies feel like they gain control ability by becoming narrower.

Code: [https://github.com/X-Square-Robot/wall-x](https://github.com/X-Square-Robot/wall-x). Paper: [https://x2robot.com/api/files/file/wall\_oss\_05.pdf](https://x2robot.com/api/files/file/wall_oss_05.pdf). Hugging Face org: [https://huggingface.co/x-square-robot](https://huggingface.co/x-square-robot).

The caveat is that the harder tasks are still not solved. Towel folding, charger insertion and table setting all score very low zero-shot, so pretraining alone is not magic. The real test is whether outside groups can run the checkpoint on their own arms and see similar strengths and failures.

Reel is attached. Original demo is on their project page.
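
For anyone who does run the checkpoint on their own hardware, here is a rough sketch of how a headline like "4 of 17 tasks above 80 percent" could be tallied from your own rollouts. It is only an illustration: I am assuming each episode is scored with a task-progress value between 0 and 1, that a task's score is the mean over its episodes, and that "above" means strictly greater than the threshold. The function names and the example task names and numbers are hypothetical placeholders, not taken from the paper or the wall-x repo.

```python
from statistics import mean
from typing import Mapping, Sequence


def task_scores(episodes: Mapping[str, Sequence[float]]) -> dict[str, float]:
    """Average per-episode task progress (0.0 to 1.0) for each task.

    Assumption: a task's score is the plain mean over its episodes.
    """
    return {task: mean(values) for task, values in episodes.items() if values}


def tasks_above(scores: Mapping[str, float], threshold: float = 0.80) -> list[str]:
    """Return the tasks whose mean progress is strictly above the threshold."""
    return sorted(task for task, score in scores.items() if score > threshold)


# Placeholder rollouts, made up for illustration only.
example_episodes = {
    "hypothetical_task_a": [1.0, 0.9, 0.8],
    "hypothetical_task_b": [0.5, 0.25, 0.0],
    "hypothetical_task_c": [0.8, 0.8, 0.8],
}

scores = task_scores(example_episodes)
passed = tasks_above(scores)
print(f"{len(passed)} of {len(scores)} tasks above 80% progress: {passed}")
```

Reporting the full per-task table alongside the count is probably more useful than the count alone, since the low-scoring tasks are where outside replications are most likely to differ.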