action-prediction

RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

Hugging Face Daily Papers · 2026-06-01

RoboSemanticBench is a benchmark that diagnoses semantic grounding in action prediction for vision-language-action (VLA) models. It reveals that while robots can grasp objects, they fail to select the semantically correct target that the instruction specifies.

Architecture-Sensitive Supervised Fine-Tuning for Screen-Conditioned Action Prediction: A PiSAR Benchmark

arXiv cs.AI · 2026-05-29

This paper introduces the PiSAR benchmark for screen-conditioned action prediction and compares supervised fine-tuned models against frontier zero-shot baselines. A fine-tuned Qwen3-VL-8B achieves 0.783 semantic similarity, significantly outperforming Claude Opus 4.7 (0.459) and GPT-5.5 (0.482). The same fine-tuning recipe applied to a larger reasoning-tuned Gemma model yields only 0.441, which indicates a model-recipe mismatch.

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

Hugging Face Daily Papers · 2026-05-18

MementoGUI introduces a plug-in agentic memory framework for GUI agents that uses learned controllers for selective memory management and retrieval, improving performance on long-horizon tasks with compressed visual and textual representations.
