RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
Summary
RoboMemArena introduces a large-scale benchmark for evaluating robotic memory across 26 complex tasks with real-world validation, alongside PrediMem, a dual-system vision-language-action model that improves memory management through predictive coding.
Source: https://huggingface.co/papers/2605.10921
Abstract
RoboMemArena presents a large-scale robotic memory benchmark with diverse tasks and real-world evaluation, while PrediMem demonstrates improved memory management through a dual-system vision-language-action (VLA) architecture with predictive coding.
Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task coverage and structural complexity, and remain restricted to simulation without real-world evaluation. We address this gap with RoboMemArena, a large-scale benchmark of 26 tasks, with average trajectory lengths exceeding 1,000 steps per task and 68.9% of subtasks being memory-dependent. The generation pipeline leverages a vision-language model (VLM) to design and compose subtasks, generates full trajectories through atomic functions, and provides memory-related annotations, including subtask instructions and native keyframe annotations, while paired real-world memory tasks support physical evaluation. We further design PrediMem, a dual-system VLA in which a high-level VLM planner manages a memory bank with recent and keyframe buffers and uses a predictive coding head to improve sensitivity to task dynamics. Extensive experiments on RoboMemArena show that PrediMem outperforms all baselines and provides insights into memory management, model architecture, and scaling laws for complex memory systems.
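The page does not include code, but the memory design described above is concrete enough to sketch. Below is a minimal, illustrative Python sketch of a memory bank with a recent buffer and a keyframe buffer, where an observation is promoted to a keyframe when a predictive-coding-style prediction error (surprise) is high. All names, thresholds, and the linear-extrapolation predictor are assumptions for illustration; PrediMem's actual predictive coding head is a learned component of the VLM planner, not the stand-in shown here.

    from collections import deque
    import numpy as np

    class MemoryBank:
        """Illustrative sketch (not the authors' code): a memory bank with a
        rolling recent buffer and a sparse keyframe buffer. Observations whose
        prediction error exceeds a threshold are treated as surprising and
        promoted to keyframes."""

        def __init__(self, recent_size=16, keyframe_size=64, surprise_threshold=0.5):
            self.recent = deque(maxlen=recent_size)       # rolling window of latest embeddings
            self.keyframes = deque(maxlen=keyframe_size)  # sparse, high-surprise embeddings
            self.surprise_threshold = surprise_threshold

        def _predict_next(self):
            # Stand-in predictor: linear extrapolation from the last two
            # embeddings. PrediMem uses a learned predictive coding head.
            if len(self.recent) < 2:
                return None
            prev, last = self.recent[-2], self.recent[-1]
            return last + (last - prev)

        def add(self, embedding):
            # Predict the incoming embedding from past context, then store it.
            predicted = self._predict_next()
            self.recent.append(embedding)
            if predicted is None:
                return 0.0
            # Surprise = normalized prediction error; high error marks a keyframe.
            error = np.linalg.norm(embedding - predicted) / (np.linalg.norm(embedding) + 1e-8)
            if error > self.surprise_threshold:
                self.keyframes.append(embedding)
            return error

A higher surprise_threshold keeps the keyframe buffer sparser, while the recent buffer always holds the rolling context a planner would condition on.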
Get this paper in your agent:
hf papers read 2605.10921
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
MEME: Multi-entity & Evolving Memory Evaluation
The MEME benchmark evaluates AI memory systems across multiple entities and evolving conditions, revealing significant challenges in dependency reasoning that persist even with advanced retrieval techniques.
I built a benchmark for AI “memory” in coding agents. Looking for others to beat it.
A developer created a new benchmark, continuity-benchmarks, to test AI coding agents' ability to stay consistent with project rules during active development. It addresses a gap in existing memory benchmarks, which focus on semantic recall rather than real-time architectural consistency and multi-session behavior.
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
This paper introduces LongMemEval-V2, a benchmark for evaluating long-term memory systems in web agents, along with two memory methods: AgentRunbook-R and AgentRunbook-C.
RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies
RoboLab is a high-fidelity simulation benchmarking framework for evaluating task-generalist robotic policies, introducing the RoboLab-120 benchmark with 120 tasks across visual, procedural, and relational competency axes. It enables scalable, realistic task generation and systematic analysis of policy behavior under controlled perturbations to assess true generalization capabilities.
Benchmarking agent memory retrieval on LongMemEval‑S — 98% Recall@5, 100% recall by R@23, local embeddings only (all-MiniLM-L6-v2), no LLM, no API key
The author shares benchmark results for memweave, a Python library for agent memory, achieving 98% Recall@5 on LongMemEval-S using only local embeddings without LLM calls. The post details the methodology and compares performance against mempalace, highlighting stable retrieval across different question types.
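As a rough illustration of the setup this post describes (local embeddings only, no LLM calls, no API key), here is a minimal Recall@k evaluation sketch using all-MiniLM-L6-v2 through the sentence-transformers library. The function name and the single-gold-memory assumption are illustrative; this is not memweave's actual API or the LongMemEval-S loader.

    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")  # local embedding model, no API key

    def recall_at_k(queries, memories, relevant_idx, k=5):
        """Fraction of queries whose gold memory appears among the top-k
        cosine-similarity matches. relevant_idx[i] is the index into
        `memories` of the gold memory for queries[i] (single gold assumed)."""
        q = model.encode(queries, normalize_embeddings=True)
        m = model.encode(memories, normalize_embeddings=True)
        sims = q @ m.T                        # cosine similarity (vectors normalized)
        topk = np.argsort(-sims, axis=1)[:, :k]
        hits = [relevant_idx[i] in topk[i] for i in range(len(queries))]
        return float(np.mean(hits))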