MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Hugging Face Daily Papers · 06/18/26, 12:00 AM

mobile-gui-agent long-horizon context-management multimodal-llm dataset supervised-fine-tuning

Summary

MemGUI-Agent introduces proactive context management for long-horizon mobile GUI tasks, using Context-as-Action (ConAct) to maintain critical information. It includes the MemGUI-3K dataset and achieves state-of-the-art performance on MemGUI-Bench and MobileWorld benchmarks with an 8B model.

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to prompt explosion and dilution of critical cross-app facts. To address this, we introduce MemGUI-Agent, an end-to-end long-horizon mobile GUI agent with proactive context management. MemGUI-Agent is built on Context-as-Action (ConAct), which casts context management as first-class actions emitted by the same policy that selects UI actions. Instead of passively appending history, ConAct maintains three structured context fields: folded action history, folded UI state, and recent step record, preserving critical UI facts while keeping context compact. To make proactive context management learnable across model scales, we construct MemGUI-3K, a 2,956-trajectory dataset with full ConAct annotations for supervised training and offline analysis. Training an 8B model on MemGUI-3K produces MemGUI-8B-SFT, an 8B MemGUI-Agent that achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. Code, data, and trained models will be released at https://memgui-agent.github.io/.
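The abstract describes ConAct only in terms of its three context fields. As a rough illustration, the Python sketch below models those fields and an update step. In that step, the same policy output that carries a UI action also carries context-management actions: replace the folded history, add facts to the folded UI state, and overwrite the recent step record. The class names, the `fold_history`/`keep_facts` action format, the deduplication rule, and the prompt layout are all hypothetical assumptions. This is not the MemGUI-Agent implementation or its action schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConActContext:
    """Hypothetical container for the three ConAct context fields."""
    folded_action_history: str = ""                           # compact summary of earlier actions
    folded_ui_state: List[str] = field(default_factory=list)  # UI facts the agent chose to keep
    recent_step_record: str = ""                              # raw record of the latest step only


@dataclass
class PolicyOutput:
    """Hypothetical per-step output: one UI action plus context-management actions."""
    ui_action: str                                        # e.g. "tap(120, 540)"
    step_record: str                                      # what happened at this step
    fold_history: Optional[str] = None                    # replacement summary, if emitted
    keep_facts: List[str] = field(default_factory=list)   # facts to add to the folded UI state


def apply_context_actions(ctx: ConActContext, out: PolicyOutput) -> ConActContext:
    """Return the next context, driven only by the policy's own context actions."""
    history = out.fold_history if out.fold_history is not None else ctx.folded_action_history
    facts = list(ctx.folded_ui_state)
    for fact in out.keep_facts:
        if fact not in facts:  # avoid storing the same fact twice
            facts.append(fact)
    # The previous step record is dropped; only the newest raw record is kept.
    return ConActContext(history, facts, out.step_record)


def render_prompt(task: str, ctx: ConActContext) -> str:
    """Assemble the model input from the task and the three context fields."""
    return (
        f"Task: {task}\n"
        f"Folded action history: {ctx.folded_action_history}\n"
        f"Folded UI state: {'; '.join(ctx.folded_ui_state)}\n"
        f"Recent step: {ctx.recent_step_record}"
    )


# Toy two-step, cross-app episode (all values are made up for illustration).
ctx = ConActContext()
ctx = apply_context_actions(ctx, PolicyOutput(
    ui_action="tap(120, 540)",
    step_record="Opened Messages; order code shown as 4821",
    keep_facts=["order code = 4821"],
))
ctx = apply_context_actions(ctx, PolicyOutput(
    ui_action="launch(Notes)",
    step_record="Switched to Notes",
    fold_history="Read order code in Messages, then opened Notes",
))
print(render_prompt("Copy the order code into a new note", ctx))
```

In this toy run, the order code read in Messages survives the switch to Notes in the folded UI state. The raw record of the earlier step is discarded rather than appended.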


Cached at: 06/24/26, 05:47 AM

Source: https://huggingface.co/papers/2606.19926

Abstract

MemGUI-Agent addresses the limitations of mobile GUI agents on long-horizon tasks through proactive context management, using Context-as-Action (ConAct) to retain critical information across extended action sequences.
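The abstract attributes long-horizon failures to ReAct-style prompting, which appends every per-step record and so causes prompt explosion. The toy simulation below makes that argument concrete. It compares the prompt size of such an agent with a ConAct-style context that keeps only a folded summary, a list of retained facts, and the latest record. The record text, the rule that a fact worth keeping appears every 10 steps, and the one-line folded summary are hypothetical assumptions. In the paper's setting, folded content is produced by the model itself, and the abstract does not specify its length.

```python
from typing import List, Tuple


def react_prompt(records: List[str]) -> str:
    # ReAct-style: every per-step record is appended verbatim.
    return "\n".join(records)


def conact_prompt(folded_history: str, facts: List[str], recent: str) -> str:
    # ConAct-style (hypothetical layout): three fields, only the latest raw record.
    return "\n".join([
        "HISTORY: " + folded_history,
        "UI FACTS: " + "; ".join(facts),
        "RECENT: " + recent,
    ])


def simulate(num_steps: int) -> Tuple[int, int]:
    """Return (ReAct prompt length, ConAct prompt length) in characters after num_steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    records: List[str] = []
    facts: List[str] = []
    folded = ""
    for t in range(num_steps):
        records.append(f"step {t}: tapped element {t} on screen {t}")
        if t % 10 == 0:  # hypothetical: a fact worth keeping appears every 10 steps
            facts.append(f"fact@{t}")
        folded = f"completed {t + 1} steps"  # hypothetical one-line folded summary
    return len(react_prompt(records)), len(conact_prompt(folded, facts, records[-1]))


for n in (10, 50, 100):
    react_len, conact_len = simulate(n)
    print(f"{n:>3} steps: ReAct {react_len:>5} chars, ConAct {conact_len:>4} chars")
```

Under these assumptions, the ReAct prompt grows with every step. The ConAct-style prompt stays far smaller and grows mainly with the number of retained facts.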

Models citing this paper: 1

lgy0404/MemGUI-8B-SFT · Image-Text-to-Text · 9B · Updated 5 days ago · 50

Datasets citing this paper: 1

lgy0404/MemGUI-3K · Updated 5 days ago · 2.96k · 702

Spaces citing this paper: 0

Collections including this paper: 1

Similar Articles

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments

MemGym: a Long-Horizon Memory Environment for LLM Agents

MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models
