MIRAGE is a framework for mobile GUI agents that replaces verbose chain-of-thought reasoning with compact continuous latent representations, and takes a generative world-model perspective to predict future screen states before acting. On the AndroidWorld and AndroidControl benchmarks, it achieves competitive or superior performance while reducing generated tokens by over 75%.
This paper proposes a Pre-Reasoning Perception Framework (PRPF) for proactive mobile agents, decoupling intervention timing from assistance generation to improve efficiency and reduce false triggers.
The author observes that the hardest part of building phone-use AI agents is tracking state changes, because mobile interfaces change more dynamically and are interrupted more often than desktop interfaces, and asks about others' experience.
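To make the state-tracking problem concrete, here is a minimal sketch of one possible approach: snapshot the on-screen elements before and after an action, diff the two snapshots, and flag anything that appeared without being expected. This is not taken from the post or from either paper. `UiElement`, `diff_screens`, and `is_interruption` are hypothetical names, and the sketch assumes each element carries a stable identifier, which real mobile UIs do not always provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UiElement:
    """Hypothetical snapshot of one on-screen element (not a real framework type)."""
    element_id: str
    text: str
    bounds: tuple[int, int, int, int]  # left, top, right, bottom


def diff_screens(before, after):
    """Report which element ids appeared, disappeared, or changed between two snapshots."""
    old = {e.element_id: e for e in before}
    new = {e.element_id: e for e in after}
    return {
        "appeared": sorted(new.keys() - old.keys()),
        "disappeared": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def is_interruption(diff, expected_ids):
    """Assumed heuristic: any newly appeared element the agent did not expect
    (for example a permission dialog) counts as an interruption."""
    return any(k not in expected_ids for k in diff["appeared"])


# Usage: the agent tapped "send" and expected only that button to change.
before = [
    UiElement("send", "Send", (0, 900, 200, 1000)),
    UiElement("title", "Chat", (0, 0, 400, 80)),
]
after = [
    UiElement("send", "Sending", (0, 900, 200, 1000)),
    UiElement("dialog", "Allow notifications?", (50, 300, 350, 600)),
]
diff = diff_screens(before, after)
interrupted = is_interruption(diff, expected_ids={"send"})
```

Here the unexpected `dialog` element marks the transition as an interruption, so an agent could handle the dialog before resuming its plan. A diff on identifiers is only a starting point: it will miss changes on screens that reuse ids or re-render content without changing it.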