Training Open Models for Agentic Phone Use
Summary
PhoneBuddy combines real and mock app environments to train open models for agentic phone use, achieving 45.33% task success rate on real phones through mixed reinforcement learning, showing that mock-app training complements real-app training.
Source: https://huggingface.co/papers/2606.23049
Abstract
PhoneBuddy combines real and mock app environments to improve training of open models for phone use, demonstrating enhanced task success rates through mixed reinforcement learning approaches.
Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, real devices running real apps, is slow, stateful, side-effectful, and hard to reset or verify, while scalable mock environments only approximate real behavior. We present PhoneBuddy, a training recipe and open-model line for agentic phone use that combines a real-app environment with a mock-app environment, PhoneWorld, which reconstructs runnable mock apps from real GUI usage structure. PhoneBuddy first builds a shared supervised fine-tuning stage from trajectories collected in both environments, then compares real-app RL against mixed RL across both environments. Across a 150-task human evaluation on real phones spanning apps, mini-apps, and cross-app workflows, task success rate improves from 36.67% after supervised fine-tuning to 40.67% after real-app RL and 45.33% after mixed RL. On AndroidWorld, the same progression rises from 60.3% to 77.2% to 83.2%. These results show that mock-app training is not a replacement for real-app RL, but a complementary source of scalable, resettable, and automatically checked interaction. The gains are strongest on app and mini-app tasks, while long-horizon cross-app workflows remain an important open challenge.
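To make the reported percentages concrete, the following sketch converts the real-phone success rates into implied task counts out of 150 and computes the percentage-point gains between training stages. It is only arithmetic on the numbers quoted in the abstract; the assumption that each of the 150 tasks is scored once as a binary success or failure is ours, and the names `implied_successes` and `gain_pp` are illustrative, not from the paper.

```python
# Success rates (percent) quoted in the abstract.
N_TASKS = 150  # size of the real-phone human evaluation
real_phone = {"SFT": 36.67, "real-app RL": 40.67, "mixed RL": 45.33}
android_world = {"SFT": 60.3, "real-app RL": 77.2, "mixed RL": 83.2}


def implied_successes(rate_percent: float, n_tasks: int = N_TASKS) -> int:
    """Nearest whole number of solved tasks consistent with a rounded rate.

    Assumption (ours): every task is scored once, pass or fail.
    """
    return round(rate_percent / 100 * n_tasks)


def gain_pp(rates: dict, before: str, after: str) -> float:
    """Percentage-point difference between two training stages."""
    return round(rates[after] - rates[before], 2)


counts = {stage: implied_successes(rate) for stage, rate in real_phone.items()}

if __name__ == "__main__":
    for stage, solved in counts.items():
        print(f"{stage}: about {solved} of {N_TASKS} tasks")
    for name, rates in [("real phones", real_phone), ("AndroidWorld", android_world)]:
        print(
            f"{name}: real-app RL adds {gain_pp(rates, 'SFT', 'real-app RL')} pp over SFT, "
            f"mixed RL adds {gain_pp(rates, 'real-app RL', 'mixed RL')} pp more"
        )
```

Under that binary-scoring assumption, the three real-phone rates correspond to 55, 61, and 68 solved tasks, so each stage adds roughly six or seven tasks on this benchmark.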
Similar Articles
PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions
PhoneHarness is a mixed-action benchmark and execution framework that evaluates phone-use agents on verifiable mobile workflows, achieving a 75% pass rate and outperforming existing approaches by 12.9 percentage points through deterministic action routing and auditable execution traces.
PhoneWorld: Scaling Phone-Use Agent Environments
PhoneWorld is a pipeline that transforms real GUI trajectories into controllable mobile environments, enabling scalable creation of phone-use benchmarks. It covers 34 apps across 16 domains and shows that using its supervision improves performance on multiple evaluation benchmarks.
Giving AI a real phone feels more interesting than another browser agent
OpenGUI is highlighted as a novel AI agent platform that utilizes actual Android devices for task execution, offering a more realistic interface than traditional browser-based agents.
@ttunguz: I've been using state-of-the-art models to teach small models running on my computer how I work. The result : a persona…
Using large AI models to train smaller local models, the author built a personal agent that manages email, calendar, deals, blog, and research.
AI agents should use real apps.
OpenGUI is a tool that allows AI agents to directly operate real Android apps by reading the screen and interacting naturally, rather than relying on APIs or scripts.