Training Open Models for Agentic Phone Use

Hugging Face Daily Papers 06/22/26, 12:00 AM Papers

Summary

PhoneBuddy combines real and mock app environments to train open models for agentic phone use, achieving 45.33% task success rate on real phones through mixed reinforcement learning, showing that mock-app training complements real-app training.

Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, real devices running real apps, is slow, stateful, side-effectful, and hard to reset or verify, while scalable mock environments only approximate real behavior. We present PhoneBuddy, a training recipe and open-model line for agentic phone use that combines a real-app environment with a mock-app environment, PhoneWorld, which reconstructs runnable mock apps from real GUI usage structure. PhoneBuddy first builds a shared supervised fine-tuning stage from trajectories collected in both environments, then compares real-app RL against mixed RL across both environments. Across a 150-task human evaluation on real phones spanning apps, mini-apps, and cross-app workflows, task success rate improves from 36.67% after supervised fine-tuning to 40.67% after real-app RL and 45.33% after mixed RL. On AndroidWorld, the same progression rises from 60.3% to 77.2% to 83.2%. These results show that mock-app training is not a replacement for real-app RL, but a complementary source of scalable, resettable, and automatically checked interaction. The gains are strongest on app and mini-app tasks, while long-horizon cross-app workflows remain an important open challenge.
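The reported percentages can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes each of the 150 real-phone tasks is scored once as pass or fail with equal weight, which the abstract does not state explicitly, and the names `implied_successes` and `stage_gains` are hypothetical helpers, not part of any PhoneBuddy release.

```python
# Success rates (percent) as reported in the abstract.
REAL_PHONE_TASKS = 150
real_phone = {"SFT": 36.67, "real-app RL": 40.67, "mixed RL": 45.33}
android_world = {"SFT": 60.3, "real-app RL": 77.2, "mixed RL": 83.2}


def implied_successes(rate_percent: float, n_tasks: int) -> int:
    """Nearest whole number of solved tasks for a reported rate.

    Assumption: one pass/fail outcome per task, equally weighted.
    """
    return round(rate_percent / 100 * n_tasks)


def stage_gains(rates: dict) -> dict:
    """Percentage-point gain between consecutive training stages."""
    stages = list(rates)
    return {
        f"{prev} -> {nxt}": round(rates[nxt] - rates[prev], 2)
        for prev, nxt in zip(stages, stages[1:])
    }


counts = {s: implied_successes(r, REAL_PHONE_TASKS) for s, r in real_phone.items()}

if __name__ == "__main__":
    print("implied solved tasks:", counts)
    print("real-phone gains (pp):", stage_gains(real_phone))
    print("AndroidWorld gains (pp):", stage_gains(android_world))
```

Under that assumption the rates correspond to 55, 61, and 68 solved tasks out of 150, so each RL stage adds roughly six or seven tasks on real phones, while the AndroidWorld gain is concentrated in the first RL step (16.9 points, then 6.0).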

Original Article

Cached at: 06/23/26, 05:40 AM

Source: https://huggingface.co/papers/2606.23049

Abstract

PhoneBuddy combines real and mock app environments to improve training of open models for phone use, demonstrating enhanced task success rates through mixed reinforcement learning approaches.

Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, real devices running real apps, is slow, stateful, side-effectful, and hard to reset or verify, while scalable mock environments only approximate real behavior. We present PhoneBuddy, a training recipe and open-model line for agentic phone use that combines a real-app environment with a mock-app environment, PhoneWorld, which reconstructs runnable mock apps from real GUI usage structure. PhoneBuddy first builds a shared supervised fine-tuning stage from trajectories collected in both environments, then compares real-app RL against mixed RL across both environments. Across a 150-task human evaluation on real phones spanning apps, mini-apps, and cross-app workflows, task success rate improves from 36.67% after supervised fine-tuning to 40.67% after real-app RL and 45.33% after mixed RL. On AndroidWorld, the same progression rises from 60.3% to 77.2% to 83.2%. These results show that mock-app training is not a replacement for real-app RL, but a complementary source of scalable, resettable, and automatically checked interaction. The gains are strongest on app and mini-app tasks, while long-horizon cross-app workflows remain an important open challenge.

Get this paper in your agent:

hf papers read 2606.23049

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

PhoneWorld: Scaling Phone-Use Agent Environments

Giving AI a real phone feels more interesting than another browser agent

@ttunguz: I've been using state-of-the-art models to teach small models running on my computer how I work. The result : a persona…

AI agents should use real apps.
