PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Hugging Face Daily Papers 06/12/26, 12:00 AM Papers

phone-agents mobile-workflows benchmark gui-control cli tool-actions execution-framework

Summary

PhoneHarness is a mixed-action benchmark and execution framework that evaluates phone-use agents on verifiable mobile workflows. It combines deterministic action routing with auditable execution traces and reaches a 75.0% pass rate on the annotated evaluation split, 12.9 percentage points above the strongest non-PhoneHarness settings.

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone-use tasks are broader: they require deciding when to use app GUIs, device-side commands, or structured tools, while leaving evidence that the intended side effect actually occurred. We introduce PhoneHarness, a mixed-action benchmark and execution harness for studying phone-use agents on verifiable mobile workflows. PhoneHarness runs a device-side agent loop over GUI, CLI, and host-side tool actions, combining deterministic action routing with bounded GUI delegation and auditable execution traces. Its benchmark, PhoneHarness Bench, evaluates whether agents complete tasks with observable side effects, not only whether they produce plausible final answers. On the annotated evaluation split, PhoneHarness reaches a 75.0% pass rate, outperforming the strongest non-PhoneHarness settings by 12.9 percentage points. PhoneHarness and PhoneHarness Bench therefore play distinct but mutually dependent roles: the harness makes mixed phone workflows executable, while the benchmark measures whether agents can use that harness reliably and safely. Our findings suggest that reliable phone automation depends on action-surface routing and verifiable execution, not only visual GUI control.

Original Article

Cached at: 06/16/26, 11:33 AM

Source: https://huggingface.co/papers/2606.14832

Abstract

PhoneHarness presents a mixed-action benchmark and execution framework for evaluating phone-use agents on verifiable mobile workflows, demonstrating superior performance over existing approaches through deterministic action routing and auditable execution traces.

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone-use tasks are broader: they require deciding when to use app GUIs, device-side commands, or structured tools, while leaving evidence that the intended side effect actually occurred. We introduce PhoneHarness, a mixed-action benchmark and execution harness for studying phone-use agents on verifiable mobile workflows. PhoneHarness runs a device-side agent loop over GUI, CLI, and host-side tool actions, combining deterministic action routing with bounded GUI delegation and auditable execution traces. Its benchmark, PhoneHarness Bench, evaluates whether agents complete tasks with observable side effects, not only whether they produce plausible final answers. On the annotated evaluation split, PhoneHarness reaches a 75.0% pass rate, outperforming the strongest non-PhoneHarness settings by 12.9 percentage points. PhoneHarness and PhoneHarness Bench therefore play distinct but mutually dependent roles: the harness makes mixed phone workflows executable, while the benchmark measures whether agents can use that harness reliably and safely. Our findings suggest that reliable phone automation depends on action-surface routing and verifiable execution, not only visual GUI control.
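To make the abstract's three mechanisms concrete, here is a minimal Python sketch of how deterministic action routing, bounded GUI delegation, and an auditable trace might fit together. It is an illustration under stated assumptions, not PhoneHarness's actual implementation: the names `Action`, `TraceEntry`, `Router`, `gui_budget`, and `make_demo_router` are all hypothetical, and the handlers are stubs that touch no device.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    surface: str  # one of "gui", "cli", "tool" in this sketch
    payload: str  # free-form description of what to do


@dataclass
class TraceEntry:
    step: int
    surface: str
    payload: str
    outcome: str


@dataclass
class Router:
    """Hypothetical router: a fixed surface-to-handler table plus a GUI step budget."""
    handlers: Dict[str, Callable[[str], str]]
    gui_budget: int = 3  # assumed cap on GUI steps ("bounded GUI delegation")
    trace: List[TraceEntry] = field(default_factory=list)
    gui_used: int = 0

    def dispatch(self, action: Action) -> str:
        # Deterministic routing: the surface name alone selects the handler.
        if action.surface not in self.handlers:
            outcome = "rejected: unknown surface"
        elif action.surface == "gui" and self.gui_used >= self.gui_budget:
            outcome = "rejected: gui budget exhausted"
        else:
            if action.surface == "gui":
                self.gui_used += 1
            outcome = self.handlers[action.surface](action.payload)
        # Every attempt, including rejected ones, is appended to the trace.
        self.trace.append(
            TraceEntry(len(self.trace) + 1, action.surface, action.payload, outcome)
        )
        return outcome


def make_demo_router() -> Router:
    # Stub handlers that only echo their input; a real harness would act on a device.
    return Router(
        handlers={
            "gui": lambda p: f"gui ok: {p}",
            "cli": lambda p: f"cli ok: {p}",
            "tool": lambda p: f"tool ok: {p}",
        },
        gui_budget=1,
    )


if __name__ == "__main__":
    router = make_demo_router()
    for act in [Action("cli", "list files"), Action("gui", "tap send"), Action("gui", "tap again")]:
        router.dispatch(act)
    for entry in router.trace:
        print(entry)
```

Recording rejected actions alongside successful ones is the design choice that makes the trace useful for auditing in this sketch: a reviewer can see what the agent attempted, not just what succeeded.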

Get this paper in your agent:

hf papers read 2606.14832

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

PhoneWorld: Scaling Phone-Use Agent Environments

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

SkillHarness: Harnessing Safe Skills for Computer-Use Agents

Harness design for long-running application development

Auditing Agent Harness Safety
