the accessibility tree gotchas that kept breaking my desktop agent

Reddit r/AI_Agents 05/20/26, 12:18 AM News

accessibility-tree desktop-agent debugging multi-monitor modal-sheets cross-app-handoff llm-agent

Summary

A developer shares three accessibility tree pitfalls that break desktop agents: stale PIDs after app switches, modal sheets intercepting clicks, and multi-monitor coordinate issues. All three fail silently. Fixes include detecting frontmost app changes, an explicit modal check before every click, and targeting coordinates on the intended screen.

My desktop agent stopped failing the moment I stopped trusting the accessibility tree as a single source of truth.

The dumbest one was cross-app handoff. The agent clicks a link in Mail, Safari becomes frontmost, and the agent keeps asking for the original pid's tree and operating on a frozen snapshot. The fix is detecting when the frontmost app changes between actions and traversing the new one before the next step. It is easy to miss because the previous pid is still alive, just no longer relevant.

The second one was sheets and dialogs overriding the window's viewport scope. An element shows up in the tree because it technically exists in the hierarchy, but it sits underneath an active modal sheet, so clicks land on whatever is actually on top. I needed an explicit "is this element inside the current modal" check before every click.

Multi-monitor coordinates were the third. On a 3-screen setup the left external sits at x around -3840 and the right around 3456. A naive "click at x:200" lands on whichever screen contains (200, y), which is almost never the one you mean.

An LLM clicking the wrong button is rarely the model. It is the tree state being stale or scoped wrong, and the failure mode is silent until you diff before and after screenshots.

written with s4lai
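The cross-app handoff fix can be sketched roughly as follows. This is a minimal illustration, not the author's code: `FrontmostTracker`, `get_frontmost_pid`, and `fetch_tree` are hypothetical names, and the two callables stand in for whatever your platform's accessibility bindings provide.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FrontmostTracker:
    """Re-traverse the accessibility tree when the frontmost app changes."""

    # Hypothetical hook: returns the pid of the app that is frontmost right now.
    get_frontmost_pid: Callable[[], int]
    # Hypothetical hook: traverses and returns the accessibility tree for a pid.
    fetch_tree: Callable[[int], Any]
    _pid: Optional[int] = None
    _tree: Any = None

    def current_tree(self) -> Any:
        """Call before every action; never reuse a tree across an app switch."""
        pid = self.get_frontmost_pid()
        if pid != self._pid:
            # The old pid may still be alive, but it is no longer the target.
            self._pid = pid
            self._tree = self.fetch_tree(pid)
        return self._tree
```

The sketch only handles the pid change described in the post. A real agent would probably also re-traverse after any action that mutates the UI inside the same app.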
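The "is this element inside the current modal" check might look something like this. It is a sketch over a hypothetical in-memory `Node` type rather than real accessibility elements, and it assumes at most one active modal per window, identified by a role listed in `MODAL_ROLES` (also an assumption; adjust for your platform).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# Assumption: roles that mark a modal container in your tree dump.
MODAL_ROLES = {"AXSheet"}


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for an accessibility element."""

    role: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def walk(node: Node) -> Iterator[Node]:
    """Pre-order traversal of a subtree."""
    yield node
    for child in node.children:
        yield from walk(child)


def active_modal(window: Node) -> Optional[Node]:
    """Return the first modal container under the window, if any."""
    for node in walk(window):
        if node.role in MODAL_ROLES:
            return node
    return None


def is_clickable(element: Node, window: Node) -> bool:
    """An element is a valid target only if no modal is up or it lives inside it."""
    modal = active_modal(window)
    if modal is None:
        return True
    node: Optional[Node] = element
    while node is not None:
        if node is modal:
            return True
        node = node.parent
    return False
```

Running this before every click means an element that merely exists in the hierarchy, but sits underneath the sheet, gets rejected instead of silently sending the click to the sheet.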
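For the multi-monitor case, one way to avoid the naive "click at x:200" is to carry the target screen along with the coordinates and convert to global space explicitly. The sketch below uses the x offsets from the post (-3840 and 3456). The widths, heights, and y origins are illustrative assumptions, and `Screen`, `to_global`, and `screen_at` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Screen:
    """A display's frame in the global coordinate space."""

    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, gx: int, gy: int) -> bool:
        return (self.x <= gx < self.x + self.width
                and self.y <= gy < self.y + self.height)


# Illustrative layout: only the x origins come from the post.
SCREENS = [
    Screen("left", -3840, 0, 3840, 2160),
    Screen("main", 0, 0, 3456, 2234),
    Screen("right", 3456, 0, 3840, 2160),
]


def to_global(screen: Screen, local_x: int, local_y: int) -> Tuple[int, int]:
    """Convert screen-local coordinates to global ones for a specific screen."""
    if not (0 <= local_x < screen.width and 0 <= local_y < screen.height):
        raise ValueError(f"({local_x}, {local_y}) is outside screen {screen.name}")
    return screen.x + local_x, screen.y + local_y


def screen_at(screens: List[Screen], gx: int, gy: int) -> Optional[Screen]:
    """Return the screen that contains a global point, or None."""
    for screen in screens:
        if screen.contains(gx, gy):
            return screen
    return None
```

With this layout, a bare (200, 100) resolves to the main screen, while `to_global(SCREENS[0], 200, 100)` yields (-3640, 100) on the left external. Asserting `screen_at(...)` against the intended screen before clicking turns a silent mis-click into an explicit error.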


Similar Articles

Accessibility API and Set-of-Marks: making computer-use agents more reliable

I built agent-browser but for OS automation.

Things I learned the hard way building a web agent that clicks through real apps

Here is the main nugget that you need to understand computer-use vs browser-use agents

How working with a blind client revealed invisible accessibility gaps
