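@QingQ77: Let AI automatically operate a real Android phone to carry out long-running mobile tasks such as social media, research, and content operations https://github.com/Core-Mate/OpenGUI… OpenGUI is an AI phone-control system in which the AI works directly on your Androi…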
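Summary
OpenGUI is a source-available AI phone-control system that lets AI automatically operate real Android devices to carry out long-running mobile tasks such as social media and research. Tasks can be dispatched remotely through Feishu, Telegram, Discord, or REST API. The underlying architecture has two layers, Plan Supervisor and Executor Graph, and supports multiple models including Claude, Qwen, and Doubao.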
Cached at: 2026/05/09 07:45
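Let AI automatically operate a real Android phone to carry out long-running mobile tasks such as social media, research, and content operations https://github.com/Core-Mate/OpenGUI… OpenGUI is an AI phone-control system: the AI operates apps such as X, Reddit, WeChat, and Xiaohongshu directly on your Android phone. The architecture has two layers: on top, Plan Supervisor manages task state and continuation; below, Executor Graph loops on the phone through screenshot → observe → act → feedback; finally, Summarizer compiles the results. Tasks can be dispatched remotely through Feishu, Telegram, or REST API; the phone stays on standby and starts running as soon as a task arrives.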
Core-Mate/OpenGUI
Source: https://github.com/Core-Mate/OpenGUI
Recent Updates
- [2026.5.9] Added a Discord IM channel for remote Android task dispatch, including prefix commands, slash commands, allowlists, and guild-scoped command registration.
- [2026.5.7] Hardened local startup to avoid common PostgreSQL and Redis port conflicts during Docker-based backend setup.
- [2026.5.1] Improved backend onboarding with .env.example, startup checks, and graph-agent VLM environment configuration.
What You Can Do with OpenGUI
OpenGUI lets AI operate real Android phones.
You can use the same repository in four practical ways:
- Operate mainstream Android apps: let AI handle mobile tasks inside X, Reddit, Hacker News, Telegram, WeChat, Weibo, Xiaohongshu, and other Android apps on a real phone.
- Run shipped workflows: the repository already includes a runnable backend, Android client, standby dispatch path, and a set of built-in task capabilities.
- Let Claude or Codex bootstrap it for you: point the model at skills/open-gui-bootstrap/SKILL.md, describe the goal in plain language, and let it handle setup, build, install, and local debugging.
- Operate phones as remote workers: dispatch tasks through Feishu, Telegram, Discord, or REST API, keep devices on standby, and get structured results back from the backend.
Highlights
- Built for long-running tasks: OpenGUI is shaped for mobile workflows that may run for hours, with progress, review, and recovery kept inside the system.
- The task can keep moving: Plan Supervisor maintains task state and continuation, Executor Graph runs screenshot, vision, action, and call-user loops on top of live device state, and Summarizer closes the run with a structured result.
- Phones can stay on standby: the standby dispatch path lets devices receive remote work through Feishu, Telegram, Discord, or REST entry points.
- Models can be assigned by role: model routing separates planning from VLM execution so teams can choose providers by job.
- The system is organized around real mobile workflows: the graph, device execution path, and model split already exist in the source tree.
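The screenshot, vision, action, and call-user loop described above can be sketched roughly as follows. This is a minimal illustration only, not the code in executor.graph.ts: every name here (`Action`, `Device`, `Vision`, `runExecutorLoop`, the step budget) is a hypothetical stand-in, and the device and vision model are replaced by in-memory fakes so the sketch runs on its own.

```typescript
// Hypothetical types for illustration; none of these names come from the OpenGUI source tree.
type Action =
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "call_user"; question: string }
  | { kind: "done"; result: string };

interface Device {
  screenshot(): string;          // opaque description of the current screen
  perform(action: Action): void; // executes a gesture or text input
}

// Stand-in for a VLM call: picks the next action from the goal, the screen, and past actions.
type Vision = (goal: string, screen: string, history: Action[]) => Action;

interface RunResult {
  status: "done" | "needs_user" | "step_limit";
  steps: number;
  detail: string;
}

function runExecutorLoop(goal: string, device: Device, decide: Vision, maxSteps = 20): RunResult {
  const history: Action[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const screen = device.screenshot();           // 1. capture live device state
    const action = decide(goal, screen, history); // 2. let the vision model choose
    history.push(action);
    if (action.kind === "done") {
      return { status: "done", steps: step, detail: action.result };
    }
    if (action.kind === "call_user") {
      // Hand control back so a human can answer before the run continues.
      return { status: "needs_user", steps: step, detail: action.question };
    }
    device.perform(action);                       // 3. act, then loop for feedback
  }
  return { status: "step_limit", steps: maxSteps, detail: "step budget exhausted" };
}

// In-memory fakes: each performed action advances to the next screen.
const screens = ["home", "search", "results"];
let cursor = 0;
const fakeDevice: Device = {
  screenshot: () => screens[cursor],
  perform: () => { cursor = Math.min(cursor + 1, screens.length - 1); },
};
const fakeVision: Vision = (_goal, screen) =>
  screen === "results"
    ? { kind: "done", result: "collected posts" }
    : { kind: "tap", x: 10, y: 20 };

const demo = runExecutorLoop("collect posts", fakeDevice, fakeVision);
console.log(demo);
```

A step budget and an explicit `needs_user` exit are one simple way to keep a long-running loop bounded and resumable; the real system may handle continuation differently inside Plan Supervisor.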
Why OpenGUI Is Different
OpenGUI is built as a mobile operator system with explicit orchestration layers.
The source code currently exposes these pieces:
- server/apps/backend/src/modules/graph-agent/graph/mobile-agent.graph.ts for the main graph
- server/apps/backend/src/modules/graph-agent/graph/executor.graph.ts for the device-side execution loop
- server/apps/backend/src/common/ws/standby.gateway.ts for standby device dispatch
- client/core_network/.../StandbySocketManager.kt for persistent device standby connections
- client/core_accessibility/.../GestureService.kt for Android-side action execution
| Dimension | Typical phone-agent demo | OpenGUI |
|---|---|---|
| Execution model | Short interactive loop | Main graph plus executor subgraph |
| Task state | Usually local and session-bound | Task state managed in the backend graph |
| Device path | Often laptop-driven control | Android client with standby and execution sockets |
| Model usage | One model does most of the work | Planning and VLM paths can be split across providers |
| Remote operation | Optional add-on | Feishu, Telegram, Discord, REST API, and standby dispatch are built into the backend |
Typical Use Cases
- Open X and collect recent posts for a topic
- Read and summarize Reddit or Hacker News threads on a live phone
- Trigger Android tasks remotely from Feishu, Telegram, Discord, or REST API
- Execute repetitive mobile workflows on Android devices
- Run long mobile workflows that need state, review, and recovery over many hours
How to Use OpenGUI
1. With Claude or Codex
Start with skills/open-gui-bootstrap/SKILL.md.
The intended flow is simple:
- point Claude or Codex at the skill
- describe the task in plain language
- let the model handle backend bootstrap, APK build, install, and local debugging
It should only stop for:
- connecting a phone or starting an emulator
- approving USB debugging
- enabling AccessibilityService
- granting overlay or battery permissions
- providing API keys or bot credentials
Recommended profiles:
High-performance profile
Use the latest Claude Opus model family across planning, supervision, review, and vision when you want the strongest overall quality.
This is the easiest way to get the best execution quality, and it is the most expensive path.
Cost-saving mixed profile
Use Qwen 3.6 Plus for text-side roles such as Planner and Supervisor, and use Doubao Pro for the VLM side.
This usually preserves the overall system shape while lowering model cost by roughly 10x to 15x compared with an all-Opus setup, depending on task length, screenshot volume, and token mix.
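As a rough illustration of assigning models by role, the two profiles can be written as a lookup table. This is a sketch under stated assumptions, not OpenGUI's actual configuration format: the role names, `PROFILES`, `resolveModel`, and the provider labels are hypothetical, the Opus entry is a placeholder rather than a real model ID, and placing the reviewer role on Qwen in the mixed profile is an assumption (the text above only names Planner and Supervisor).

```typescript
// Hypothetical role and profile names; the real configuration keys may differ.
type Role = "planner" | "supervisor" | "reviewer" | "vlm";
type ProfileName = "high-performance" | "cost-saving";

interface ModelTarget {
  provider: string;
  model: string;
}

type Profile = Record<Role, ModelTarget>;

// Placeholder labels, not real API model identifiers.
const OPUS: ModelTarget = { provider: "anthropic", model: "<latest-claude-opus>" };
const QWEN: ModelTarget = { provider: "qwen", model: "Qwen 3.6 Plus" };
const DOUBAO: ModelTarget = { provider: "doubao", model: "Doubao Pro" };

const PROFILES: Record<ProfileName, Profile> = {
  // One model family for every role.
  "high-performance": { planner: OPUS, supervisor: OPUS, reviewer: OPUS, vlm: OPUS },
  // Text-side roles on Qwen, screenshot-heavy VLM execution on Doubao.
  // Assumption: the reviewer counts as a text-side role.
  "cost-saving": { planner: QWEN, supervisor: QWEN, reviewer: QWEN, vlm: DOUBAO },
};

// Resolve the model for a role, letting per-role overrides win over the profile.
function resolveModel(
  profile: ProfileName,
  role: Role,
  overrides: Partial<Profile> = {},
): ModelTarget {
  return overrides[role] ?? PROFILES[profile][role];
}

console.log(resolveModel("cost-saving", "planner").model);
console.log(resolveModel("cost-saving", "vlm").model);
```

Keeping the override map separate from the profile is one way to support the "use my own APIs" case without redefining a whole profile.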
Recommended prompts:
Run it
Read ./skills/open-gui-bootstrap/SKILL.md and help me run OpenGUI. Only ask me for phone-side actions.
Use Claude Opus everywhere
Read ./skills/open-gui-bootstrap/SKILL.md and bootstrap OpenGUI with the latest Claude Opus model family for planning, supervision, review, and vision.
Use Qwen + Doubao to save cost
Read ./skills/open-gui-bootstrap/SKILL.md and set up OpenGUI with Qwen 3.6 Plus for Planner and Supervisor, and Doubao Pro for VLM execution.
Use my own APIs
Read ./skills/open-gui-bootstrap/SKILL.md and use my existing model APIs to get OpenGUI working.
2. Manual setup
Use the repository scripts directly:
cd server
./start.sh
cd client
./start.sh
Reference docs:
- docs/get-started.md
- server/start.sh
- client/start.sh
- server/apps/backend/README.md
- DISCORD.md
- client/README.md
3. Optional Discord remote control
Discord can be enabled as an optional IM channel. A Discord bot receives commands
such as !opengui devices or !opengui do ..., then the backend dispatches the
task to a standby Android phone and posts progress back to the same channel.
This is not required for local use. If DISCORD_BOT_TOKEN is empty, the backend
starts normally and skips Discord.
Full setup guide: DISCORD.md.
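To make the prefix-command flow concrete, here is a small sketch of how a message such as `!opengui devices` or `!opengui do ...` might be parsed, and how an empty DISCORD_BOT_TOKEN could switch the channel off. It is not taken from the im-channel module: `parseCommand`, `discordEnabled`, the `Command` type, and the error messages are hypothetical, and no Discord library is involved.

```typescript
// Hypothetical command shape; the real bot may support more commands (for example slash commands).
type Command =
  | { kind: "devices" }
  | { kind: "do"; task: string }
  | { kind: "ignored" }                 // message is not addressed to the bot
  | { kind: "error"; message: string }; // addressed to the bot but malformed

function parseCommand(text: string, prefix = "!opengui"): Command {
  const trimmed = text.trim();
  // Require the exact prefix followed by a space, so "!openguix" is not matched.
  if (trimmed !== prefix && !trimmed.startsWith(prefix + " ")) {
    return { kind: "ignored" };
  }
  const rest = trimmed.slice(prefix.length).trim();
  if (rest === "devices") {
    return { kind: "devices" };
  }
  if (rest === "do" || rest.startsWith("do ")) {
    const task = rest.slice(2).trim();
    return task.length > 0
      ? { kind: "do", task }
      : { kind: "error", message: `usage: ${prefix} do <task>` };
  }
  return { kind: "error", message: `unknown command: ${rest || "(empty)"}` };
}

// Mirrors the documented behaviour: an empty token means Discord is skipped.
function discordEnabled(env: Record<string, string | undefined>): boolean {
  return (env["DISCORD_BOT_TOKEN"] ?? "").trim().length > 0;
}

console.log(parseCommand("!opengui devices"));
console.log(parseCommand("!opengui do open X and collect recent posts"));
console.log(discordEnabled({ DISCORD_BOT_TOKEN: "" }));
```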
The System
flowchart LR
U["User or IM command"] --> BS["Bootstrap Skill / API / IM entry"]
BS --> SP["Plan Supervisor"]
SP --> EX["Executor Graph"]
EX --> AC["Android Client"]
AC --> GX["AccessibilityService + screenshots + actions"]
EX --> RV["Execution review and retry"]
RV --> SP
SP --> SM["Summarizer"]
SM --> SR["Structured Results"]
RD["Feishu / Telegram / Discord / REST API"] --> ST["Standby Gateway"]
ST --> AC
SP --> MR["Model Routing"]
MR --> MA["Claude / GPT / Gemini / Kimi / MiniMax / compatible"]
EX --> MR
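The standby branch of the flowchart (remote entry points feeding the Standby Gateway, which hands work to the Android client) can be pictured as a registry of connected devices. The sketch below is an in-memory illustration under assumed names (`StandbyRegistry`, `register`, `dispatch`, `release`); it is not the code in standby.gateway.ts, and the real gateway uses sockets rather than plain callbacks.

```typescript
// Hypothetical callback standing in for a persistent socket to one phone.
type SendFn = (task: string) => void;

interface StandbyEntry {
  send: SendFn;
  busy: boolean;
}

class StandbyRegistry {
  private readonly devices = new Map<string, StandbyEntry>();

  // Called when a phone connects and announces it is on standby.
  register(deviceId: string, send: SendFn): void {
    this.devices.set(deviceId, { send, busy: false });
  }

  // Called when the connection drops.
  unregister(deviceId: string): void {
    this.devices.delete(deviceId);
  }

  // Send a task to a specific device, or to the first idle one.
  // Returns the chosen device id, or null when nothing is available.
  dispatch(task: string, deviceId?: string): string | null {
    const candidates = deviceId ? [deviceId] : Array.from(this.devices.keys());
    for (const id of candidates) {
      const entry = this.devices.get(id);
      if (entry && !entry.busy) {
        entry.busy = true;
        entry.send(task);
        return id;
      }
    }
    return null;
  }

  // Mark a device idle again once its run has been summarized.
  release(deviceId: string): void {
    const entry = this.devices.get(deviceId);
    if (entry) {
      entry.busy = false;
    }
  }
}

const outbox: string[] = [];
const registry = new StandbyRegistry();
registry.register("phone-a", (task) => { outbox.push(`phone-a <- ${task}`); });
registry.register("phone-b", (task) => { outbox.push(`phone-b <- ${task}`); });
console.log(registry.dispatch("summarize a Hacker News thread"));
console.log(outbox);
```

Tracking a busy flag per device is the simplest way to avoid sending two tasks to the same phone; a production gateway would also need timeouts and reconnection handling.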
Core Runtime Pieces
- Backend graph: server/apps/backend/src/modules/graph-agent/graph/
- Task APIs: server/apps/backend/src/modules/task/task.controller.ts
- Standby dispatch: server/apps/backend/src/common/ws/standby.gateway.ts
- IM channel dispatch: server/apps/backend/src/modules/im-channel/
- Android standby connection: client/core_network/src/main/java/com/coremate/opengui/network/websocket/StandbySocketManager.kt
- Android execution path: client/core_accessibility/src/main/java/com/coremate/opengui/accessibility/GestureService.kt
Documentation
- skills/open-gui-bootstrap/SKILL.md
- docs/get-started.md
- server/apps/backend/README.md
- DISCORD.md
- client/README.md
- CONTRIBUTING.md
- SECURITY.md
- CLAUDE.md
Community / Support
The most useful project feedback is:
- open issues for bugs and feature requests
- share real use cases and deployment feedback
- contribute docs, integrations, and fixes
License
OpenGUI is source-available under the Business Source License 1.1 (BUSL-1.1).
You may copy, modify, distribute, and use the source for non-production purposes. Production use, commercial use, hosted services, and integration into commercial products require a separate commercial license from Core-Mate.
For this version:
- Change Date: 2030-04-29
- Change License: Apache License, Version 2.0
This is public source, but it is not OSI-approved open source until the Change Date.
See LICENSE.