pi-droid
Android phone control for pi-agent — 36 tools to see, touch, and automate any device via ADB
Package details
Install pi-droid from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-droid
- Package: pi-droid
- Version: 0.2.0
- Published: Apr 14, 2026
- Downloads: 301/mo · 21/wk
- Author: artemisai-dev
- License: MIT
- Types: extension, skill
- Size: 495.8 KB
- Dependencies: 0 dependencies · 4 peers
Pi manifest JSON
{
  "image": "https://raw.githubusercontent.com/ArtemisAI/pi-droid/main/banner.jpeg",
  "skills": [
    "./skills"
  ],
  "extensions": [
    "./dist/index.js"
  ]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
Pi-Droid

Give your AI agent hands on an Android phone.
pi install pi-droid
Pi-Droid is a pi-agent extension that gives AI agents direct, real-time control over Android devices via ADB. Annotated screenshots with numbered element indices let the agent tap, swipe, and type without guessing pixel coordinates — works on any screen size, any device.
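To make the idea of tapping by element index concrete, here is a minimal sketch of how a numbered index could be resolved to a tap point: look up the element and take the center of its bounding box. The `IndexedElement` and `Bounds` shapes and the `tapPointForIndex` helper are assumptions made for illustration, not Pi-Droid's actual types or implementation.

```typescript
// Hypothetical shapes for illustration; not pi-droid's actual types.
interface Bounds { left: number; top: number; right: number; bottom: number; }
interface IndexedElement { index: number; text: string; bounds: Bounds; }

// Resolve a numbered element to the center point of its bounding box.
function tapPointForIndex(
  elements: IndexedElement[],
  index: number,
): { x: number; y: number } {
  const el = elements.find((e) => e.index === index);
  if (!el) throw new Error(`No element with index ${index}`);
  return {
    x: Math.round((el.bounds.left + el.bounds.right) / 2),
    y: Math.round((el.bounds.top + el.bounds.bottom) / 2),
  };
}

// Example data standing in for the elements behind an annotated screenshot.
const elements: IndexedElement[] = [
  { index: 1, text: "Settings", bounds: { left: 0, top: 100, right: 200, bottom: 180 } },
  { index: 2, text: "Wi-Fi", bounds: { left: 0, top: 200, right: 1080, bottom: 320 } },
];

const wifiPoint = tapPointForIndex(elements, 2); // { x: 540, y: 260 }
```

Because the agent only names an index, the same instruction works whatever the screen resolution; the coordinates are derived from the element bounds at tap time.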
What it does
| Category | Tools | Highlights |
|---|---|---|
| See | 6 perception tools | Annotated screenshots with numbered indices, raw UI tree, OCR fallback |
| Touch | 5 input tools | Tap by text/ID/coords, type with clear-first, swipe, scroll, key events |
| Navigate | 3 app tools | Launch/stop apps, wait for elements, wait for activities |
| System | 6 system tools | Battery, settings, processes, logcat, shell, APK install |
| Manage | 3 device tools | Multi-device registry, preflight checks, WiFi ADB |
| Lock | 4 lock tools | Query/set/clear PIN and pattern locks |
| Record | 2 recording tools | Screen recording, gesture macro record/save/replay |
| Automate | 3 automation tools | Wake+unlock, find-and-tap with scrolling, scroll-until-found |
| Extend | 4 plugin tools | Discover capabilities, run plugin actions, health check, heartbeat cycle |
36 tools total, organized into 4 skills that scope tool sets for focused agent behavior.
Prerequisites
| Requirement | Notes |
|---|---|
| Node.js >= 18 | ESM support required |
| ADB on PATH | adb devices should list your device |
| Android device | USB debugging enabled, USB or WiFi connected |
| ADBKeyboard | Required for Unicode text input via android_type |
| Tesseract OCR (optional) | Only needed for android_ocr tool |
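If you want to script the "adb devices should list your device" check, the sketch below parses the text that `adb devices` prints and keeps only devices in the `device` state. `parseAdbDevices` and `readySerials` are hypothetical helpers written for this example; they are not part of Pi-Droid, and the sample output is illustrative.

```typescript
interface AdbDevice { serial: string; state: string; }

// Parse the plain-text output of `adb devices` into serial/state pairs.
function parseAdbDevices(output: string): AdbDevice[] {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Skip blank lines, daemon status lines ("* daemon ..."), and the header.
    .filter((line) => line.length > 0 && !line.startsWith("*") && !line.startsWith("List of devices"))
    .map((line) => {
      const parts = line.split(/\s+/);
      return { serial: parts[0] ?? "", state: parts[1] ?? "unknown" };
    });
}

// Only devices reported as "device" are authorized and ready for commands.
function readySerials(output: string): string[] {
  return parseAdbDevices(output)
    .filter((d) => d.state === "device")
    .map((d) => d.serial);
}

// Illustrative sample; real serials will differ.
const sampleOutput =
  "List of devices attached\nemulator-5554\tdevice\n0123456789ABCDEF\tunauthorized\n\n";
const ready = readySerials(sampleOutput); // ["emulator-5554"]
```

A device shown as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.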
Installation
pi install pi-droid
Set your device serial (optional if only one device is connected):
export ANDROID_SERIAL=your_device_serial
Verify
npx tsx run.mts screen
Development
git clone https://github.com/ArtemisAI/pi-droid.git
cd pi-droid
npm install
npm run build
npm test
Tool Reference
Perception (6 tools)
| Tool | Description |
|---|---|
| android_look | Annotated screenshot with numbered element index; the primary perception tool |
| android_screenshot | Raw screenshot; use for failure diagnosis only |
| android_ui_dump | Raw UI tree XML for full element hierarchy |
| android_ocr | Tesseract OCR on current screen or saved screenshot |
| android_observe | Continuous screen state observation |
| android_screen_state | Current activity, package, orientation, lock state as JSON |
Input (5 tools)
| Tool | Description |
|---|---|
| android_tap | Tap by coordinates, text match, or resource ID; supports long press |
| android_type | Type text into the focused field; optional clear_first to replace |
| android_swipe | Swipe between two coordinates with configurable duration |
| android_scroll | Scroll up or down on the current screen |
| android_key | Press a key: back, home, enter, tab, or any KEYCODE_* |
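As an illustration of what a key press can look like at the ADB level, the sketch below maps the friendly names listed for android_key to Android key codes and builds the arguments for `adb shell input keyevent`. The alias table and `toKeyeventArgs` are assumptions for this example; how Pi-Droid dispatches keys internally is not documented here.

```typescript
// Friendly aliases mapped to standard Android key codes (assumed mapping).
const KEY_ALIASES: Record<string, string> = {
  back: "KEYCODE_BACK",
  home: "KEYCODE_HOME",
  enter: "KEYCODE_ENTER",
  tab: "KEYCODE_TAB",
};

// Build the argument list for: adb shell input keyevent <KEYCODE>
function toKeyeventArgs(key: string): string[] {
  // Use the alias if one exists; otherwise treat the input as a raw key code.
  const code = KEY_ALIASES[key.toLowerCase()] ?? key;
  if (!/^KEYCODE_[A-Z0-9_]+$/.test(code)) {
    throw new Error(`Unsupported key: ${key}`);
  }
  return ["shell", "input", "keyevent", code];
}

const backArgs = toKeyeventArgs("back");
// ["shell", "input", "keyevent", "KEYCODE_BACK"]
```

Validating the key code before building the command keeps arbitrary strings from reaching the device shell.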
App and Navigation (3 tools)
| Tool | Description |
|---|---|
| android_app | Launch, stop, or check status of an app by package name |
| android_wait | Wait for an element to appear (by text or resource ID) with timeout |
| android_wait_activity | Wait for a specific activity to reach the foreground |
System (6 tools)
| Tool | Description |
|---|---|
| android_device_info | Battery, network, and hardware info |
| android_settings | Read/write system settings (WiFi, Bluetooth, brightness, volume, etc.) |
| android_processes | List running processes, kill by PID or name |
| android_logcat | Capture, search, and clear logcat |
| android_shell | Execute arbitrary ADB shell commands |
| android_install | Install/uninstall APKs, check package versions |
Device Management (3 tools)
| Tool | Description |
|---|---|
| android_devices | List connected devices, register/unregister, set active device |
| android_preflight | Run device readiness checks (ADB, screen, battery, etc.) |
| android_wifi | Connect/disconnect WiFi ADB, auto-discover devices |
Lock Management (4 tools)
| Tool | Description |
|---|---|
| android_lock_status | Query current lock state and type |
| android_lock_clear | Remove existing lock |
| android_lock_set_pattern | Set a pattern lock |
| android_lock_set_pin | Set a PIN lock |
Recording and Macros (2 tools)
| Tool | Description |
|---|---|
| android_record | Start/stop screen recording, pull recordings |
| android_macro | Record, save, load, and replay gesture macros |
Automation (3 tools)
| Tool | Description |
|---|---|
| android_ensure_ready | Wake screen, unlock, dismiss overlays; call before any automation |
| android_find_and_tap | Search UI tree for element and tap it; retries with scrolling |
| android_scroll_find | Scroll until an element appears, then return it |
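The scroll-until-found pattern behind tools like android_scroll_find can be sketched as a bounded loop: check the current screen, scroll one step if the element is missing, and give up after a fixed number of scrolls. The `ScrollFindDeps` interface, the `scrollUntilFound` function, and the default of five scrolls are assumptions for illustration; the real tool's parameters and limits may differ.

```typescript
// Injected operations, so the loop can be exercised without a device.
interface ScrollFindDeps<E> {
  find: () => Promise<E | undefined>; // look for the element on the current screen
  scroll: () => Promise<void>;        // scroll the screen by one step
}

// Check, then scroll, at most maxScrolls times; returns undefined if not found.
async function scrollUntilFound<E>(
  deps: ScrollFindDeps<E>,
  maxScrolls = 5,
): Promise<E | undefined> {
  for (let attempt = 0; attempt <= maxScrolls; attempt++) {
    const element = await deps.find();
    if (element !== undefined) return element;
    // Do not scroll after the final check; there is no further lookup.
    if (attempt < maxScrolls) await deps.scroll();
  }
  return undefined;
}
```

Bounding the number of scrolls matters on lists that never end or that bounce at the bottom; without a limit the loop could run indefinitely.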
Plugin System (4 tools)
| Tool | Description |
|---|---|
| android_skills | Discover all loaded plugin capabilities and parameters |
| android_plugin_action | Execute a plugin action with approval gates for sensitive operations |
| android_plugin_status | Get health status of all loaded plugins |
| android_plugin_cycle | Run a plugin's autonomous heartbeat cycle |
Skills
Pi-Droid defines four skills that scope tool sets for focused agent behavior. Skills are discovered automatically by pi-agent's package manager.
| Skill | Description | Key Tools |
|---|---|---|
| android-screen | Perceive device state | look, screen_state, screenshot, ui_dump, ocr, observe |
| android-interact | Perform actions on the device | tap, type, swipe, scroll, key, app |
| android-automate | High-level automation sequences | ensure_ready, find_and_tap, scroll_find, wait, wait_activity, preflight |
| android-plugin | Manage and execute app plugins | skills, plugin_action, plugin_status, plugin_cycle |
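The Key Tools column uses short names; the registered tools carry an android_ prefix, as in the Tool Reference. A small lookup like the one below can expand a skill into full tool names. The `SKILL_KEY_TOOLS` table is transcribed from the skills table, while the `toolNamesForSkill` helper is hypothetical and not a Pi-Droid export.

```typescript
// Key tools per skill, transcribed from the skills table (not exhaustive).
const SKILL_KEY_TOOLS: Record<string, string[]> = {
  "android-screen": ["look", "screen_state", "screenshot", "ui_dump", "ocr", "observe"],
  "android-interact": ["tap", "type", "swipe", "scroll", "key", "app"],
  "android-automate": ["ensure_ready", "find_and_tap", "scroll_find", "wait", "wait_activity", "preflight"],
  "android-plugin": ["skills", "plugin_action", "plugin_status", "plugin_cycle"],
};

// Expand a skill name into full tool names such as "android_look".
function toolNamesForSkill(skill: string): string[] {
  const shortNames = SKILL_KEY_TOOLS[skill];
  if (!shortNames) throw new Error(`Unknown skill: ${skill}`);
  return shortNames.map((name) => `android_${name}`);
}

const pluginTools = toolNamesForSkill("android-plugin");
// ["android_skills", "android_plugin_action", "android_plugin_status", "android_plugin_cycle"]
```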
Plugin System
Pi-Droid's plugin system lets you add app-specific automation behind a standard interface. Plugins can declare approval gates for sensitive actions (sending messages, making purchases, posting content).
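Conceptually, an approval gate is a check that runs before a sensitive action and blocks it unless someone approves. The sketch below shows that control flow only; `PluginAction`, `Decision`, and `runWithGate` are hypothetical names for this example and do not describe Pi-Droid's actual plugin interface or its ApprovalQueue.

```typescript
type Decision = "approved" | "denied";

// Hypothetical action shape; not pi-droid's plugin interface.
interface PluginAction {
  name: string;
  requiresApproval: boolean; // true for sensitive actions such as sending a message
  run: () => string;
}

// Run the action only if it is not gated, or if the approver says yes.
function runWithGate(
  action: PluginAction,
  approve: (actionName: string) => Decision,
): string {
  if (action.requiresApproval && approve(action.name) !== "approved") {
    return `skipped: ${action.name} was not approved`;
  }
  return action.run();
}

const sendMessage: PluginAction = {
  name: "send_message",
  requiresApproval: true,
  run: () => "message sent",
};

const denied = runWithGate(sendMessage, () => "denied");
// "skipped: send_message was not approved"
```

Keeping the approver as an injected callback means the same gate can be backed by a console prompt, a chat notification, or a test stub.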
Building a plugin
Extend CliPlugin for CLI-backed apps, or implement PiDroidPlugin for full control:
import { CliPlugin } from "pi-droid";

export class WeatherPlugin extends CliPlugin {
  name = "weather";
  // ...
}
Loading plugins
Plugins are configured in config/default.json and loaded automatically on session start:
{
  "plugins": {
    "weather": {
      "enabled": true,
      "package": "@example/pi-droid-weather"
    }
  }
}
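To illustrate how a configuration of this shape might be consumed, the sketch below collects the package names of enabled plugins. The `PluginEntry` type and `enabledPackages` helper are assumptions based only on the JSON shown here; the real loader and its full schema are described in PLUGINS.md.

```typescript
// Shape inferred from the example config; the real schema may have more fields.
interface PluginEntry { enabled: boolean; package: string; }
interface PluginsConfig { plugins?: Record<string, PluginEntry>; }

// Return the npm package names of all plugins marked as enabled.
function enabledPackages(config: PluginsConfig): string[] {
  return Object.entries(config.plugins ?? {})
    .filter(([, entry]) => entry.enabled)
    .map(([, entry]) => entry.package);
}

const exampleConfig: PluginsConfig = {
  plugins: {
    weather: { enabled: true, package: "@example/pi-droid-weather" },
    // Hypothetical second entry, present only to show filtering.
    notes: { enabled: false, package: "@example/pi-droid-notes" },
  },
};

const toLoad = enabledPackages(exampleConfig); // ["@example/pi-droid-weather"]
```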
Distribute plugins as npm packages. See PLUGINS.md for the full plugin development guide, manifest schema, and marketplace integration.
Configuration
Pi-Droid reads configuration from config/default.json:
{
  "adb": {
    "serial": "your_device_serial"
  },
  "plugins": {},
  "routing": {}
}
| Key | Purpose |
|---|---|
| adb.serial | Device serial (overridden by ANDROID_SERIAL env var) |
| plugins | Plugin configurations keyed by name |
| routing | Input router settings for deterministic action dispatch |
Environment variables:
| Variable | Purpose |
|---|---|
| ANDROID_SERIAL | Target device serial (takes precedence over config) |
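The precedence rule above (ANDROID_SERIAL first, then adb.serial) can be sketched as a small resolver. `resolveSerial` is a hypothetical helper, not a Pi-Droid export, and treating empty or whitespace-only values as unset is an assumption of this example.

```typescript
interface PiDroidConfig { adb?: { serial?: string }; }

// Pick the device serial: environment variable first, then config file.
// Returns undefined when neither is set, leaving selection to ADB itself.
function resolveSerial(
  config: PiDroidConfig,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromEnv = env.ANDROID_SERIAL?.trim();
  if (fromEnv) return fromEnv;
  const fromConfig = config.adb?.serial?.trim();
  return fromConfig || undefined;
}

const fileConfig: PiDroidConfig = { adb: { serial: "config_serial" } };

const chosen = resolveSerial(fileConfig, { ANDROID_SERIAL: "env_serial" });
// "env_serial"
```

In a Node.js script you would typically pass `process.env` as the second argument.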
Programmatic Usage
Pi-Droid exports its full ADB layer for use in custom automation scripts, external plugins, or standalone tools:
import {
  Device,
  tap, swipe, typeText, keyEvent,
  takeScreenshot, annotatedScreenshot,
  getScreenState, waitForActivity,
  launchApp, stopApp,
  dumpUiTree, findElement, waitForElement,
  ensureReady, findAndTap, scrollToFind,
  getBatteryInfo, getDeviceInfo,
  adbShell,
} from "pi-droid";

// Connect to a device
const device = await Device.connect(process.env.ANDROID_SERIAL);

// Take an annotated screenshot with numbered elements
const annotated = await annotatedScreenshot();

// Find and tap an element by text
await findAndTap({ text: "Settings" });

// Wait for an activity transition
await waitForActivity("com.android.settings/.Settings");

// Run a raw ADB shell command
const result = await adbShell("dumpsys battery");
Available exports
- Device abstraction: Device
- Command execution: adb, adbShell, AdbError, listDevices, isDeviceReady
- Input: tap, swipe, typeText, keyEvent, pressBack, pressHome, pressEnter, scrollDown, scrollUp
- Screen state: getScreenState, waitForActivity, getActivityStack, isKeyboardVisible, getOrientation
- Screenshots and perception: takeScreenshot, screenshotBase64, annotatedScreenshot, dumpUiTree, findElements, findElement, waitForElement
- App management: launchApp, stopApp, getAppInfo, listPackages, wakeScreen, isScreenOn
- Monitoring: getBatteryInfo, getNetworkInfo, getDeviceInfo, isScreenLocked, getRunningApps
- Automation: ensureReady, findAndTap, scrollToFind, DefaultStuckDetector, createTaskBudget
- OCR: runOcrOnImage, runOcrOnCurrentScreen
- Plugin system: PluginManager, CliPlugin, TelegramPlugin, ApprovalQueue
- Types: AdbExecOptions, UIElement, Bounds, ElementSelector, StuckEvent, and more
Development
npm install # Install dependencies
npm run build # Compile TypeScript
npm run lint # Type-check without emitting
npm test # Run all tests (473+)
npm run test:unit # Unit tests only (mocked ADB)
npm run test:ci # CI-safe subset (unit + sandbox integration)
npm run test:integration # All integration tests
npm run test:device # Full suite including device E2E
npm run dev # Development mode (load as pi extension)
Project structure
src/
  index.ts        Extension entry point
  adb/            ADB primitives (app-agnostic, 28 modules)
  tools/          LLM tool registrations
  plugins/        Plugin system (loader, CLI base class, marketplace)
  notifications/  Notification channels and approval queues
skills/           Skill definitions (scoped tool sets)
tests/            473+ tests mirroring src/ structure
config/           Default configuration
Contributing
See CONTRIBUTING.md for development guidelines, coding standards, and the PR checklist.
Security
See SECURITY.md for vulnerability reporting and security considerations.
Changelog
See CHANGELOG.md for release history.
License
MIT -- ArtemisAI
