pi-computer-use
Pi extension for GUI computer-use on macOS
Package details
Install pi-computer-use from npm and Pi will load the resources declared by the package manifest.
```
$ pi install npm:pi-computer-use
```

- Package: pi-computer-use
- Version: 0.1.0
- Published: Apr 2, 2026
- Downloads: 168/mo · 11/wk
- Author: swairshah
- License: unknown
- Types: extension
- Size: 122.1 KB
- Dependencies: 1 dependency · 0 peers
Pi manifest JSON
```json
{
  "extensions": [
    "./src/index.ts"
  ]
}
```

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-computer-use
Pi extension for GUI computer-use on macOS. Gives your agent eyes and hands — it can see the screen, find UI elements, and interact with any app through native mouse/keyboard events.
Useful for launching, testing, and debugging GUI applications from pi.
How it works
- Screenshot — captures the screen or app window via macOS `screencapture`
- Grounding — sends the screenshot plus a target description (e.g. 'button labeled "Save"') to a vision model to get pixel coordinates
- Action — dispatches native input events via a compiled Swift helper
The Swift binary is compiled on first use and cached. No manual build step needed.
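The three steps above can be sketched as a single grounded action. This is a minimal illustration with hypothetical helper names (`screenshot`, `ground`, `nativeClick` are stand-ins; the real implementations live in `runtime.ts` and `grounding.ts`):

```typescript
type Point = { x: number; y: number };

// Stub: capture the screen (the extension shells out to macOS `screencapture`).
async function screenshot(): Promise<Uint8Array> {
  return new Uint8Array(); // PNG bytes in the real helper
}

// Stub: ask a vision model for the pixel coordinates of a described element.
async function ground(_image: Uint8Array, _target: string): Promise<Point> {
  return { x: 640, y: 400 }; // a vision model would return real coordinates
}

// Stub: dispatch a native click through the compiled Swift helper.
async function nativeClick(_p: Point): Promise<void> {}

// One grounded action: fresh screenshot, locate the target, act on it.
async function clickTarget(target: string): Promise<Point> {
  const img = await screenshot();
  const point = await ground(img, target);
  await nativeClick(point);
  return point;
}
```

Every grounded tool call repeats this loop, which is why each one sees the current state of the screen rather than a stale capture.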
Install
```
pi install git:github.com/swairshah/pi-computer-use
```
The extension uses a Swift native helper for mouse/keyboard events, compiled automatically on first use. You'll need:
- Xcode Command Line Tools — `xcode-select --install` if you don't have them
- Accessibility permission for your terminal (System Settings → Privacy & Security → Accessibility)
- Screen Recording permission for your terminal (System Settings → Privacy & Security → Screen Recording)
Tools
Observation
| Tool | What it does |
|---|---|
| `gui_read` | Screenshot + optionally locate a target element |
| `gui_screenshot` | Screenshot only |
| `gui_cursor_position` | Current mouse (x, y) |
| `gui_clipboard_read` | Read system clipboard |
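A grounded read might look like the following; note the `target` parameter name is inferred from the gui_batch example later in this README and is not confirmed for the standalone tool:

```
gui_read({ target: 'button labeled "Save"' })
```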
Mouse
| Tool | What it does |
|---|---|
| `gui_click` | Left/right/middle click. Supports modifier keys (Shift+click, Cmd+click, etc.) |
| `gui_double_click` | Double-click (select word, open file) |
| `gui_triple_click` | Triple-click (select line/paragraph) |
| `gui_right_click` | Right-click (context menu) |
| `gui_hover` | Hover (tooltips, hover menus) |
| `gui_drag` | Drag from A to B. Supports modifiers (Option+drag to duplicate) |
| `gui_scroll` | Scroll up/down/left/right |
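For example, a modified click might be invoked like this; the `modifiers` parameter name is an assumption (the README only says modifier keys are supported):

```
gui_click({ target: 'link labeled "Docs"', modifiers: ["cmd"] })
```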
Keyboard
| Tool | What it does |
|---|---|
| `gui_type` | Type text into a field (optionally click target first) |
| `gui_keypress` | Press a key (Enter, Tab, Escape, arrows, etc.) |
| `gui_hotkey` | Keyboard shortcut (Cmd+S, Shift+Cmd+P, etc.) |
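A typical type-then-save sequence might look like this; `target` and `value` follow the gui_batch example below, while the `keys` parameter name for gui_hotkey is an assumption:

```
gui_type({ target: "search field", value: "hello world" })
gui_hotkey({ keys: "Cmd+S" })
```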
Utility
| Tool | What it does |
|---|---|
| `gui_clipboard_write` | Write to system clipboard |
| `gui_wait` | Pause N milliseconds (animations, loading) |
| `gui_batch` | Chain multiple actions in one tool call |
gui_batch
Executes a sequence of actions without round-tripping through the LLM between each step. Each grounded action (click, type with target) takes a fresh screenshot, but you save inference calls.
```
gui_batch({ actions: [
  { action: "click", target: "search field" },
  { action: "type", value: "hello world" },
  { action: "keypress", key: "Enter" },
  { action: "wait", ms: 1000 },
  { action: "scroll", direction: "down", amount: 10 }
]})
```
Supported actions: `click`, `right_click`, `double_click`, `triple_click`, `hover`, `drag`, `scroll`, `type`, `keypress`, `hotkey`, `wait`, `clipboard_read`, `clipboard_write`. Stops on first error.
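The stop-on-first-error behavior can be sketched as a simple executor; this is an illustration only, with a hypothetical `dispatch` callback standing in for the extension's real action handlers:

```typescript
type BatchAction = { action: string; [key: string]: unknown };
type BatchResult = { executed: number; error?: string };

// Run actions in order; halt and report at the first failure.
async function runBatch(
  actions: BatchAction[],
  dispatch: (a: BatchAction) => Promise<void>,
): Promise<BatchResult> {
  let executed = 0;
  for (const a of actions) {
    try {
      await dispatch(a); // each grounded action re-screenshots internally
      executed++;
    } catch (e) {
      // Stop on first error: later actions never run.
      return { executed, error: `${a.action}: ${(e as Error).message}` };
    }
  }
  return { executed };
}
```

Because the batch is a single tool call, the model pays one inference round-trip for the whole sequence instead of one per action.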
Source
```
src/
├── index.ts         # Extension entry — registers tools with pi
├── runtime.ts       # Screenshot capture, grounding, native input dispatch
├── grounding.ts     # Vision model grounding (uses pi's model registry + pi-ai)
├── native-helper.ts # Embedded Swift source, compiled and cached at runtime
└── learn.ts         # /learn command — record GUI demos and save as skills
```
Credits
GUI runtime adapted from understudy.