pi-agent-browser-native
pi extension that exposes agent-browser as a native tool for browser automation
Package details
Install pi-agent-browser-native from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-agent-browser-native
- Package: pi-agent-browser-native
- Version: 0.2.32
- Published: May 21, 2026
- Downloads: 5,716/mo · 2,232/wk
- Author: fitchmultz
- License: MIT
- Types: extension
- Size: 1.1 MB
- Dependencies: 0 dependencies · 4 peers
Pi manifest JSON
{
"extensions": [
"./extensions/agent-browser/index.ts"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-agent-browser-native
A Pi extension that lets coding agents drive real browser sessions with a native agent_browser tool instead of brittle shell commands.
It is for Pi users who want agents to browse sites, inspect pages, click through flows, capture screenshots, use persistent profiles, and handle authenticated web apps without spending context on agent-browser CLI ceremony.
What this looks like in Pi
You prompt the agent in plain English:
Use the agent_browser tool to open https://react.dev and then take an interactive snapshot.
The agent gets a native tool, not a bash workaround:
{ "args": ["open", "https://react.dev"] }
{ "args": ["snapshot", "-i"] }
{ "semanticAction": { "action": "click", "locator": "text", "value": "Learn React" } }
The last form compiles to upstream find argv; see docs/TOOL_CONTRACT.md for the full field rules and for using raw args when you need anything outside that shorthand.
The result is optimized for agent work:
- compact page snapshots that lead with useful page content instead of chrome/sidebar noise
- interactive @eN refs for follow-up clicks and form fills
- screenshots and downloaded files surfaced as Pi artifacts
- structured details for titles, URLs, saved files, sessions, and errors
- spill files for oversized raw output instead of dumping pages into context
- compact, colorized Pi TUI rows that can be expanded without changing what the agent receives
- recovery hints when a tab, selector, stale @ref, or launch mode needs a different next step
Who this is for
- Pi users who want browser automation available as a normal tool beside read, write, and bash.
- Coding agents that need low-context browser workflows for docs, QA, research, dashboards, provider-backed browsers, and web apps.
- Maintainers who want a thin integration that tracks the current upstream agent-browser CLI without bundling or re-implementing it.
The problem
agent-browser is powerful, but plain CLI use is awkward inside an agent harness:
- shell strings are easy for agents to quote wrong
- large page snapshots can waste model context
- screenshots and downloads need artifact metadata, not just text paths
- implicit browser sessions need predictable reuse and cleanup
- profile/debug launches need a clear way to start fresh after public browsing
- secrets and auth material must not be echoed into model-visible output
- stale element refs need actionable recovery guidance, not generic failures
pi-agent-browser-native keeps upstream agent-browser as the browser engine and adds the Pi-native wrapper behavior needed for reliable agent use.
What it does
| Pain | Native wrapper capability | Proof surface |
|---|---|---|
| Agents build fragile shell commands | Exposes agent_browser with exact args, an optional semanticAction shorthand for common find flows and native select, constrained job / qa presets, experimental sourceLookup / networkSourceLookup that compile short workflows to batch, top-level electron for desktop lifecycle, plus controlled stdin and sessionMode | extensions/agent-browser/index.ts, docs/TOOL_CONTRACT.md |
| Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an Omitted high-value controls section (plus details.data.highValueControlRefIds) when dense pages hide inputs and tabs from the trimmed ref lists, and stores full raw output in spill files when needed | extensions/agent-browser/lib/results/snapshot.ts, test/agent-browser.presentation.test.ts |
| Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | docs/COMMAND_REFERENCE.md |
| Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; test/agent-browser.resume-state.test.ts |
| Auth/profile workflows can leak secrets | Supports auth save --password-stdin and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | test/agent-browser.extension-validation.test.ts |
| Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts details.data for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted batch matrix on top-level details.data | extensions/agent-browser/lib/results/presentation.ts, extensions/agent-browser/lib/runtime.ts, test/agent-browser.presentation.test.ts |
| Stale @eN refs fail mysteriously | Records per-session details.refSnapshot, rejects mismatched URLs / unknown refs / unsafe batch stdin ordering before spawn, adds recovery guidance to rerun snapshot -i or use stable find locators | extensions/agent-browser/index.ts, test/agent-browser.results.test.ts, test/agent-browser.extension-validation.test.ts |
| Agents need stable success/failure buckets | Exposes bounded resultCategory, successCategory, and failureCategory on tool details for branching without parsing prose | docs/TOOL_CONTRACT.md, extensions/agent-browser/lib/results/shared.ts, test/agent-browser.results.test.ts |
| Models re-snapshot after every click without new URL/title context | Adds optional details.pageChangeSummary (and per-batch-step summaries) with changeType, compact text, optional title/url, artifact hints, and nextActionIds aligned to nextActions; no-navigation clicks can also surface evidence-backed details.overlayBlockers candidates | docs/TOOL_CONTRACT.md, extensions/agent-browser/lib/results/presentation.ts, test/agent-browser.presentation.test.ts |
| Dashboard scroll commands can look successful while nothing moves | Samples viewport and prominent scroll-container positions around top-level scroll calls; unchanged positions produce details.scrollNoop, visible recovery guidance, and exact nextActions for snapshot/screenshot verification | docs/TOOL_CONTRACT.md, docs/COMMAND_REFERENCE.md, test/agent-browser.extension-validation.test.ts |
| Dropdown/combobox clicks can focus or hit native option box-model errors | Adds first-class select <selector> <value...> paths through raw args, semanticAction, and job; for custom combobox clicks, detects focused controls with explicit aria-expanded state but no visible options and returns details.comboboxFocus plus exact recovery nextActions | docs/TOOL_CONTRACT.md, docs/COMMAND_REFERENCE.md, test/agent-browser.extension-validation.test.ts |
| Recording workflows fail late when ffmpeg is missing | After successful record start / record restart, warns when ffmpeg is not on PATH so agents can install or fix PATH before record stop | docs/TOOL_CONTRACT.md, docs/COMMAND_REFERENCE.md, test/agent-browser.extension-validation.test.ts |
| Direct binary help may be blocked in agent sessions | Publishes a repo-readable command reference and verifies it against the target upstream version | npm run verify |
| Desktop Electron apps need discovery, CDP attach, and safe teardown | Top-level electron runs host list / isolated launch (temp profile, OS-chosen debug port) / status / probe / cleanup, merges launchId plus managed sessionName, supports handoff snapshot / tabs / connect, and surfaces mismatch and post-command health guidance; wrapper cleanup applies only to launches it created | extensions/agent-browser/lib/electron/discovery.ts, launch.ts, cleanup.ts, docs/TOOL_CONTRACT.md, docs/COMMAND_REFERENCE.md |
| Agents need bundled skills text without touching the live session | Treats skills list, skills get …, and skills path … as stateless JSON reads: no implicit managed --session under default sessionMode: "auto" (same session-ownership goal as plain-text --help / --version), while provider workflows stay thin passthroughs that require upstream setup and credentials | docs/COMMAND_REFERENCE.md, extensions/agent-browser/lib/runtime.ts |
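The stable failure buckets in the table are meant for branching without parsing prose. The following is a minimal sketch of that idea, not the wrapper's own code. The ToolDetails type and the followUpFor helper are hypothetical, and only the two failureCategory values that this README names explicitly are handled.

```typescript
// Hypothetical, trimmed-down view of the tool details; the real contract
// lives in docs/TOOL_CONTRACT.md.
interface ToolDetails {
  resultCategory?: string;
  failureCategory?: string;
}

interface ToolCall {
  args: string[];
}

// Pick a follow-up call from the failure bucket alone.
// Returns null when this sketch has no automatic suggestion.
function followUpFor(details: ToolDetails): ToolCall | null {
  switch (details.failureCategory) {
    case "stale-ref":
    case "selector-not-found":
      // Both buckets can be approached by refreshing interactive refs.
      return { args: ["snapshot", "-i"] };
    default:
      return null;
  }
}
```

In practice the wrapper's own details.nextActions should take priority when present; this helper only shows the branching shape.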
Fastest way to try it
Install upstream agent-browser first and make sure it is on PATH (see the upstream install docs).
Optional external tools unlock the full command surface:
| Dependency | Required for | macOS install example |
|---|---|---|
| agent-browser | All browser automation through this extension | See upstream install docs |
| ffmpeg | record stop WebM encoding after record start / record restart | brew install ffmpeg or brew install ffmpeg-full |
Keep both binaries on PATH. record start can begin without a file on disk, but record stop needs ffmpeg to encode the WebM.
The native tool also gives agents absolute installed-package doc paths in its compact runtime guidance. Agents should read README.md for setup/dependencies, docs/COMMAND_REFERENCE.md for targeted command workflows, and docs/TOOL_CONTRACT.md for result/detail contracts only when deeper guidance is needed.
Then install this Pi package:
pi install npm:pi-agent-browser-native
Start Pi and ask for a browser action:
Use the agent_browser tool to open https://example.com and then take an interactive snapshot.
For a one-off trial that does not touch your configured Pi extensions:
pi --no-extensions -e npm:pi-agent-browser-native
For a specific published version:
pi --no-extensions -e npm:pi-agent-browser-native@<version>
To install directly from source instead of npm:
pi install https://github.com/fitchmultz/pi-agent-browser-native
For a temporary source trial, keep it isolated from your normal package sources:
pi --no-extensions -e https://github.com/fitchmultz/pi-agent-browser-native
First-run health check
Run the read-only doctor when installing, upgrading, or debugging missing/duplicated tools:
pi-agent-browser-doctor
# one-off without permanent install:
npm exec --package pi-agent-browser-native -- pi-agent-browser-doctor
# from this checkout:
npm run doctor
The doctor checks:
- upstream agent-browser exists on PATH
- the installed upstream version matches this wrapper's command-reference baseline
- Pi settings do not point at multiple active pi-agent-browser-native sources
It does not edit Pi settings and does not run upstream agent-browser doctor --fix.
Common agent calls
You usually prompt the agent in natural language. These JSON snippets show the exact native tool shape the agent should use.
Open a page and inspect it:
{ "args": ["open", "https://example.com"] }
{ "args": ["snapshot", "-i"] }
Click a visible ref, then refresh refs after navigation or a DOM update:
{ "args": ["click", "@e2"] }
{ "args": ["snapshot", "-i"] }
Run a multi-step flow in one tool call:
{ "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"]]" }
If the same batch stdin later uses @e… on interaction commands after a step that can navigate or mutate the page (open, click, reload, and similar), insert a snapshot step whose first argv token is snapshot (for example ["snapshot","-i"]) between those phases. Multiple same-snapshot fill @e… steps may be batched before a click/submit step; dynamic or autosubmit forms should still use stable locators or split with a fresh snapshot. The wrapper rejects unsafe ordering with failureCategory: "stale-ref" before upstream runs; full rules are under refSnapshot in docs/TOOL_CONTRACT.md.
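The ordering rule above can be checked before a batch is sent. This is a rough sketch under stated assumptions, not the wrapper's preflight: the MUTATING list contains only the three commands the README names (the real list is "and similar"), and firstUnsafeRefStep and toBatchCall are hypothetical helper names. Building stdin with JSON.stringify also avoids hand-escaping quotes.

```typescript
type BatchStep = string[];

// Commands assumed to navigate or mutate the page. The README names open,
// click, and reload "and similar"; the wrapper's full list is not shown.
const MUTATING = ["open", "click", "reload"];

// Returns the index of the first step that uses an @eN ref after a mutating
// step with no snapshot step in between, or -1 when the order looks safe.
function firstUnsafeRefStep(steps: BatchStep[]): number {
  let dirty = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    const command = step[0];
    if (command === "snapshot") {
      dirty = false; // a fresh snapshot makes refs current again
      continue;
    }
    const usesRef = step.slice(1).some((token) => /^@e\d+$/.test(token));
    if (dirty && usesRef) return i;
    if (MUTATING.indexOf(command) !== -1) dirty = true;
  }
  return -1;
}

// Serialize steps into the batch tool-call shape shown above.
function toBatchCall(steps: BatchStep[]): { args: string[]; stdin: string } {
  return { args: ["batch"], stdin: JSON.stringify(steps) };
}
```

Because fill is not in the assumed mutating list, several fill steps from one snapshot pass the check, which mirrors the allowance described above.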
Evaluate page JavaScript through stdin. Return the value you want as an expression; eval --stdin may warn with details.evalStdinHint when a function-shaped snippet serializes to {} instead of being invoked:
{ "args": ["eval", "--stdin"], "stdin": "document.title" }
{ "args": ["eval", "--stdin"], "stdin": "({ title: document.title, url: location.href })" }
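One way to avoid the function-shaped pitfall is to wrap such snippets in an immediately invoked call before sending them. The sketch below is illustrative only: looksLikeFunction is a hypothetical heuristic of my own, not the detection the wrapper uses for details.evalStdinHint, and it covers only plain function expressions and arrow functions.

```typescript
// Heuristic (an assumption): a snippet is function-shaped when it starts
// with the `function` keyword or looks like an arrow function.
function looksLikeFunction(snippet: string): boolean {
  const s = snippet.trim();
  return /^function\b/.test(s) || /^(\([^)]*\)|[A-Za-z_$][\w$]*)\s*=>/.test(s);
}

// Wrap function-shaped snippets so the page evaluates the function's return
// value instead of the function object itself.
function toEvalCall(snippet: string): { args: string[]; stdin: string } {
  const stdin = looksLikeFunction(snippet) ? `(${snippet.trim()})()` : snippet;
  return { args: ["eval", "--stdin"], stdin };
}
```

Plain expressions such as document.title pass through unchanged, so the two examples above produce the same calls with or without this helper.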
Extract several known refs or selectors in one batch call instead of many serial getter calls:
{ "args": ["batch"], "stdin": "[[\"get\",\"text\",\"@e64\"],[\"get\",\"text\",\"@e65\"]]" }
Save an auth profile without putting the password in args:
{ "args": ["auth", "save", "demo", "--password-stdin"], "stdin": "<password>" }
Download a file from a known link or control:
{ "args": ["download", "@e5", "/tmp/report.pdf"] }
Locator shorthand (semanticAction)
For supported upstream find flows and native dropdown selection you can omit hand-built args and pass a top-level semanticAction object instead. The wrapper compiles locator actions to the same find argv upstream already understands, or compiles action: "select" to upstream select <selector> <value...>; compiled argv is echoed as details.compiledSemanticAction when the unified result includes that field. Full field rules live in docs/TOOL_CONTRACT.md#semanticaction.
{ "semanticAction": { "action": "click", "locator": "text", "value": "Submit" } }
{ "semanticAction": { "action": "click", "locator": "role", "role": "button", "name": "Continue without Signing In" } }
{ "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
{ "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
{ "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
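To make the compile step concrete, here is a hedged sketch of how these payloads could map to argv. The real rules are in docs/TOOL_CONTRACT.md; the token order used here (find, locator, value, action, optional text, optional --name) is my assumption about the upstream find syntax. The SemanticAction type and compileSemanticAction function are hypothetical names covering only the shapes shown above.

```typescript
// Hypothetical subset of the semanticAction shapes shown in this README.
type SemanticAction =
  | { action: "click" | "fill"; locator: "text" | "label"; value: string; text?: string; session?: string }
  | { action: "click" | "fill"; locator: "role"; role: string; name?: string; text?: string; session?: string }
  | { action: "select"; selector: string; value: string; session?: string };

// Assumed argv layout; check details.compiledSemanticAction for the real one.
function compileSemanticAction(a: SemanticAction): string[] {
  // The README states that --session <name> is prepended before the argv.
  const prefix: string[] = a.session ? ["--session", a.session] : [];
  if (a.action === "select") {
    return prefix.concat(["select", a.selector, a.value]);
  }
  const argv: string[] = a.locator === "role"
    ? ["find", "role", a.role, a.action]
    : ["find", a.locator, a.value, a.action];
  if (a.action === "fill" && a.text !== undefined) argv.push(a.text);
  if (a.locator === "role" && a.name !== undefined) argv.push("--name", a.name);
  return prefix.concat(argv);
}
```

Comparing this output with details.compiledSemanticAction on a real call is the quickest way to confirm or correct the assumed layout.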
Typical pitfalls:
- Supply exactly one of args, semanticAction, job, qa, sourceLookup, networkSourceLookup, or electron per call (not more, not none).
- semanticAction and job are not valid inside batch stdin; batch steps stay upstream argv string arrays (spell a find step as tokens there if you need it in a batch).
- Commands or locators outside the supported shorthand still require explicit args. Common page getters are grouped under get: use get title, get url, or get text <selector> rather than shortcut commands such as title or url; unknown getter shortcuts can return read-only details.nextActions like use-get-title.
- For locator: "role", pass either value: "button" or role: "button"; if both are present they must match.
- Use semanticAction.session to target a named upstream browser session; the wrapper prepends --session <name> before the compiled find or select argv and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current snapshot -i refs before execution so hidden duplicate matches do not steal the action; details.effectiveArgs shows the exact executed argv.
- Do not reuse @e… refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware refresh-interactive-refs next action.
- If upstream classifies the failure as stale-ref and details.compiledSemanticAction is present for a compiled find action, details.nextActions may list retry-semantic-action-after-stale-ref after refresh-interactive-refs, carrying the same compiled find argv so you can retry the locator-stable target once it is safe to do so. select calls that used stale @refs only get refresh guidance; use a fresh snapshot or stable selector before retrying (contract in docs/TOOL_CONTRACT.md#semanticaction).
- If the failure is selector-not-found, the wrapper may take one fresh snapshot and add Current snapshot ref fallback plus try-current-visible-ref* next actions when that snapshot has exact visible role/name matches for the failed find/semanticAction target. It still adds Agent-browser candidate fallbacks for bounded semanticAction role/name retries (fill+placeholder, click+text, or fill+label); prefer these payloads or a fresh snapshot over guessing new selectors (same contract link).
- A successful upstream click is not proof that the web app handled the event or changed state. When the task depends on a mutation, follow inspect-after-mutation / pageChangeSummary evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. Preserve explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action.
- If a top-level click succeeds (unified command click, not a batch step), upstream reports data.clicked, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra snapshot -i and add Possible overlay blockers with details.overlayBlockers (candidates, summary, optional snapshot refresh for refs) plus session-aware inspect-overlay-state / bounded try-overlay-blocker-candidate-* next actions when that snapshot shows strong modal context (dialog/alertdialog) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses details.navigationSummary, which is populated with one read-only eval summary when the click JSON omits both string data.url and data.title; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
- If get text <selector> reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful batch steps, the wrapper may add Selector text visibility warning, details.selectorTextVisibility (plus selectorTextVisibilityAll for multiple batched warnings), and inspect-visible-text-candidates next actions; prefer a visible @ref, a scoped selector, or a targeted eval --stdin over hidden tab content.
- In attached Electron sessions, broad selectors such as body, html, main, or [role=application] may read the whole app shell. The wrapper may add Broad Electron get text selector warning, details.electronGetTextScopeWarning, and snapshot-for-electron-text-scope; prefer snapshot -i, a current @ref, or a narrower panel selector.
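The first pitfall, exactly one mode per call, is easy to check on the caller side. The sketch below uses the seven mode names listed above; validateMode is a hypothetical helper, not part of the extension.

```typescript
// Top-level modes named in this README; exactly one may be present per call.
const MODES = ["args", "semanticAction", "job", "qa", "sourceLookup", "networkSourceLookup", "electron"];

// Returns the single mode a call uses, or throws when zero or several are set.
function validateMode(call: Record<string, unknown>): string {
  const present = MODES.filter((mode) => call[mode] !== undefined);
  if (present.length !== 1) {
    throw new Error(`expected exactly one of ${MODES.join(", ")}; got ${present.length}`);
  }
  return present[0];
}
```

For example, validateMode({ args: ["snapshot", "-i"] }) returns "args", while a call carrying both args and job throws.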
Constrained browser jobs
For short repeatable workflows, pass a top-level job instead of hand-writing batch stdin. The wrapper only supports constrained steps (open, click, fill, select, wait, assertText, assertUrl, waitForDownload, and screenshot), compiles them to existing upstream batch commands, and echoes the compiled commands as details.compiledJob for auditability. The same compile path backs top-level qa, so long qa runs surface the same timeout evidence shape. If a long job, qa, or batch hits the wrapper watchdog, details.timeoutPartialProgress may recover planned steps, current page title/URL, and declared artifact paths that already exist on disk (see docs/TOOL_CONTRACT.md#details). There is no separate catalog of reusable named browser recipes above job, qa, and raw batch; see docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet for the closed RQ-0068 decision and when to revisit it.
{
"job": {
"steps": [
{ "action": "open", "url": "https://example.com" },
{ "action": "assertText", "text": "Example Domain" },
{ "action": "screenshot", "path": ".dogfood/example.png" }
]
}
}
On app pages that expose a native dropdown, add a select step such as { "action": "select", "selector": "#flavor", "value": "chocolate" } before the assertion that depends on it.
Use raw args/stdin when you need full upstream batch power, custom flags, or commands outside the constrained job schema. Do not pass stdin with job, qa, sourceLookup, networkSourceLookup, or electron; those modes generate or manage their own input.
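For callers that assemble job payloads in code, typed steps keep the payload inside the constrained schema. This is a minimal sketch covering only the four step shapes shown in this section. The JobStep union and buildSmokeJob helper are hypothetical, and the fields of the other supported actions are not shown in this README, so they are left out.

```typescript
// Only the step shapes documented in this section; other actions exist
// (click, fill, wait, assertUrl, waitForDownload) but their fields are
// not shown here.
type JobStep =
  | { action: "open"; url: string }
  | { action: "select"; selector: string; value: string }
  | { action: "assertText"; text: string }
  | { action: "screenshot"; path: string };

interface SelectChoice {
  selector: string;
  value: string;
}

// Build an open -> (optional select) -> assertText -> screenshot job.
// The select step is placed before the assertion that depends on it.
function buildSmokeJob(
  url: string,
  expectedText: string,
  screenshotPath: string,
  choice?: SelectChoice
): { job: { steps: JobStep[] } } {
  const steps: JobStep[] = [{ action: "open", url }];
  if (choice) {
    steps.push({ action: "select", selector: choice.selector, value: choice.value });
  }
  steps.push({ action: "assertText", text: expectedText });
  steps.push({ action: "screenshot", path: screenshotPath });
  return { job: { steps } };
}
```

Calling buildSmokeJob("https://example.com", "Example Domain", ".dogfood/example.png") yields the same payload as the JSON example above.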
Electron desktop apps
The dedicated guide for this section is docs/ELECTRON.md; it covers intended users, the full lifecycle, wrapper-owned vs manually launched apps, action reference, safety/ownership, qa.attached, sourceLookup context, troubleshooting, and cleanup. Read it first if Electron support is what brought you here.
For desktop Electron apps, use top-level electron to avoid hand-building the discover → launch with CDP → connect → inspect → cleanup sequence. The wrapper owns only apps it launched, uses an isolated temp profile and OS-chosen debug port, and reports exact cleanup/status next actions. It does not reuse the app's normal signed-in profile or attach to an already-running authenticated app, so launching Slack/Obsidian/VS Code this way may show first-run or sign-in UI instead of the user's live local state. When the explicit goal is signed-in local app state and host tools are available, launch the normal app with a debug port first (for example open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'), then attach with { "args": ["connect", "9222"], "sessionMode": "fresh" }; if the app is already running without a debug port, ask before relaunching it. electron.list may annotate likely private apps (for example notes, chat, mail, developer workspaces, or password/auth tools) as [likely sensitive: …]; those are hints only, so use caller-owned allow / deny policy before launching sensitive apps.
{ "electron": { "action": "list", "query": "code" } }
{ "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
{ "electron": { "action": "probe", "timeoutMs": 5000 } }
{ "electron": { "action": "cleanup", "launchId": "electron-…" } }
electron.probe.timeoutMs bounds each underlying read subprocess when dense desktop apps need a shorter or longer probe budget (omit for the normal tool subprocess default). electron.cleanup.timeoutMs caps upstream close plus host profile/process teardown and defaults to the implicit session close budget unless overridden. electron.status.timeoutMs only tightens managed-session title/url reads used for mismatch checks. Pass electron.probe.launchId when you want the probe tied to a wrapper-tracked launch instead of only the current managed session. Launch/status/probe results show both launchId (for status/cleanup/probe) and sessionName (for browser snapshot/tab commands); if the managed session drifts to about:blank while wrapper status still sees a live renderer, Electron-specific mismatch warnings and status/probe/reattach/snapshot next actions replace generic tab guidance. If the app process/debug port dies after a successful-looking mutation, the wrapper reports details.electronPostCommandHealth and fails with tab-drift instead of quietly continuing on about:blank. Launch timeouts expose details.electron.failure.diagnostics for PID, profile, DevToolsActivePort, and timing evidence.
launch.handoff still defaults to "snapshot"; it retries briefly when the first Electron snapshot has no refs. Use handoff: "tabs" as a safer diagnostic starting point when you only need target discovery and do not want interactive refs captured yet, or handoff: "connect" when you want attach-only and will run your own snapshot -i / tab commands next. For Electron quick inputs that rerender in place, a successful fill may include details.fillVerification if get value still disagrees; re-snapshot and use focus plus keyboard typing before submitting.
For an app you launched yourself with remote debugging enabled, use raw upstream attach instead and clean it up yourself:
{ "args": ["connect", "9222"], "sessionMode": "fresh" }
After either path, use qa: { "attached": true, ... } for a current-session smoke check without opening a URL.
Lightweight QA preset
For a quick smoke/QA pass, use top-level qa. It compiles to the same batch path as job. The URL form clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The attached form (qa: { "attached": true }) runs those checks against the current managed session, such as an attached Electron app, and rejects url. loadState defaults to "domcontentloaded"; set it to "load" or "networkidle" only when the stricter state is useful and the site is not expected to keep background requests alive. checkNetwork, checkConsole, and checkErrors default to true; set one to false to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain favicon or apple-touch-icon paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (details.qaPreset.warnings, with human-readable details.qaPreset.summary when the preset still passes). Exact predicates live in docs/TOOL_CONTRACT.md and classifyNetworkRequestFailure in extensions/agent-browser/lib/results/shared.ts.
{
"qa": {
"url": "https://example.com",
"expectedText": "Example Domain",
"screenshotPath": ".dogfood/qa-example.png"
}
}
Use custom job or raw batch when you need a different check sequence.
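The documented qa defaults can be made explicit on the caller side, which helps when logging what a run will check. The wrapper applies its own defaults, so this sketch is purely illustrative. QaInput and withQaDefaults are hypothetical names, and only the fields mentioned in this section are modeled.

```typescript
type LoadState = "domcontentloaded" | "load" | "networkidle";

// Hypothetical caller-side view of the qa fields named in this section.
interface QaInput {
  url?: string;
  attached?: boolean;
  expectedText?: string;
  screenshotPath?: string;
  loadState?: LoadState;
  checkNetwork?: boolean;
  checkConsole?: boolean;
  checkErrors?: boolean;
}

// Fill in the documented defaults and reject the attached + url combination.
function withQaDefaults(input: QaInput): QaInput {
  if (input.attached && input.url !== undefined) {
    throw new Error("qa.attached runs against the current session and rejects url");
  }
  return {
    ...input,
    loadState: input.loadState ?? "domcontentloaded",
    checkNetwork: input.checkNetwork ?? true,
    checkConsole: input.checkConsole ?? true,
    checkErrors: input.checkErrors ?? true,
  };
}
```

Explicit false values are preserved, so { url, checkConsole: false } still skips the console read.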
Experimental source lookup
For local app debugging, sourceLookup can gather candidate component/file locations for a visible UI element. It is explicit and evidence-based: pass a selector, reactFiberId, and/or componentName; the wrapper compiles those inputs to existing batch steps (is visible, get html when includeDomHints is not false, react inspect, react tree) and a bounded local workspace scan under the Pi session cwd (maxWorkspaceFiles defaults to 2000 and cannot exceed 5000; the scan records at most ten workspace-search candidates). Results appear in details.sourceLookup with status, candidates, limitations, and summary. Unlike qa, the wrapper does not mark the tool failed on an otherwise successful batch solely because status is no-candidates or because React metadata was missing; failed upstream steps (for example react inspect without DevTools) still fail the batch normally.
{ "sourceLookup": { "selector": "#save", "reactFiberId": "2", "componentName": "SaveButton" } }
This is an experiment, not a guarantee. React hints require a session opened with --enable react-devtools, and many builds do not expose useful sourcemap/source metadata; status: "no-candidates" is common when nothing matched, and status: "unsupported" only when no candidates were found and a compiled react batch step failed (if DOM or workspace search still produced candidates, you get candidates-found instead). For wrapper-tracked packaged Electron apps, a no-candidate result includes details.sourceLookup.workspaceRoot, optional details.sourceLookup.electronContext, limitations explaining that the scan is limited to the Pi cwd and does not unpack app bundles/app.asar, plus Electron snapshot/probe/tab next actions when a launch is known.
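The status rules described above reduce to a small decision function. The sketch below restates them and is not taken from the wrapper source. Both function names are hypothetical, and treating an over-limit maxWorkspaceFiles as an error (rather than clamping it) is my assumption, since the README only says the value cannot exceed 5000.

```typescript
type SourceLookupStatus = "candidates-found" | "no-candidates" | "unsupported";

// Candidates win; "unsupported" applies only when nothing was found and a
// compiled react batch step failed.
function sourceLookupStatus(candidateCount: number, reactStepFailed: boolean): SourceLookupStatus {
  if (candidateCount > 0) return "candidates-found";
  return reactStepFailed ? "unsupported" : "no-candidates";
}

// Default 2000, upper bound 5000. Rejecting instead of clamping is an
// assumption made for this sketch.
function effectiveMaxWorkspaceFiles(requested?: number): number {
  if (requested === undefined) return 2000;
  if (requested > 5000) throw new RangeError("maxWorkspaceFiles cannot exceed 5000");
  return requested;
}
```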
networkSourceLookup is the matching failed-request experiment. It runs network request <id> when requestId is present and/or network requests --filter … when filter or url is present (url supplies the filter pattern when filter is omitted); add session when the generated batch should target an explicit upstream session. It merges failed-request rows from the batch JSON with initiator-style hints and a bounded workspace literal scan (maxWorkspaceFiles defaults to 2000, cap 5000), surfaces everything under details.networkSourceLookup, and avoids automatic blame or edits. Compact network requests results with safe request IDs also add details.nextActions for request details, bounded networkSourceLookup on actionable failures, path filtering, or HAR capture so agents can branch without guessing request-id syntax. Network diagnostics are read-only for wrapper page state: request URLs in network request or generated networkSourceLookup batches do not replace the session’s active page target or invalidate page-scoped refs from the app page.
{ "networkSourceLookup": { "requestId": "req-1", "url": "/api/fail" } }
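The mapping from networkSourceLookup fields to upstream commands can be sketched as follows. It follows the rules stated above (requestId selects network request, filter or url selects network requests --filter, and url is the fallback pattern). The step order and the plannedNetworkSteps name are assumptions for illustration, and session handling is omitted.

```typescript
interface NetworkSourceLookupInput {
  requestId?: string;
  filter?: string;
  url?: string;
}

// Plan the read-only network steps implied by the input fields.
function plannedNetworkSteps(input: NetworkSourceLookupInput): string[][] {
  const steps: string[][] = [];
  if (input.requestId !== undefined) {
    steps.push(["network", "request", input.requestId]);
  }
  // url supplies the filter pattern only when filter is omitted.
  const pattern = input.filter ?? input.url;
  if (pattern !== undefined) {
    steps.push(["network", "requests", "--filter", pattern]);
  }
  return steps;
}
```

For the example payload above, this plans one network request step for req-1 and one filtered network requests step for /api/fail.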
For asynchronous exports, click first and then wait for the download:
{ "args": ["click", "@export"] }
{ "args": ["wait", "--download", "/tmp/report.csv"] }
When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. With upstream agent-browser 0.27.0, treat details.savedFilePath as upstream-reported metadata and confirm details.artifacts[].exists before relying on the requested wait --download <path> file being present on disk.
Artifact cleanup is host-owned, not a browser command. close shuts down the browser session but does not delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to paths you chose. When the session’s non-empty details.artifactManifest is in scope, a successful close appends an Artifact lifecycle note and sets details.artifactCleanup with the same retention summary as details.artifactRetentionSummary, a fixed note about host-owned cleanup, and explicitArtifactPaths: up to ten distinct paths from manifest rows whose storageScope is explicit-path (this list can be empty if the recent window only holds spills or other non-explicit inventory). Remove any listed paths with normal file tools after inspection.
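The explicitArtifactPaths selection described above amounts to a filter, a de-duplication, and a cap. The sketch below shows that logic under stated assumptions: the path field name on manifest rows and the choice to keep the first ten in manifest order are mine, since the README does not specify either.

```typescript
// Hypothetical manifest row; only storageScope is named in this README.
interface ManifestRow {
  path: string;
  storageScope: string;
}

// Collect up to `limit` distinct paths from explicit-path rows.
function explicitArtifactPaths(rows: ManifestRow[], limit = 10): string[] {
  const paths: string[] = [];
  for (const row of rows) {
    if (row.storageScope !== "explicit-path") continue;
    if (paths.indexOf(row.path) === -1) paths.push(row.path);
    if (paths.length === limit) break;
  }
  return paths;
}
```

An empty result is expected when the manifest window holds only spills or other non-explicit rows.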
Start a fresh profiled browser after the implicit public-browsing session already exists:
{ "args": ["--profile", "Default", "open", "https://example.com/account"], "sessionMode": "fresh" }
After a successful unnamed fresh launch, later default sessionMode: "auto" calls follow that browser automatically. If the fresh launch fails or times out, details.managedSessionOutcome records whether the previous managed session was preserved or the attempted fresh session was abandoned before any managed session became current; a Managed session outcome: … line is appended only when the failing call used sessionMode: "fresh".
Authenticated/profile workflows
The wrapper does not clone profiles or hide what upstream Chrome profile you chose. Passing --profile is an explicit upstream agent-browser choice. Visible page content from real profiles is model-visible and may persist in transcripts or saved artifacts; redaction protects credential-like cookie/storage/auth values, not ordinary page text you asked the browser to read.
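To illustrate the distinction between credential-like values and ordinary page text, here is a minimal key-name redaction sketch. It is not the wrapper's presentation layer, which also applies field-aware and string heuristics. The SENSITIVE_KEY pattern and the redact function are assumptions chosen for this example.

```typescript
// Key names treated as credential-like in this sketch (an assumption).
const SENSITIVE_KEY = /pass(word)?|secret|token|authorization|api[-_]?key/i;

// Recursively replace values under sensitive keys; leave other text as-is.
function redact(value: unknown, key = ""): unknown {
  if (key !== "" && SENSITIVE_KEY.test(key)) return "[REDACTED]";
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(source)) out[k] = redact(source[k], k);
    return out;
  }
  return value;
}
```

Note that a field such as title passes through untouched, which matches the caveat above: page text the browser was asked to read stays model-visible.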
Use these rules:
- Use public/temp profiles for tests and examples.
- Use sessionMode: "fresh" when switching from public browsing to --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, -p/--provider, or iOS --device.
- Use --session when you want to manage a live upstream session name yourself.
- Do not treat --session as persisted auth or tab restore after close; use --profile, --session-name, or --state for persistence.
- Prefer page actions and storage checks over cookie dumps. cookies get can expose real profile cookies.
- Prefer auth save --password-stdin over putting passwords in args; the wrapper only accepts caller stdin for batch, eval --stdin, and auth save --password-stdin (top-level job and qa compile to batch and supply their own stdin).
- Use state save <path> / state load <path> for portable test state. state save is reported as a file artifact with verification metadata; state load may mention a path but is not treated as a newly saved artifact.
- Treat cookies get, storage local|session, and auth show output as sensitive. The native presentation summarizes and redacts credential-like values, but avoid requesting these dumps unless the task needs them.
- Use dialog status, dialog accept [text], dialog dismiss, and frame <selector|main> through native args; use exact confirm <id> / deny <id> next actions for guarded-action confirmations.
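Two of these rules are mechanical enough to check in code: whether args carry a launch-scoped flag that calls for sessionMode: "fresh", and whether a command accepts caller stdin. The sketch below uses only the flags and commands listed above. Both helper names are hypothetical, and acceptsCallerStdin assumes the command is the first token with no global flags in front of it.

```typescript
// Launch-scoped flags listed in the rules above.
const LAUNCH_FLAGS = [
  "--profile", "--session-name", "--cdp", "--state", "--auto-connect",
  "--init-script", "--enable", "-p", "--provider", "--device",
];

// True when args contain a flag that suggests sessionMode: "fresh".
function hasLaunchFlag(args: string[]): boolean {
  return args.some((token) => LAUNCH_FLAGS.indexOf(token) !== -1);
}

// True for the three stdin-accepting forms named above. Assumes the command
// is args[0]; a leading --session <name> prefix is not handled here.
function acceptsCallerStdin(args: string[]): boolean {
  if (args[0] === "batch") return true;
  if (args[0] === "eval") return args.indexOf("--stdin") !== -1;
  if (args[0] === "auth" && args[1] === "save") return args.indexOf("--password-stdin") !== -1;
  return false;
}
```

The state save and state load commands do not trip hasLaunchFlag, because the token state differs from the --state flag.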
Safe stateful examples:
{ "args": ["auth", "save", "demo", "--password-stdin"], "stdin": "password from the user-approved secret source" }
{ "args": ["auth", "login", "demo"] }
{ "args": ["state", "save", "/tmp/demo-state.json"] }
{ "args": ["state", "load", "/tmp/demo-state.json"], "sessionMode": "fresh" }
{ "args": ["cookies", "set", "theme", "dark", "--url", "https://example.com"] }
{ "args": ["storage", "local", "get", "theme"] }
{ "args": ["dialog", "accept", "prompt text"] }
{ "args": ["frame", "main"] }
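The stdin rule from the list above (caller stdin is accepted only for batch, eval --stdin, and auth save --password-stdin) can be expressed as a predicate. This is a minimal sketch under stated assumptions, not the wrapper's code: the function name is hypothetical, and it assumes the command token is the first element of args with no leading global flags.

```typescript
// Hypothetical predicate mirroring the documented stdin rule.
// Assumption: args[0] is the command token (no leading flags such as --session).
function callerStdinAccepted(args: readonly string[]): boolean {
  const [command, subcommand] = args;
  if (command === "batch") return true;
  if (command === "eval") return args.includes("--stdin");
  if (command === "auth" && subcommand === "save") return args.includes("--password-stdin");
  return false;
}
```

For example, the auth save call in the examples above satisfies the predicate, while state save with a stdin payload would not.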
Example explicit session plus profile launch:
{
"args": ["--session", "auth-flow", "--profile", "Default", "open", "https://example.com/account"]
}
React, SPA, and first-navigation setup
React and SPA tooling from upstream agent-browser is passed through directly.
Launch React introspection before first navigation:
{ "args": ["open", "--enable", "react-devtools", "https://example.com"], "sessionMode": "fresh" }
{ "args": ["react", "tree"] }
{ "args": ["react", "inspect", "<fiberId>"] }
{ "args": ["react", "renders", "start"] }
{ "args": ["react", "renders", "stop"] }
{ "args": ["react", "suspense", "--only-dynamic"] }
Use SPA and Web Vitals helpers as normal command tokens:
{ "args": ["pushstate", "/dashboard"] }
{ "args": ["vitals", "https://example.com", "--json"] }
For setup that must happen before first navigation, open a blank fresh page, stage routes/cookies/scripts, then navigate:
{ "args": ["open"], "sessionMode": "fresh" }
{ "args": ["network", "route", "**/*.js", "--abort", "--resource-type", "script"] }
{ "args": ["cookies", "set", "--curl", "/path/to/cookies.txt", "--domain", "example.com"] }
{ "args": ["navigate", "https://example.com"] }
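The open-blank, stage, then navigate ordering can be captured in a small builder so the setup steps cannot accidentally run after the first navigation. The helper below is a hypothetical illustration; stagedFirstNavigation and the AgentBrowserCall type are not part of the package, and the builder only assembles the same call shapes shown above.

```typescript
// Hypothetical builder for the pre-navigation staging sequence.
type SessionMode = "auto" | "fresh";

interface AgentBrowserCall {
  args: string[];
  sessionMode?: SessionMode;
}

// Returns: blank fresh open, then each setup step in order, then the navigation.
function stagedFirstNavigation(url: string, setup: string[][]): AgentBrowserCall[] {
  return [
    { args: ["open"], sessionMode: "fresh" },
    ...setup.map((args) => ({ args: [...args] })),
    { args: ["navigate", url] },
  ];
}

// Usage: the same sequence as the example above.
const calls = stagedFirstNavigation("https://example.com", [
  ["network", "route", "**/*.js", "--abort", "--resource-type", "script"],
  ["cookies", "set", "--curl", "/path/to/cookies.txt", "--domain", "example.com"],
]);
```

Only the first call carries sessionMode: "fresh"; the later calls rely on the default mode following that launch.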
Proof and verification
npm run docs checks that generated playbook fragments and command-reference baseline blocks match their canonical sources (extensions/agent-browser/lib/playbook.ts and scripts/agent-browser-capability-baseline.mjs) without invoking upstream agent-browser.
The local verification gate is:
npm run verify
It runs:
- generated playbook/documentation drift checks
- tsc --noEmit
- the test suite
- command-reference baseline checks
- live command-reference verification against the targeted installed upstream agent-browser
Step order and which subprocesses run live in scripts/project.mjs; test/project-verify.test.ts locks default, release, real-upstream, package-pi, and combined-docs orchestration so a gate cannot disappear accidentally. Run npm run verify -- --help for opt-in modes and supported passthrough flags.
The deterministic agent-efficiency benchmark’s standalone JSON/Markdown accounting run is not part of default npm run verify (only npm run verify -- benchmark or npm run benchmark:agent-browser invokes the script). The full unit suite still exercises test/agent-browser.efficiency-benchmark.test.ts. Use the script before and after agent-facing abstractions to prove call-count, output-size, stale-ref, artifact, failure-category coverage, success-rate, and elapsed-time effects before changing the wrapper UX:
npm run benchmark:agent-browser
npm run verify -- benchmark
Save a JSON baseline (for example before changing playbook or wrapper behavior), then compare later runs: npm run benchmark:agent-browser -- --json > /tmp/agent-browser-benchmark.json and npm run benchmark:agent-browser -- --compare /tmp/agent-browser-benchmark.json.
It does not launch a browser or mutate local profiles; it models representative raw workflows and provides a stable baseline for later comparisons.
The opt-in real-upstream suite is separate because it drives a real browser installation:
npm run verify -- real-upstream
That mode sets PI_AGENT_BROWSER_REAL_UPSTREAM=1 and runs test/agent-browser.real-upstream-contract.test.ts against the real agent-browser on PATH (version must match the capability baseline). It covers inspection, skills, a broad core interaction and navigation matrix on localhost fixtures (including batch stdin and pushstate), plus vitals, network route/requests/HAR, diff snapshot/screenshot/url, trace/profiler, console/errors/highlight, stream enable/status/disable, cookies set --curl, a react tree missing-renderer path, and wait --download with the on-disk caveat documented in release notes. The harness uses a throwaway temp HOME and dedicated socket/screenshot directories so the run does not touch your normal browser profile paths. Browser-opening or credential-dependent families such as inspect, dashboard, chat, provider clouds, and OS clipboard flows stay in fake-upstream or manual validation unless a safe deterministic fixture is added. For prerequisites, isolation details, and troubleshooting, see docs/RELEASE.md.
For package release confidence, follow docs/RELEASE.md. The release gate is:
npm run doctor
npm run verify -- release
npm run verify -- release includes the default verification gate plus packaged Pi smoke coverage. The package also has a prepublishOnly hook that runs the same release gate and npm pack --dry-run during npm publish.
How it works
pi-agent-browser-native is intentionally thin:
- Pi loads extensions/agent-browser/index.ts from the package manifest.
- The extension registers one native tool named agent_browser.
- Tool calls are translated into upstream agent-browser CLI invocations with controlled args, stdin, environment, timeout, and session planning.
- Upstream JSON/plain-text output is parsed into model-friendly content and structured details.
- Screenshots, downloads, recordings, traces, profiles, and spill files are normalized as Pi-visible artifacts where possible.
- Generated playbook text in docs and tool metadata stays aligned with extensions/agent-browser/lib/playbook.ts.
The upstream browser engine remains agent-browser. This package does not bundle it and does not maintain compatibility shims for old upstream versions.
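The translation step can be pictured as a pure planning function that turns a tool call into a CLI invocation description. The sketch below is a simplified illustration and does not reproduce the code in extensions/agent-browser/lib/runtime.ts: the ToolCall and InvocationPlan shapes, the planInvocation name, the 30-second default timeout, and the idea that a managed session is injected as a leading --session flag are all assumptions made for the example.

```typescript
// Hypothetical planning sketch; every name and default here is illustrative.
interface ToolCall {
  args: string[];
  stdin?: string;
}

interface InvocationPlan {
  command: string;
  argv: string[];
  stdin?: string;
  timeoutMs: number;
}

// Inspection calls are treated as stateless in this sketch.
const INSPECTION_ARGS: ReadonlySet<string> = new Set(["--help", "--version"]);

function planInvocation(
  call: ToolCall,
  managedSession: string | undefined,
  timeoutMs: number = 30_000, // assumed default, not a documented value
): InvocationPlan {
  const isInspection =
    call.args.length === 1 && call.args.every((arg) => INSPECTION_ARGS.has(arg));
  const hasExplicitSession = call.args.includes("--session");

  // Assumption: the managed session is injected as a leading --session flag,
  // and only when the caller did not pick a session and the call is not inspection.
  const argv =
    !isInspection && !hasExplicitSession && managedSession !== undefined
      ? ["--session", managedSession, ...call.args]
      : [...call.args];

  return { command: "agent-browser", argv, stdin: call.stdin, timeoutMs };
}
```

Keeping planning free of subprocess work makes it easy to unit test; the actual process spawn would consume the returned plan.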
Current limits
- Published pre-1.0 package.
- Targets the current locally installed upstream agent-browser version only.
- Does not bundle agent-browser; users install it separately.
- Does not provide a human browser UI inside Pi; the primary UX is agent-invoked tool calls.
- Real authenticated profile use is powerful but sensitive. Treat profile and cookie access as user-approved, task-specific behavior.
- Wrapper tab/session recovery is best effort around observed upstream behavior, not a replacement for explicit profile/session design.
Local development
Install upstream agent-browser, then install dependencies:
npm install
Use the npm version declared in package.json packageManager when refreshing package-lock.json (for example npx -y npm@11.14.0 install) so optional-platform lockfile metadata does not drift. Align the global pi CLI with this repo’s pi-coding-agent devDependency range before lifecycle or interactive browser smokes. See Environment and automation pitfalls in docs/RELEASE.md.
Quick isolated checkout smoke test:
pi --no-extensions -e .
This bypasses Pi settings and configured extensions. After editing extension code, restart that Pi process to test the new checkout.
For a concrete expanded native-tool smoke matrix (version/help/skills through dashboard/chat families), see Local development validation in docs/RELEASE.md. For bounded release smokes that should validate this extension rather than skill routing, use the Sauce Demo smoke prompt, which adds --no-skills. When changes affect dense dashboards, diagnostics, artifacts, recording, scroll, or combobox behavior, use the public Grafana stress checklist for repeatable release dogfood without bundling private skills or recipes.
Configured-source lifecycle validation:
npm run verify -- lifecycle
The harness defaults to Pi model zai/glm-5.1 and 180000 ms per-step tmux waits; pass --model <id> and/or --timeout-ms <ms> after lifecycle when you need different settings (see Configured-source lifecycle validation in docs/RELEASE.md).
Use lifecycle validation when testing /reload, full restart, /resume, managed-session continuity, or persisted artifact behavior. Maintainers must run the same harness before every publish; see Pre-release checks.
Installed-package validation after publish:
npm run verify -- package-pi
pi --no-extensions -e npm:pi-agent-browser-native@<version>
Generated native-tool playbook notes
These sections are generated from extensions/agent-browser/lib/playbook.ts. Run npm run docs -- playbook write after changing the canonical playbook source.
Native inspection calls use the agent_browser tool shape, not shell-like direct-binary commands:
- { "args": ["--help"] }
- { "args": ["--version"] }
These calls return plain text and stay stateless: the extension does not inject its implicit session and does not let inspection consume the managed-session slot needed for later profile, session, CDP, state, auto-connect, or provider-backed launches.
- After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
- After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
- For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
- If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
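The tab-recovery notes above describe a decision with three outcomes: focus is already correct, the intended tab can be re-selected, or it is gone and guidance is reported. The sketch below models that decision as a pure function. It is an assumption-laden illustration, not the extension's implementation: the TabInfo shape, the decision type, the guidance wording, and matching tabs by exact URL equality are all hypothetical.

```typescript
// Hypothetical tab shape; the real upstream tab list format may differ.
interface TabInfo {
  id: string;
  url: string;
  active: boolean;
}

type TabDecision =
  | { kind: "already-focused" }
  | { kind: "reselect"; tabId: string }
  | { kind: "missing"; guidance: string };

// Decide what to do when the active tab may have drifted from the intended URL.
function decideTabRecovery(tabs: readonly TabInfo[], intendedUrl: string): TabDecision {
  const active = tabs.find((tab) => tab.active);
  if (active !== undefined && active.url === intendedUrl) {
    return { kind: "already-focused" };
  }
  const match = tabs.find((tab) => tab.url === intendedUrl);
  if (match !== undefined) {
    return { kind: "reselect", tabId: match.id };
  }
  return {
    kind: "missing",
    guidance: `No open tab matches ${intendedUrl}; reopen it explicitly.`,
  };
}
```

Because the recovery is described as best effort, a caller would treat the "missing" outcome as information to report rather than as a hard failure.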
Project map
| Path | Purpose |
|---|---|
| extensions/agent-browser/index.ts | Pi extension entrypoint and native tool wrapper |
| extensions/agent-browser/lib/runtime.ts | Argv parsing, session planning, redaction, and execution-plan helpers (pure planning; subprocess wiring lives beside the entrypoint) |
| extensions/agent-browser/lib/results/ | Model-facing result rendering and error guidance |
| extensions/agent-browser/lib/playbook.ts | Canonical generated agent/browser guidance |
| scripts/agent-browser-capability-baseline.mjs | Target upstream version, help samples, and doc/token inventory for drift checks |
| scripts/check-command-reference-baseline.mjs | Regenerates or verifies HTML-bounded baseline blocks in docs/COMMAND_REFERENCE.md (via npm run docs -- command-reference …) |
| docs/COMMAND_REFERENCE.md | Repo-readable native command reference |
| docs/TOOL_CONTRACT.md | Tool parameters, result shape, and behavior contract |
| docs/ELECTRON.md | Dedicated public guide for Electron desktop-app support |
| docs/ARCHITECTURE.md | Design decisions and implementation structure |
| docs/REQUIREMENTS.md | Product requirements and constraints |
| docs/RELEASE.md | Release, package, and lifecycle verification workflow |
| docs/SUPPORT_MATRIX.md | Current upstream support audit and release-readiness matrix |
| test/ | Wrapper, runtime, presentation, lifecycle, and package tests |
More docs
- AGENTS.md: maintainer and agent runbooks, including upstream capability baseline rebaselining and Pi smoke testing in tmux
- docs/COMMAND_REFERENCE.md: full native command reference and upstream capability baseline
- docs/TOOL_CONTRACT.md: exact tool contract
- docs/ELECTRON.md: Electron desktop-app guide
- docs/ARCHITECTURE.md: how the wrapper is designed
- docs/REQUIREMENTS.md: product constraints and non-goals
- docs/RELEASE.md: maintainer release workflow
- docs/SUPPORT_MATRIX.md: current upstream support matrix and closure evidence
Where to go next
If you are a user, install the package and ask Pi to open a public page with agent_browser.
If you are evaluating the implementation, read extensions/agent-browser/index.ts, then run npm run verify.