@bytesbrains/pi-textbrowser

Headless browser for Pi — browse the web with DOM + OCR text maps. No image tokens, 10-50x cheaper than screenshot-based browsing.


Package details


Install @bytesbrains/pi-textbrowser from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@bytesbrains/pi-textbrowser
Package: @bytesbrains/pi-textbrowser
Version: 1.1.0
Published: May 15, 2026
Downloads: not available
Author: nandal
License: MIT
Types: extension
Size: 28.7 KB
Dependencies: 2 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./src/index.ts"
  ],
  "image": "https://raw.githubusercontent.com/nandal/pi-ext/main/textbrowser/screenshot.png"
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

TextBrowser for Pi


Headless browser extension for Pi — browse the web with structured DOM + OCR text maps. 10-50x cheaper than screenshot-based browsing.

┌─────────────┐     browser_navigate(url)     ┌─────────────┐
│   Pi Agent  │ ─────────────────────────────>│  Playwright │
│  (you)      │                               │   Chromium  │
│             │ <─ DOM + OCR text map ────────│             │
└─────────────┘        (~200 tokens)          └─────────────┘

Why TextBrowser?

| Approach | ~Tokens per page | Relative cost |
|---|---|---|
| PNG 1920×1080 (vision model) | ~1,500–3,000 | 100% |
| TextBrowser (text-only) | ~150–400 | 5–15% |

Vision-model screenshots burn thousands of tokens per page. TextBrowser captures the DOM structure and runs OCR on a screenshot, then discards the image, so only clean, structured text reaches the AI. You get element lists, bounding boxes, visible text, and OCR content at a fraction of the cost.

Need to see colors or layout? Flip to visual mode and get the PNG too.

Install

pi install npm:@bytesbrains/pi-textbrowser
npx playwright install chromium

Or add to your .pi/settings.json:

{
  "packages": ["npm:@bytesbrains/pi-textbrowser"]
}

Note: The Playwright Chromium binary is a one-time install.

Tools

| Tool | What it does |
|---|---|
| browser_navigate | Open a URL, return page context |
| browser_click | Click by selector / text / XPath |
| browser_type | Fill input fields |
| browser_scroll | Scroll page or element into view |
| browser_screenshot | Capture current page context |
| browser_read | Read current page without changing it |
| browser_evaluate | Run JavaScript in the page |
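browser_click accepts a CSS selector, visible text, or an XPath. The extension's actual parameter names are not documented here, so the sketch below assumes a hypothetical ClickTarget shape and shows one plausible way to normalize it into a single Playwright selector string using Playwright's built-in css=, text=, and xpath= selector-engine prefixes. It is an illustration, not the extension's implementation.

```typescript
// Hypothetical shape of a browser_click target: exactly one field is set.
// These field names are illustrative, not the extension's documented schema.
type ClickTarget =
  | { selector: string }
  | { text: string }
  | { xpath: string };

// Convert a target into a Playwright selector string using the
// selector-engine prefixes css=, text=, and xpath=.
function toPlaywrightSelector(target: ClickTarget): string {
  if ("selector" in target) {
    return `css=${target.selector}`;
  }
  if ("text" in target) {
    // A quoted text selector matches the element's exact text;
    // JSON.stringify adds the quotes and escapes any embedded ones.
    return `text=${JSON.stringify(target.text)}`;
  }
  return `xpath=${target.xpath}`;
}

console.log(toPlaywrightSelector({ selector: "a.more" }));
console.log(toPlaywrightSelector({ text: "More information..." }));
console.log(toPlaywrightSelector({ xpath: "//a[contains(@href, 'iana')]" }));
```

With Playwright, the resulting string could then be passed to page.locator(...).click(). Funneling all three target kinds into one selector string keeps a single click code path.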

Dual-Mode Design

Text-only mode (default) — use for 90% of tasks

browser_navigate(url="https://example.com")
  • Screenshot captured only for OCR → image discarded
  • Returns: structured DOM elements + OCR text
  • Zero image tokens reach the AI
  • Roughly 5–15% of the token cost of visual mode

Use when: navigating, form filling, data extraction, workflow automation, reading content

Visual mode — use ONLY for pixels, colors, layout

browser_navigate(url="https://example.com", visual=true)
  • Screenshot captured for OCR and returned as base64 PNG
  • Returns: text map + actual image
  • Adds the screenshot's image tokens (~1,500–3,000 at 1920×1080) on top of the text map

Use ONLY when: checking layout alignment, verifying color/theme, debugging CSS, reviewing design, reading image content
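Both modes take the same screenshot; the only difference is whether the PNG is returned alongside the text map. A minimal sketch of that decision, assuming a hypothetical ToolResultPart content format (text parts plus base64 image parts, a common shape for agent tool results but not confirmed as Pi's exact API):

```typescript
// Hypothetical content parts for a tool result; Pi's real result type may differ.
type ToolResultPart =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: "image/png"; data: string };

// Build the tool result from the text map (DOM + OCR) and the base64 screenshot.
// In text-only mode the PNG was used only for OCR and is dropped here,
// so no image part (and no image tokens) is sent to the model.
function buildResult(
  textMap: string,
  pngBase64: string,
  visual: boolean,
): ToolResultPart[] {
  const parts: ToolResultPart[] = [{ type: "text", text: textMap }];
  if (visual) {
    parts.push({ type: "image", mimeType: "image/png", data: pngBase64 });
  }
  return parts;
}

const textOnly = buildResult("Title: Example Domain", "iVBORw0KGgo=", false);
const withImage = buildResult("Title: Example Domain", "iVBORw0KGgo=", true);
console.log(textOnly.length, withImage.length);
```

With Playwright, the base64 string could come from (await page.screenshot()).toString("base64"). Keeping the OCR step identical in both modes means the text map is the same either way; visual=true only appends the image.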

When to use which

| Task | Mode |
|---|---|
| "Open Gitea and explore repos" | Text-only ✅ |
| "Login to LinkedIn and post" | Text-only ✅ |
| "Check if dark mode looks correct" | Visual 🖼️ |
| "Is the button centered on the page?" | Visual 🖼️ |
| "Read the article content" | Text-only ✅ |
| "Compare this page to the mockup" | Visual 🖼️ |

Example Session

You: Open https://example.com and explore the page

→ browser_navigate(url="https://example.com")

Page: https://example.com/
Title: Example Domain
Viewport: 1920x1080

Elements (14 interactive of 82 total):
  [3] <a> href="https://iana.org/domains/example" text="More information..."
  ...

OCR (full page screenshot):
Example Domain
This domain is for use in illustrative examples in documents.
...
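The page context in the session above is plain text assembled from DOM data and OCR output. A sketch of rendering that layout, assuming hypothetical PageSnapshot and ElementInfo types (the extension's real field names and element selection rules are not documented here); the sample input mirrors the example.com page:

```typescript
// Hypothetical snapshot types; field names are assumptions for illustration.
interface ElementInfo {
  index: number;                 // position in the full element list, e.g. [3]
  tag: string;                   // lower-case tag name, e.g. "a"
  attrs: Record<string, string>; // selected attributes such as href
  text: string;                  // trimmed visible text
  interactive: boolean;          // link, button, input, etc.
}

interface PageSnapshot {
  url: string;
  title: string;
  viewport: { width: number; height: number };
  elements: ElementInfo[];
  ocrText: string;
}

// Render the snapshot as a text map. Only interactive elements are listed;
// the header reports both the interactive count and the total count.
function renderTextMap(s: PageSnapshot): string {
  const interactive = s.elements.filter((e) => e.interactive);
  const lines = [
    `Page: ${s.url}`,
    `Title: ${s.title}`,
    `Viewport: ${s.viewport.width}x${s.viewport.height}`,
    "",
    `Elements (${interactive.length} interactive of ${s.elements.length} total):`,
  ];
  for (const e of interactive) {
    const attrs = Object.keys(e.attrs)
      .map((k) => `${k}=${JSON.stringify(e.attrs[k])}`)
      .join(" ");
    const head = attrs ? `<${e.tag}> ${attrs}` : `<${e.tag}>`;
    lines.push(`  [${e.index}] ${head} text=${JSON.stringify(e.text)}`);
  }
  lines.push("", "OCR (full page screenshot):", s.ocrText);
  return lines.join("\n");
}

const snapshot: PageSnapshot = {
  url: "https://example.com/",
  title: "Example Domain",
  viewport: { width: 1920, height: 1080 },
  elements: [
    { index: 1, tag: "h1", attrs: {}, text: "Example Domain", interactive: false },
    { index: 2, tag: "p", attrs: {}, text: "This domain is for use in illustrative examples in documents.", interactive: false },
    { index: 3, tag: "a", attrs: { href: "https://iana.org/domains/example" }, text: "More information...", interactive: true },
  ],
  ocrText: "Example Domain\nThis domain is for use in illustrative examples in documents.",
};

console.log(renderTextMap(snapshot));
```

Listing only interactive elements while still reporting the total keeps the map short but tells the agent how much of the page it is not seeing; the OCR block covers text that the element list omits.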

Requirements

  • Node.js 18+
  • Pi coding agent installed
  • Playwright Chromium: npx playwright install chromium
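A small preflight check for the Node.js 18+ requirement could look like the following. The helper name is hypothetical and not part of the extension; it only parses the major version from a string in the format Node reports.

```typescript
// Hypothetical preflight helper: checks a Node.js version string such as
// "18.19.0" or "v20.11.1" against a minimum major version (default 18).
function meetsNodeRequirement(version: string, minMajor = 18): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  // An unparseable string yields NaN, which fails the check.
  return Number.isFinite(major) && major >= minMajor;
}

console.log(meetsNodeRequirement("18.19.0"));
console.log(meetsNodeRequirement("v16.20.2"));
```

In a real script, pass process.version (for example "v20.11.1") or process.versions.node.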

License

MIT © nandal


Built by Agent, for Agents 🤖