pi-textbrowser

Headless browser for Pi — browse the web with DOM + OCR text maps. No image tokens, 10-50x cheaper than screenshot-based browsing.

Install pi-textbrowser from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-textbrowser

Package: pi-textbrowser
Version: 1.1.0
Published: May 14, 2026
Downloads: not available
Author: nandal
License: MIT
Types: extension
Size: 28.7 KB
Dependencies: 2 dependencies · 2 peer dependencies

Pi manifest JSON

{
  "extensions": [
    "./src/index.ts"
  ],
  "image": "https://raw.githubusercontent.com/nandal/pi-ext/main/textbrowser/screenshot.png"
}
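
To make the manifest's shape explicit, here is a minimal TypeScript sketch of a type and runtime guard for it. The `PiManifest` interface and `isPiManifest` function are hypothetical names, inferred only from the two keys shown above. Treating `image` as optional is an assumption, and Pi may accept fields this sketch does not model.

```typescript
// Hypothetical shape inferred from the two keys in the manifest above.
// Pi may accept other fields that this sketch does not model.
interface PiManifest {
  extensions: string[]; // extension entry points, relative to the package root
  image?: string; // preview image URL (assumed optional)
}

// Runtime guard: true when `value` matches the PiManifest shape.
function isPiManifest(value: unknown): value is PiManifest {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const extensions = candidate["extensions"];
  const image = candidate["image"];
  const extensionsOk =
    Array.isArray(extensions) &&
    extensions.every((entry) => typeof entry === "string");
  const imageOk = image === undefined || typeof image === "string";
  return extensionsOk && imageOk;
}

// The manifest from this page, parsed from its JSON text.
const sampleManifest: unknown = JSON.parse(`{
  "extensions": ["./src/index.ts"],
  "image": "https://raw.githubusercontent.com/nandal/pi-ext/main/textbrowser/screenshot.png"
}`);

const sampleIsValid = isPiManifest(sampleManifest); // true
```

A guard like this is only a convenience for tooling that reads manifests. It says nothing about how Pi itself validates them.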

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

TextBrowser for Pi

Headless browser extension for Pi — browse the web with structured DOM + OCR text maps. 10-50x cheaper than screenshot-based browsing.

┌─────────────┐     browser_navigate(url)     ┌─────────────┐
│   Pi Agent  │ ─────────────────────────────>│  Playwright │
│  (you)      │                               │   Chromium  │
│             │ <─ DOM + OCR text map ────────│             │
└─────────────┘        (~200 tokens)          └─────────────┘

Why TextBrowser?

Approach	Approx. tokens per page	Relative cost
PNG 1920×1080 (vision model)	~1,500–3,000	100%
TextBrowser (text-only)	~150–400	5–15%

Vision-model screenshots burn thousands of tokens per page. TextBrowser captures the DOM structure, runs OCR on a screenshot, then discards the image. Only clean, structured text reaches the AI. You get element lists, bounding boxes, visible text, and OCR content, all for a fraction of the cost.

Need to see colors or layout? Flip to visual mode and get the PNG too.
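
The relative-cost column can be reproduced with simple arithmetic on the token ranges in the table. The sketch below is illustrative only: the `TokenRange`, `midpoint`, and `relativeCostPercent` names are hypothetical, and using the midpoint of each range as a "typical page" is an assumption, not a measurement.

```typescript
// Token ranges copied from the comparison table above.
interface TokenRange {
  min: number;
  max: number;
}

const visionTokens: TokenRange = { min: 1500, max: 3000 };
const textTokens: TokenRange = { min: 150, max: 400 };

// Midpoint of a range, used here as a rough "typical page" figure.
function midpoint(range: TokenRange): number {
  return (range.min + range.max) / 2;
}

// Text-only cost as a percentage of vision cost, rounded to one decimal place.
function relativeCostPercent(text: number, vision: number): number {
  return Math.round((text / vision) * 1000) / 10;
}

// Typical case: 275 / 2250 tokens
const typicalPercent = relativeCostPercent(
  midpoint(textTokens),
  midpoint(visionTokens),
); // 12.2

// Best case: smallest text map against the largest screenshot
const bestCasePercent = relativeCostPercent(textTokens.min, visionTokens.max); // 5
```

Actual savings depend on the page and on how a given model prices image input, so treat these as ballpark figures.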

Install

pi install npm:pi-textbrowser
npx playwright install chromium

Or add to your .pi/settings.json:

{
  "packages": ["npm:pi-textbrowser"]
}

Note: Installing the Playwright Chromium binary is a one-time step.

Tools

Tool	What it does
`browser_navigate`	Open a URL, return page context
`browser_click`	Click by selector / text / XPath
`browser_type`	Fill input fields
`browser_scroll`	Scroll page or element into view
`browser_screenshot`	Capture current page context
`browser_read`	Read current page without changing it
`browser_evaluate`	Run JavaScript in the page
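
For reference when scripting or logging tool calls, the sketch below models the seven tool names as a TypeScript union and renders a call in the `tool(key="value")` notation this README uses. `ToolName`, `ToolArgs`, and `formatToolCall` are hypothetical helpers, not part of the extension. Only the `url` and `visual` parameters appear in this README, so the examples use nothing else.

```typescript
// The seven tool names listed in the table above.
type ToolName =
  | "browser_navigate"
  | "browser_click"
  | "browser_type"
  | "browser_scroll"
  | "browser_screenshot"
  | "browser_read"
  | "browser_evaluate";

// Generic argument bag. The extension's real parameter schemas are not modeled here.
type ToolArgs = Record<string, string | number | boolean>;

// Render a call in the notation used throughout this README.
function formatToolCall(tool: ToolName, args: ToolArgs = {}): string {
  const rendered = Object.entries(args)
    .map(([key, value]) =>
      typeof value === "string"
        ? `${key}=${JSON.stringify(value)}` // quote string values
        : `${key}=${String(value)}`, // numbers and booleans stay bare
    )
    .join(", ");
  return `${tool}(${rendered})`;
}

const textCall = formatToolCall("browser_navigate", {
  url: "https://example.com",
});
// browser_navigate(url="https://example.com")

const visualCall = formatToolCall("browser_navigate", {
  url: "https://example.com",
  visual: true,
});
// browser_navigate(url="https://example.com", visual=true)
```

The union type catches misspelled tool names at compile time. That is the main reason to prefer it over plain strings.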

Dual-Mode Design

Text-only mode (default) — use for 90% of tasks

browser_navigate(url="https://example.com")

Screenshot captured only for OCR → image discarded
Returns: structured DOM elements + OCR text
Zero image tokens reach the AI
5-15× cheaper than visual mode

Use when: navigating, form filling, data extraction, workflow automation, reading content

Visual mode — use ONLY for pixels, colors, layout

browser_navigate(url="https://example.com", visual=true)

Screenshot captured for OCR and returned as base64 PNG
Returns: text map + actual image
5-15× more tokens than text-only

Use ONLY when: checking layout alignment, verifying color/theme, debugging CSS, reviewing design, reading image content
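
The use cases listed for each mode can be turned into a rough keyword heuristic. The sketch below is hypothetical and not part of the extension: `suggestMode` and the `VISUAL_HINTS` list are illustrative names, and keyword matching is a deliberately crude stand-in for the agent's own judgment.

```typescript
type BrowseMode = "text" | "visual";

// Hypothetical keyword list drawn from the visual-mode use cases above.
const VISUAL_HINTS: string[] = [
  "color",
  "dark mode",
  "theme",
  "layout",
  "centered",
  "align",
  "css",
  "design",
  "mockup",
];

// Suggest a mode for a task description. Text-only is the default and
// visual mode is chosen only when a visual keyword appears.
function suggestMode(task: string): BrowseMode {
  const lower = task.toLowerCase();
  return VISUAL_HINTS.some((hint) => lower.includes(hint)) ? "visual" : "text";
}

const loginMode = suggestMode("Login to LinkedIn and post"); // "text"
const darkModeCheck = suggestMode("Check if dark mode looks correct"); // "visual"
```

Defaulting to text-only mirrors the guidance above: pay for image tokens only when the task is about how the page looks.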

When to use which

Task	Mode
"Open Gitea and explore repos"	Text-only ✅
"Login to LinkedIn and post"	Text-only ✅
"Check if dark mode looks correct"	Visual 🖼️
"Is the button centered on the page?"	Visual 🖼️
"Read the article content"	Text-only ✅
"Compare this page to the mockup"	Visual 🖼️

Example Session

You: Open https://example.com and explore the page

→ browser_navigate(url="https://example.com")

Page: https://example.com/
Title: Example Domain
Viewport: 1920x1080

Elements (14 interactive of 82 total):
  [3] <a> href="https://iana.org/domains/example" text="More information..."
  ...

OCR (full page screenshot):
Example Domain
This domain is for use in illustrative examples in documents.
...
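
If you post-process the text map outside the agent, element lines like the one above are easy to parse. The sketch below assumes the `[index] <tag> key="value"` layout shown in this example output. The extension does not document a stable format, so `parseElementLine` and `ElementEntry` are hypothetical, and the pattern may need adjusting for real output.

```typescript
// One parsed entry from the "Elements" list.
interface ElementEntry {
  index: number;
  tag: string;
  attributes: Record<string, string>;
}

// Parse a line such as:
//   [3] <a> href="https://iana.org/domains/example" text="More information..."
// Returns null for lines that do not follow that layout (for example "...").
function parseElementLine(line: string): ElementEntry | null {
  const head = /^\s*\[(\d+)\]\s*<([A-Za-z][\w-]*)>\s*(.*)$/.exec(line);
  if (head === null) return null;

  const attributes: Record<string, string> = {};
  const attrPattern = /([\w-]+)="([^"]*)"/g;
  let match: RegExpExecArray | null;
  // Collect every key="value" pair after the tag.
  while ((match = attrPattern.exec(head[3] ?? "")) !== null) {
    attributes[match[1] ?? ""] = match[2] ?? "";
  }

  return { index: Number(head[1]), tag: head[2] ?? "", attributes };
}

const sampleEntry = parseElementLine(
  '  [3] <a> href="https://iana.org/domains/example" text="More information..."',
);
// sampleEntry.index is 3, sampleEntry.tag is "a",
// sampleEntry.attributes.href is "https://iana.org/domains/example"
```

This parser does not handle attribute values that contain double quotes. A real implementation would need to know how the extension escapes them.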

Requirements

Node.js 18+
Pi coding agent installed
Playwright Chromium: npx playwright install chromium

License

MIT © nandal

Built by Agent, for Agents 🤖