pi-textbrowser
Headless browser for Pi — browse the web with DOM + OCR text maps. No image tokens, 10-50x cheaper than screenshot-based browsing.
Package details
Install pi-textbrowser from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-textbrowser
- Package: pi-textbrowser
- Version: 1.1.0
- Published: May 14, 2026
- Downloads: not available
- Author: nandal
- License: MIT
- Types: extension
- Size: 28.7 KB
- Dependencies: 2 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./src/index.ts"
  ],
  "image": "https://raw.githubusercontent.com/nandal/pi-ext/main/textbrowser/screenshot.png"
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
TextBrowser for Pi
Headless browser extension for Pi — browse the web with structured DOM + OCR text maps. 10-50x cheaper than screenshot-based browsing.
┌─────────────┐   browser_navigate(url)       ┌─────────────┐
│  Pi Agent   │ ─────────────────────────────>│ Playwright  │
│   (you)     │                               │  Chromium   │
│             │ <─ DOM + OCR text map ────────│             │
└─────────────┘       (~200 tokens)           └─────────────┘
Why TextBrowser?
| Approach | ~Tokens | Relative Cost |
|---|---|---|
| PNG 1920×1080 (vision model) | ~1,500–3,000 | 100% |
| TextBrowser (text-only) | ~150–400 | 5–15% |
Vision-model screenshots burn thousands of tokens per page. TextBrowser captures the DOM structure + runs OCR on a screenshot, then discards the image. Only clean, structured text reaches the AI. You get element lists, bounding boxes, visible text, and OCR content — all for a fraction of the cost.
Need to see colors or layout? Flip to visual mode and get the PNG too.
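The relative-cost column comes down to simple division of token counts. The sketch below uses only the token ranges quoted in the table; the helper name `relativeCost` is made up for illustration. Real ratios depend on the page and on how a given model tokenizes images.

```typescript
// Hypothetical helper: cost of a text-only page context relative to a
// screenshot, measured purely in input tokens.
function relativeCost(textTokens: number, imageTokens: number): number {
  if (imageTokens <= 0) throw new RangeError("imageTokens must be positive");
  return textTokens / imageTokens;
}

// Smallest text map vs. largest screenshot from the table: 150 / 3000
const best = relativeCost(150, 3000);
// Midpoints of both ranges from the table: 275 / 2250
const typical = relativeCost(275, 2250);

console.log(`${(best * 100).toFixed(1)}%`);    // 5.0%
console.log(`${(typical * 100).toFixed(1)}%`); // 12.2%
```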
Install
pi install npm:pi-textbrowser
npx playwright install chromium
Or add to your .pi/settings.json:
{
"packages": ["npm:pi-textbrowser"]
}
Note: The Playwright Chromium binary is a one-time install.
Tools
| Tool | What it does |
|---|---|
| browser_navigate | Open a URL, return page context |
| browser_click | Click by selector / text / XPath |
| browser_type | Fill input fields |
| browser_scroll | Scroll page or element into view |
| browser_screenshot | Capture current page context |
| browser_read | Read current page without changing it |
| browser_evaluate | Run JavaScript in the page |
Dual-Mode Design
Text-only mode (default) — use for 90% of tasks
browser_navigate(url="https://example.com")
- Screenshot captured only for OCR → image discarded
- Returns: structured DOM elements + OCR text
- Zero image tokens reach the AI
- 5-15× cheaper than visual mode
Use when: navigating, form filling, data extraction, workflow automation, reading content
Visual mode — use ONLY for pixels, colors, layout
browser_navigate(url="https://example.com", visual=true)
- Screenshot captured for OCR and returned as base64 PNG
- Returns: text map + actual image
- 5-15× more tokens than text-only
Use ONLY when: checking layout alignment, verifying color/theme, debugging CSS, reviewing design, reading image content
When to use which
| Task | Mode |
|---|---|
| "Open Gitea and explore repos" | Text-only ✅ |
| "Login to LinkedIn and post" | Text-only ✅ |
| "Check if dark mode looks correct" | Visual 🖼️ |
| "Is the button centered on the page?" | Visual 🖼️ |
| "Read the article content" | Text-only ✅ |
| "Compare this page to the mockup" | Visual 🖼️ |
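One way to encode the table above in a wrapper or prompt helper is a keyword check that defaults to text-only. This is a rough sketch: `chooseMode` and its keyword list are illustrative assumptions, not part of pi-textbrowser.

```typescript
type BrowseMode = "text" | "visual";

// Illustrative keywords drawn from the visual-mode use cases listed above.
const VISUAL_HINTS: string[] = [
  "color", "colour", "dark mode", "theme", "layout",
  "centered", "alignment", "css", "design", "mockup",
];

// Hypothetical helper: stay in text-only mode unless the task mentions
// something that can only be judged from pixels.
function chooseMode(task: string): BrowseMode {
  const t = task.toLowerCase();
  return VISUAL_HINTS.some((hint) => t.includes(hint)) ? "visual" : "text";
}

console.log(chooseMode("Login to LinkedIn and post"));       // text
console.log(chooseMode("Check if dark mode looks correct")); // visual
console.log(chooseMode("Compare this page to the mockup"));  // visual
```

A plain substring match is crude, so treat the result as a default that the agent can override rather than a hard rule.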
Example Session
You: Open https://example.com and explore the page
→ browser_navigate(url="https://example.com")
Page: https://example.com/
Title: Example Domain
Viewport: 1920x1080
Elements (14 interactive of 82 total):
[3] <a> href="https://iana.org/domains/example" text="More information..."
...
OCR (full page screenshot):
Example Domain
This domain is for use in illustrative examples in documents.
...
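Because the page context is plain text, it is easy to post-process. The sketch below parses one element line of the shape shown in the session above. The `ElementRef` type, `parseElementLine`, and both regular expressions are assumptions based only on that sample line; the extension's real output may differ in detail.

```typescript
interface ElementRef {
  index: number;                 // the number inside [brackets]
  tag: string;                   // e.g. "a"
  attrs: Record<string, string>; // key="value" pairs that follow the tag
}

// Hypothetical parser for lines such as:
//   [3] <a> href="https://iana.org/domains/example" text="More information..."
// Returns null for lines that do not look like an element entry.
function parseElementLine(line: string): ElementRef | null {
  const head = /^\s*\[(\d+)\]\s*<([a-zA-Z][\w-]*)>\s*(.*)$/.exec(line);
  if (head === null) return null;

  const attrs: Record<string, string> = {};
  const pair = /([\w-]+)="([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = pair.exec(head[3])) !== null) {
    attrs[m[1]] = m[2];
  }
  return { index: Number(head[1]), tag: head[2], attrs };
}

const sample =
  '[3] <a> href="https://iana.org/domains/example" text="More information..."';
console.log(parseElementLine(sample));
```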
Requirements
- Node.js 18+
- Pi coding agent installed
- Playwright Chromium:
npx playwright install chromium
License
MIT © nandal
Built by Agent, for Agents 🤖
