pi-fetch-tool

Pi extension that registers a web_fetch tool, turning any URL into LLM-friendly JSON-LD or Markdown. Tier 0 (JSON-LD), Tier 1 (Readability), Tier 2 (agent-browser / macOS WKWebView).


Package details

extension

Install pi-fetch-tool from npm and Pi will load the resources declared by the package manifest.


$ pi install npm:pi-fetch-tool

Package: pi-fetch-tool
Version: 0.1.5
Published: Jun 7, 2026
Downloads: not available
Author: livos
License: MIT
Types: extension
Size: 50.2 KB
Dependencies: 5 dependencies · 0 peers

Pi manifest JSON

{
  "extensions": [
    "./src/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-web-fetch

A pi extension that registers a web_fetch tool, turning any URL into LLM-friendly content with minimal token cost.

What it does

When the LLM calls web_fetch, the extension tries the cheapest sufficient backend:

Tier	Backend	Cost	Speed	Use case
T0	JSON-LD / OpenGraph regex extraction	0 LLM extraction	~50ms	Product pages, articles with schema.org markup
T1	`fetch` + Mozilla Readability + Markdown	~500 tokens	~500ms	Articles, blog posts, news, documentation
T2	`agent-browser` (headless Chrome)	-	~2-3s	JS-rendered SPAs, dynamic content

T0 and T1 run in parallel. T2 is only invoked when both T0 and T1 are insufficient and the user (or LLM) didn't set render: "never".
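The routing rule above can be sketched as follows. This is a minimal illustration, not the code in router.ts. Three things in it are assumptions: the `Tier` function type, the convention that a tier returns `null` when its output is insufficient, and the choice to skip T0/T1 entirely under `render: "always"`.

```typescript
// Hypothetical types: each tier returns null when its result is insufficient.
type Render = "auto" | "never" | "always";
type TierName = "t0" | "t1" | "t2";

interface TierResult {
  tier: TierName;
  content: string;
}

type Tier = (html: string) => Promise<TierResult | null>;

async function route(
  html: string,
  render: Render,
  t0: Tier,
  t1: Tier,
  t2: () => Promise<TierResult | null>,
): Promise<{ result: TierResult | null; tiersAttempted: TierName[] }> {
  const tiersAttempted: TierName[] = [];

  // Assumption: "always" goes straight to the browser backend.
  if (render !== "always") {
    tiersAttempted.push("t0", "t1");
    // Both tiers read the same fetched HTML, so they can run concurrently.
    const [r0, r1] = await Promise.all([t0(html), t1(html)]);
    if (r0) return { result: r0, tiersAttempted }; // structured data is cheapest
    if (r1) return { result: r1, tiersAttempted };
  }

  // Escalate to T2 unless the caller opted out.
  if (render === "never") return { result: null, tiersAttempted };
  tiersAttempted.push("t2");
  return { result: await t2(), tiersAttempted };
}
```

Both tiers consume the single HTML response, so running them together adds no extra network round trip.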

Token cost comparison

Approximate token cost of the content returned to the LLM, by method and page type:

Method	Token cost
Raw HTML via `curl`	~50,000
This extension (T0 hits on a product page)	~300 (structured JSON)
This extension (T1 on a plain article)	~1,500 (clean Markdown)
T2 (SPA)	~3,000 (accessibility tree)

Installation

pi install npm:pi-fetch-tool

The extension's npm dependencies include agent-browser, which auto-installs a Chrome binary. After install, the agent-browser skill becomes visible to the LLM as well.

Usage

The LLM calls it automatically when the user pastes a URL. An example tool call from the LLM side:

{
  "name": "web_fetch",
  "arguments": {
    "url": "https://example.com/article",
    "render": "auto"
  }
}

Parameters

Parameter	Type	Default	Description
`url`	string	(required)	The URL to fetch
`render`	`"auto"` | `"never"` | `"always"`	`"auto"`	When to use a headless browser. `"never"` disables T2; `"always"` forces T2.
`selector`	string	—	Optional CSS selector. Returns the first matching element's text. Bypasses T0/T1.
`max_tokens`	number	`8000`	Output budget in tokens; longer output is truncated and flagged with a marker.
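One way to implement the `max_tokens` truncation is sketched below. The ~4 characters-per-token estimate and the marker text are assumptions for illustration. The real formatter.ts may count tokens and mark truncation differently.

```typescript
// Assumptions: ~4 characters per token (a common rule of thumb for English)
// and a hypothetical marker string.
const CHARS_PER_TOKEN = 4;
const TRUNCATION_MARKER = "\n\n[... truncated ...]";

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function truncateToTokens(
  text: string,
  maxTokens = 8000, // README default for max_tokens
): { content: string; truncated: boolean } {
  if (estimateTokens(text) <= maxTokens) {
    return { content: text, truncated: false };
  }
  // Leave room for the marker so the marked output fits the budget
  // (for any budget larger than the marker itself).
  const keepChars = Math.max(0, maxTokens * CHARS_PER_TOKEN - TRUNCATION_MARKER.length);
  return { content: text.slice(0, keepChars) + TRUNCATION_MARKER, truncated: true };
}
```

The returned `truncated` flag corresponds to the `truncated` field of the output payload.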

Output

The tool returns a JSON payload whose shape is the same regardless of which tier produced it:

{
  "type": "structured",            // or "markdown" | "error"
  "source": "jsonld",              // or "readability" | "agent-browser" | "webkit" | "selector"
  "url": "https://example.com/...",
  "schema": "Product",             // (structured only)
  "data": { ... },                 // (structured only)
  "content": "...",                // (markdown only)
  "truncated": false,              // true when max_tokens was hit
  "meta": {
    "fetchedAt": 1234567890,
    "cacheHit": false,
    "tiersAttempted": ["t0", "t1"],
    "latencyMs": 234
  }
}
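For TypeScript consumers, the payload above maps naturally onto a discriminated union keyed on `type`. This is a sketch inferred from the example, not a type exported by the package. In particular, the README does not show which fields an `"error"` result carries, so `message` is hypothetical.

```typescript
type Source = "jsonld" | "readability" | "agent-browser" | "webkit" | "selector";

interface Meta {
  fetchedAt: number;
  cacheHit: boolean;
  tiersAttempted: string[];
  latencyMs: number;
}

interface StructuredResult {
  type: "structured";
  source: Source;
  url: string;
  schema: string; // schema.org type, e.g. "Product"
  data: Record<string, unknown>;
  truncated?: boolean;
  meta: Meta;
}

interface MarkdownResult {
  type: "markdown";
  source: Source;
  url: string;
  content: string;
  truncated?: boolean;
  meta: Meta;
}

interface ErrorResult {
  type: "error";
  source?: Source;
  url: string;
  message: string; // hypothetical field name
  meta: Meta;
}

type WebFetchResult = StructuredResult | MarkdownResult | ErrorResult;

// Narrowing on `type` gives type-safe access to the per-variant fields.
function summarize(r: WebFetchResult): string {
  switch (r.type) {
    case "structured":
      return `${r.schema} with ${Object.keys(r.data).length} fields`;
    case "markdown":
      return `${r.content.length} chars of Markdown${r.truncated ? " (truncated)" : ""}`;
    case "error":
      return `error: ${r.message}`;
  }
}
```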

Configuration

Edit ~/.pi/agent/settings.json:

{
  "webFetch": {
    "t2Backend": "auto",            // "auto" | "webkit" | "agent-browser"
    "cacheTtlMs": 3600000           // 1 hour
  }
}

Changes take effect on the next web_fetch call; no restart is needed.
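Because edits are picked up without a restart, the settings are presumably re-read on each call rather than cached at startup. Below is a minimal sketch of the parsing half. It assumes the real file is plain JSON without the explanatory `//` comments shown above, which `JSON.parse` would reject. Missing or invalid values fall back to defaults. `parsePrefs` is a hypothetical name, not necessarily what preferences.ts exports.

```typescript
type T2Backend = "auto" | "webkit" | "agent-browser";

interface WebFetchPrefs {
  t2Backend: T2Backend;
  cacheTtlMs: number;
}

const DEFAULT_PREFS: WebFetchPrefs = { t2Backend: "auto", cacheTtlMs: 3_600_000 }; // 1 hour

function isT2Backend(v: unknown): v is T2Backend {
  return v === "auto" || v === "webkit" || v === "agent-browser";
}

// Parse the raw text of settings.json. Calling this on every web_fetch
// (after re-reading the file) is what makes edits apply without a restart.
function parsePrefs(raw: string): WebFetchPrefs {
  let section: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(raw);
    const wf = (parsed as { webFetch?: unknown } | null)?.webFetch;
    if (wf !== null && typeof wf === "object") section = wf as Record<string, unknown>;
  } catch {
    // Malformed JSON: keep the defaults rather than failing the tool call.
  }
  const backend = section.t2Backend;
  const ttl = section.cacheTtlMs;
  return {
    t2Backend: isT2Backend(backend) ? backend : DEFAULT_PREFS.t2Backend,
    cacheTtlMs: typeof ttl === "number" && ttl >= 0 ? ttl : DEFAULT_PREFS.cacheTtlMs,
  };
}
```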

How it works

1. Cache check (hash of URL + options; 1h default TTL, configurable via cacheTtlMs)
2. A single fetch() for the HTML
3. If selector is given: cheerio-based selector extraction (short-circuits the remaining tiers)
4. In parallel: T0 (JSON-LD regex) and T1 (Readability + Markdown)
5. T0 wins if it returns a rich structured-data object (isRichJsonLd: ≥3 fields including a rich field)
6. T1 wins if it returns ≥ 200 chars of clean Markdown
7. Otherwise: escalate to T2 (unless render: "never")
8. Format, cache, return
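The cache check at the start of the pipeline can be illustrated with a small in-memory TTL cache keyed by a hash of the URL plus options. FNV-1a is used here only because it is short and dependency-free. The hash, the `cacheKey` helper, and the `TtlCache` class are hypothetical stand-ins for whatever cache.ts actually does.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: a tiny, dependency-free hash.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

interface FetchOptions {
  render: "auto" | "never" | "always";
  selector?: string;
  max_tokens: number;
}

// Serialize fields in a fixed order so equal options always give equal keys.
function cacheKey(url: string, opts: FetchOptions): string {
  return fnv1a(JSON.stringify([url, opts.render, opts.selector ?? null, opts.max_tokens]));
}

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A 32-bit hash can collide. Keying the Map on the serialized string itself avoids that, at the cost of longer keys.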

macOS Pi.app integration (P4)

In the macOS Pi.app, window.piNative.webFetch(url) is available and uses the app's existing WKWebView, so there is no separate Chrome download and latency is sub-second. T2 prefers this backend over agent-browser.
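Combining this preference with the `t2Backend` setting could look like the sketch below. The `PiNative` interface is inferred from the `window.piNative.webFetch(url)` call, and its `Promise<string>` return type is an assumption. Falling back to agent-browser when `"webkit"` is requested outside Pi.app is also a guess at reasonable behavior, not documented.

```typescript
type T2Backend = "auto" | "webkit" | "agent-browser";

// Assumed shape of the Pi.app bridge; only webFetch(url) is documented.
interface PiNative {
  webFetch(url: string): Promise<string>;
}

function chooseT2Backend(
  pref: T2Backend,
  native: PiNative | undefined,
): "webkit" | "agent-browser" {
  if (pref === "agent-browser") return "agent-browser";
  // "auto" and "webkit" both use WKWebView when the bridge exists...
  if (native && typeof native.webFetch === "function") return "webkit";
  // ...otherwise (e.g. not running inside Pi.app) fall back to agent-browser.
  return "agent-browser";
}

// Look up the bridge without assuming a browser-like global exists.
function findPiNative(): PiNative | undefined {
  const g = globalThis as unknown as { window?: { piNative?: PiNative } };
  return g.window?.piNative;
}
```

Usage: `chooseT2Backend(prefs.t2Backend, findPiNative())`, where `prefs` holds the parsed `webFetch` settings.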

Development

npm install
npm test                # vitest run
npm run typecheck       # tsc --noEmit
npm run test:smoke      # RUN_SMOKE=1 (manual, gated)

Project layout

src/
├── index.ts                 # registerTool, registerCommand, resources_discover
├── router.ts                # tier orchestration
├── formatter.ts             # unified output shape, truncation
├── cache.ts                 # in-memory cache with TTL
├── preferences.ts           # settings.json reader
├── check-env.ts             # agent-browser availability
├── extractors/
│   ├── jsonld.ts            # T0
│   └── selector.ts          # CSS selector (used with `selector` param)
└── backends/
    ├── http.ts              # T1 input (fetch)
    ├── readability.ts       # T1 output (Mozilla Readability + turndown)
    ├── agent-browser.ts     # T2 cross-platform
    └── webkit.ts            # T2 macOS Pi.app
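As an illustration of the kind of work extractors/jsonld.ts does for T0, the sketch below pulls `<script type="application/ld+json">` blocks out of raw HTML with a regex and flattens `@graph` arrays. It is a simplification built on assumptions: it ignores the OpenGraph half of T0, and `extractJsonLd` is a hypothetical name.

```typescript
type JsonLdNode = Record<string, unknown>;

// Matches each JSON-LD script element and captures its body.
const LD_JSON_RE =
  /<script\b[^>]*\btype\s*=\s*["']?application\/ld\+json["']?[^>]*>([\s\S]*?)<\/script\s*>/gi;

function isNode(v: unknown): v is JsonLdNode {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function extractJsonLd(html: string): JsonLdNode[] {
  const nodes: JsonLdNode[] = [];
  LD_JSON_RE.lastIndex = 0; // the regex is global and therefore stateful
  let m: RegExpExecArray | null;
  while ((m = LD_JSON_RE.exec(html)) !== null) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(m[1].trim());
    } catch {
      continue; // skip a malformed block instead of failing the whole page
    }
    const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const item of items) {
      if (!isNode(item)) continue;
      const graph = item["@graph"];
      // A top-level @graph holds several entities; flatten them.
      if (Array.isArray(graph)) nodes.push(...graph.filter(isNode));
      else nodes.push(item);
    }
  }
  return nodes;
}
```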

tests/
├── extractors/              # unit
├── cache.test.ts
├── formatter.test.ts
├── preferences.test.ts
├── check-env.test.ts
├── router.test.ts
├── integration.test.ts      # end-to-end through router
└── fixtures/                # HTML samples

Feedback loop & iteration plan

After real-world use, revisit these design decisions:

Tier routing heuristic: does isRichJsonLd (≥3 fields + rich field) produce correct picks? If the LLM frequently gets OG-only results when Readability would have given better content, raise the threshold so T0 wins less often.
T2 timeout: the 30s default may be too aggressive for slow SPAs. Gather telemetry on median T2 latency.
Cache TTL: the 1h default may serve stale data. Users may want per-site overrides.
max_tokens default: 8000 tokens may be too high for common use. If usage data shows LLMs always see "truncated" markers, lower the default.
Readability MIN_LENGTH: 200 chars may miss valid short pages (FAQ entries, changelogs). Track how often T1 returns null for non-SPA pages.
agent-browser user-data-dir: sharing cookies with the user's browser session is a common request. Track whether the current env-var passthrough is sufficient.
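The routing and Readability items above both come down to thresholds in the tier-selection checks. Below is a sketch of those checks using the numbers given (≥3 fields, 200 chars). Two parts are assumptions, since the README does not define what counts as a rich field: the `RICH_FIELDS` set, and the choice to ignore `@`-prefixed JSON-LD keywords when counting fields.

```typescript
// Thresholds from the README; RICH_FIELDS is a hypothetical example set.
const MIN_JSONLD_FIELDS = 3;
const MIN_LENGTH = 200;
const RICH_FIELDS = new Set([
  "offers", "articleBody", "description", "aggregateRating", "recipeIngredient",
]);

type JsonLdNode = Record<string, unknown>;

// T0 wins only for objects with enough fields, at least one of them rich.
function isRichJsonLd(node: JsonLdNode): boolean {
  // Assumption: JSON-LD keywords like @context and @type don't count as fields.
  const fields = Object.keys(node).filter((k) => !k.startsWith("@"));
  return fields.length >= MIN_JSONLD_FIELDS && fields.some((k) => RICH_FIELDS.has(k));
}

// T1 wins only if Readability produced enough Markdown.
function isUsableMarkdown(markdown: string | null): boolean {
  return markdown !== null && markdown.trim().length >= MIN_LENGTH;
}
```

Raising `MIN_JSONLD_FIELDS` makes T0 win less often. Lowering `MIN_LENGTH` lets short FAQ or changelog pages pass T1 instead of escalating to T2.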

File issues or feature requests at https://github.com/asiachrispy/pi-web-fetch/issues.

License

MIT