@dmallory42/pi-read-url

Pi extension for extracting public HTML pages into clean markdown via the system curl.

Package details

extension

Install @dmallory42/pi-read-url from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@dmallory42/pi-read-url
Package: @dmallory42/pi-read-url
Version: 0.1.2
Published: Apr 25, 2026
Downloads: 370/mo · 370/wk
Author: dmallory42
License: MIT
Type: extension
Size: 18.3 KB
Dependencies: 3 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

@dmallory42/pi-read-url

A small pi extension that adds a read_url tool for turning public HTML pages into clean, readable markdown using the machine's system curl.

Fast, local, and lightweight — built for content extraction, not browser automation.

What it does

  • fetches public HTML pages via the user's system curl
  • extracts the main readable content with Mozilla Readability
  • converts extracted HTML to markdown with Turndown
  • keeps output compact by default to save tokens
  • supports maxChars for even smaller extracts
  • optionally includes metadata like site name, byline, excerpt, and HTTP status
  • rejects obvious non-page URLs like PDFs, media files, and common downloads
  • returns friendlier errors for DNS issues, blocked pages, timeouts, and unsupported content types
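The non-page URL rejection described above can be sketched as a simple extension check. This is an illustrative helper (the function and the blocklist are assumptions, not the extension's actual code):

```typescript
// Hypothetical sketch of the non-page URL filter; the real
// extension's logic and blocklist may differ.
const BLOCKED_EXTENSIONS = new Set([
  "pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx",
  "png", "jpg", "jpeg", "gif", "webp", "svg",
  "mp3", "mp4", "mov", "avi",
  "zip", "tar", "gz", "dmg", "exe",
]);

function looksLikeNonPageUrl(rawUrl: string): boolean {
  let pathname: string;
  try {
    pathname = new URL(rawUrl).pathname;
  } catch {
    return true; // unparseable URLs are rejected too
  }
  const lastSegment = pathname.split("/").pop() ?? "";
  const dot = lastSegment.lastIndexOf(".");
  if (dot === -1) return false; // no extension: assume an HTML page
  const ext = lastSegment.slice(dot + 1).toLowerCase();
  return BLOCKED_EXTENSIONS.has(ext);
}
```

An extension check like this only catches obvious cases; the tool also reports unsupported content types at fetch time, per the error-handling bullet above.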

Install

From npm

pi install npm:@dmallory42/pi-read-url

From git

pi install git:github.com/dmallory42/pi-read-url

Then reload pi:

/reload

Usage

Ask pi naturally:

Read https://example.com
Use read_url to extract the main content from https://example.com
Use read_url on https://example.com with maxChars 4000
Use read_url on https://example.com and includeMetadata true
Use read_url on https://example.com and focus on the author bio

Tool behavior

read_url is built for extracting content from public HTML pages by URL.

It works best for:

  • blogs
  • docs
  • static content pages
  • article-like HTML pages

It is not intended for:

  • PDFs or office documents
  • images, media, and downloadable files
  • login-gated pages
  • JS-heavy SPAs
  • aggressively bot-protected sites
  • interactive browsing tasks like clicking, login flows, or form submission

Parameters

url

The HTTP(S) page URL to fetch.

objective

An optional focus hint describing what the caller cares about, used to frame the returned extract.

maxChars

Optional character cap for the returned markdown. Useful when you want to save tokens.
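A character cap like this amounts to a simple truncation. Here is a rough sketch; the real extension's truncation details (marker text, word boundaries) are not documented and may differ:

```typescript
// Illustrative maxChars cap: slice the markdown and flag the cut.
// The "[truncated]" marker is an assumption for this sketch.
function capMarkdown(markdown: string, maxChars: number): string {
  if (maxChars <= 0 || markdown.length <= maxChars) return markdown;
  return markdown.slice(0, maxChars).trimEnd() + "\n\n[truncated]";
}
```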

includeMetadata

Optional boolean. When enabled, the output also includes metadata like HTTP status, site name, byline, and excerpt.
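Taken together, the four parameters can be modeled roughly as below. The interface mirrors the documented parameters; the normalization defaults are illustrative assumptions, not the extension's actual behavior:

```typescript
// Rough model of the documented read_url parameters.
interface ReadUrlParams {
  url: string;               // required: HTTP(S) page URL to fetch
  objective?: string;        // optional focus hint for the extract
  maxChars?: number;         // optional cap on returned markdown length
  includeMetadata?: boolean; // include status, site name, byline, excerpt
}

function normalizeParams(raw: ReadUrlParams): Required<ReadUrlParams> {
  if (!/^https?:\/\//i.test(raw.url)) {
    throw new Error("url must be an HTTP(S) URL");
  }
  return {
    url: raw.url,
    objective: raw.objective ?? "",
    // clamp maxChars to a positive integer; 0 means "no cap" in this sketch
    maxChars: raw.maxChars && raw.maxChars > 0 ? Math.floor(raw.maxChars) : 0,
    includeMetadata: raw.includeMetadata ?? false,
  };
}
```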

Why

Sometimes you just want the content of a page URL in a form the model can use:

  • without sending it through a third-party fetch service
  • without hand-rolling curl | grep | sed pipelines
  • without jumping to a full browser automation stack
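Since the tool shells out to the system curl rather than a fetch service, the invocation reduces to building an argv for the binary. The flags below are common choices for this kind of fetch, not necessarily what the extension actually passes:

```typescript
// Illustrative: assemble arguments for the system curl call.
// -sS, -L, and --max-time are standard curl flags; the extension's
// real flag set is not documented here.
function buildCurlArgs(url: string, timeoutSeconds = 15): string[] {
  return [
    "-sS",                                 // silent, but still report errors
    "-L",                                  // follow redirects
    "--max-time", String(timeoutSeconds),  // hard timeout for the whole transfer
    url,
  ];
}
```

The resulting array would be handed to something like Node's child_process spawn, keeping the fetch local to the user's machine.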

Development

Typecheck:

npm run typecheck

Run the local smoke test:

npm test

Release notes and workflow live in RELEASING.md.

License

MIT