@dmallory42/pi-read-url
Pi extension that fetches public HTML pages and extracts their content as clean markdown via system curl.
Package details
Install @dmallory42/pi-read-url from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@dmallory42/pi-read-url

- Package: @dmallory42/pi-read-url
- Version: 0.1.2
- Published: Apr 25, 2026
- Downloads: 370/mo · 370/wk
- Author: dmallory42
- License: MIT
- Types: extension
- Size: 18.3 KB
- Dependencies: 3 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
@dmallory42/pi-read-url
A small pi extension that adds a read_url tool for turning public HTML page URLs into clean, readable markdown using the machine's system curl.
Fast, local, and lightweight — built for content extraction, not browser automation.
What it does
- fetches public HTML page URLs via the user's system curl
- extracts the main readable content with Mozilla Readability
- converts extracted HTML to markdown with Turndown
- keeps output compact by default to save tokens
- supports maxChars for even smaller extracts
- optionally includes metadata like site name, byline, excerpt, and HTTP status
- rejects obvious non-page URLs like PDFs, media files, and common downloads
- returns friendlier errors for DNS issues, blocked pages, timeouts, and unsupported content types
Install
From npm
pi install npm:@dmallory42/pi-read-url
From git
pi install git:github.com/dmallory42/pi-read-url
Then reload pi:
/reload
Usage
Ask pi naturally:
Read https://example.com
Use read_url to extract the main content from https://example.com
Use read_url on https://example.com with maxChars 4000
Use read_url on https://example.com and includeMetadata true
Use read_url on https://example.com and focus on the author bio
Tool behavior
read_url is built for extracting content from public HTML pages by URL.
It works best for:
- blogs
- docs
- static content pages
- article-like HTML pages
It is not intended for:
- PDFs or office documents
- images, media, and downloadable files
- login-gated pages
- JS-heavy SPAs
- aggressively bot-protected sites
- interactive browsing tasks like clicking, login flows, or form submission
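A pre-flight check along these lines could screen out the URL types listed above before any fetch happens. The function name and extension list here are assumptions for illustration, not the package's actual implementation:

```typescript
// Illustrative pre-flight screen mirroring the README's "rejects obvious
// non-page URLs" behavior; the extension list is an assumed example.
import { extname } from "node:path";

const NON_PAGE_EXTENSIONS = new Set([
  ".pdf", ".zip", ".tar", ".gz", ".png", ".jpg", ".jpeg", ".gif",
  ".mp3", ".mp4", ".webm", ".dmg", ".exe", ".docx", ".xlsx",
]);

export function looksLikeHtmlPage(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable URL at all
  }
  // Only HTTP(S) pages are candidates.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // Reject paths that end in an obvious non-page file extension.
  return !NON_PAGE_EXTENSIONS.has(extname(url.pathname).toLowerCase());
}
```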
Parameters
url
The HTTP(S) page URL to fetch.
objective
An optional focus hint to help frame what the caller cares about in the returned extract.
maxChars
Optional character cap for the returned markdown. Useful when you want to save tokens.
includeMetadata
Optional boolean. When enabled, the output also includes metadata like HTTP status, site name, byline, and excerpt.
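Putting the four parameters together, a full call might look like this. The values are purely illustrative; only the parameter names come from the README:

```typescript
// Hypothetical read_url call payload; parameter names from the README,
// values are illustrative only.
const params = {
  url: "https://example.com/post",            // required HTTP(S) page URL
  objective: "summarize the author's argument", // optional focus hint
  maxChars: 4000,                              // optional output cap
  includeMetadata: true,                       // optional metadata toggle
};
```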
Why
Sometimes you just want the content of a page URL in a form the model can use:
- without sending it through a third-party fetch service
- without hand-rolling curl | grep | sed pipelines
- without jumping to a full browser automation stack
Development
Typecheck:
npm run typecheck
Run the local smoke test:
npm test
Release notes and workflow live in RELEASING.md.
License
MIT