@johnnywu/pi-webfetch

Fetch web pages and URLs from pi with readable text, Markdown, HTML, or JSON output.

Packages

Package details

extension

Install @johnnywu/pi-webfetch from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@johnnywu/pi-webfetch
Package
@johnnywu/pi-webfetch
Version
1.3.0
Published
May 29, 2026
Downloads
479/mo · 288/wk
Author
johnnywu
License
MIT
Types
extension
Size
108.4 KB
Dependencies
1 dependency · 4 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-webfetch

A pi package that adds a webfetch tool for reading web URLs as clean Markdown, HTML, text, or YouTube metadata JSON.

webfetch is optimized for agent use:

  • General web pages are cleaned into readable content.
  • GitHub URLs are fetched with gh for better repository, issue, PR, file, and directory results.
  • YouTube URLs are fetched with yt-dlp for video metadata, transcripts, playlists, and channel listings.

Install

pi install npm:@johnnywu/pi-webfetch

Or via local path in ~/.pi/agent/settings.json while developing:

{
  "packages": ["~/dev/jwu/pi-webfetch"]
}

Requirements

Install the optional CLI tools for the URL types you want to support:

URL type Required executable Notes
General web pages scrapling Used for non-GitHub, non-YouTube URLs
GitHub / Gist gh Must be installed and authenticated for private or rate-limited content
YouTube yt-dlp Used for videos, transcripts, playlists, Shorts, and channels

Defuddle is bundled as an npm dependency and is used by default to improve Markdown output for general web pages.

Tool

webfetch

Fetch and clean an HTTP(S) URL.

Parameter Type Default Description
url string required HTTP(S) URL to inspect and fetch
mode markdown | html | text | json markdown Output mode. json is supported for YouTube results.

Examples:

{ "url": "https://example.com/article" }
{ "url": "https://github.com/jwu/pi-webfetch" }
{ "url": "https://www.youtube.com/watch?v=PIdETjcXNIk" }
{ "url": "https://www.youtube.com/@Brandon-Melville", "mode": "json" }

What you get

General web pages

Default output is readable Markdown. You can request raw-ish cleaned HTML or plain text with mode: "html" or mode: "text".

GitHub URLs

GitHub URLs are routed through gh, so common GitHub pages return useful CLI/API content instead of noisy browser HTML. Supported URL shapes include:

  • repositories
  • users
  • issues
  • pull requests
  • releases
  • Actions runs
  • gists
  • files and directories
  • commits and other API-backed paths

YouTube URLs

YouTube URLs are routed through yt-dlp.

Supported inputs include:

  • video URLs
  • youtu.be short links
  • playlists
  • channel handles, for example https://www.youtube.com/@name
  • channel Videos / Shorts / Streams tabs

For videos, webfetch returns metadata and tries to include a transcript. Missing transcripts do not fail the request.

For playlists, webfetch returns a flat list of entries.

For channel root URLs, webfetch expands available Videos, Shorts, and Streams tabs and merges their entries into one channel result.

mode: "json" returns a curated stable JSON shape for YouTube video, playlist, or channel data.

Configuration

Add webfetch settings to .pi/settings.json (project) or ~/.pi/agent/settings.json (global):

{
  "webfetch": {
    "useDefuddle": true,
    "qualityJudge": false,
    "qualityJudgeModel": "google/gemini-2.5-flash",
    "qualityJudgeThinkLevel": "off"
  }
}

Project settings override global settings. The dotted key form also works:

{
  "webfetch.useDefuddle": true,
  "webfetch.qualityJudge": true,
  "webfetch.qualityJudgeModel": "google/gemini-2.5-flash",
  "webfetch.qualityJudgeThinkLevel": "off"
}

Settings

Setting Default Description
webfetch.useDefuddle true Use Defuddle to convert cleaned HTML to Markdown for general web pages. Set false to use Scrapling Markdown directly.
webfetch.qualityJudge false Ask a model to reject unusable fetched Markdown, such as boilerplate, captcha/challenge pages, or unrelated content.
webfetch.qualityJudgeModel current pi model Optional judge model in provider/model form.
webfetch.qualityJudgeThinkLevel off Optional judge thinking level: off, minimal, low, medium, high, or xhigh.

Output behavior

  • Only http:// and https:// URLs are accepted.
  • Missing CLI executables return a friendly failed tool result.
  • Output is truncated with pi's standard limits: 2000 lines or 50 KiB, whichever is hit first.
  • If output is truncated, the full extracted content is saved to a temp file and the path is included in the result.

Internal docs

Implementation details are documented in:

Development

# Install dependencies
bun install

# Run tests
bun test

# Type check
bun run typecheck

# Format
bun run format

# Release (local, requires GH_TOKEN and NPM_TOKEN)
bun run release

This project uses semantic-release with conventional commits.

License

MIT