pi-web-utils

Configurable web search, markdown-first webpage fetching, and GitHub repo cloning and local search tools for the pi coding agent.

Install pi-web-utils from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-web-utils
Package: pi-web-utils
Version: 0.1.1
Published: Feb 25, 2026
Downloads: 129/mo · 7/wk
Author: shantanugoel
License: MIT
Types: extension
Size: 69 KB
Dependencies: 3 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-web-utils

Configurable web tooling extension for pi-coding-agent.

It adds four tools:

  1. web_search
  2. fetch_webpage
  3. clone_github_repo
  4. search_local_repo

What it does

  • Search with configurable engines (google, duckduckgo, searxng, or custom) and ordered fallback.
  • Append engine-specific query params and headers from config, plus per-call query params (extraParams).
  • If a search engine returns raw HTML, optionally convert it to markdown or structured JSON with the same formatter that fetch_webpage uses.
  • Fetch webpages as markdown by default.
  • Try markdown.new (https://markdown.new/<url>) first, then fall back to converting the HTML locally to markdown or JSON.
  • Clone GitHub repos from root, tree, and blob URLs into a local cache directory.
  • Search cloned repos (or any local folder) with rg, falling back to grep.

Install

pi install npm:pi-web-utils

Alternatively, package and publish it yourself, then install it from npm or git like any other pi package.

Tool quick examples

// 1) Search with fallback chain from config
web_search({ query: "TypeScript project architecture patterns" })

// 2) Force a specific engine but still allow fallback
web_search({
  query: "SearXNG self-host setup",
  engineId: "searxng",
  allowFallback: true
})

// 3) Pass per-call query params to engine URL
web_search({
  query: "React suspense",
  engineId: "google",
  extraParams: { hl: "en", num: "10" }
})

// 4) Format raw HTML search response as markdown
web_search({
  query: "site:github.com pi-coding-agent extensions",
  engineId: "duckduckgo",
  rawHtmlFormat: "markdown"
})

// 5) Fetch webpage as markdown (markdown.new first)
fetch_webpage({ url: "https://docs.example.com/guide" })

// 6) Fetch webpage as structured JSON
fetch_webpage({
  url: "https://docs.example.com/guide",
  output: "json"
})

// 7) Clone GitHub repo URL
clone_github_repo({ url: "https://github.com/owner/repo" })

// 8) Clone tree/blob URLs too
clone_github_repo({ url: "https://github.com/owner/repo/tree/main/src" })
clone_github_repo({ url: "https://github.com/owner/repo/blob/main/README.md" })

// 9) Search latest cloned repo
search_local_repo({ query: "registerTool" })

// 10) Search a specific repo key
search_local_repo({
  repo: "owner/repo@main",
  query: "fetchWithTimeout",
  glob: "*.ts"
})

Configuration

Configuration file path (default):

~/.pi/web-tools.json

You can override the path with an environment variable:

PI_WEB_TOOLS_CONFIG=/path/to/config.json
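A minimal sketch of the lookup order described above. It assumes the override is used as given (after trimming whitespace) whenever it is non-empty; resolveConfigPath is a hypothetical name, not the extension's API:

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: PI_WEB_TOOLS_CONFIG wins when set to a non-empty value
// (assumption: used as given after trimming); otherwise ~/.pi/web-tools.json.
function resolveConfigPath(
  env: Record<string, string | undefined> = process.env,
  homeDir: string = os.homedir(),
): string {
  const override = env.PI_WEB_TOOLS_CONFIG?.trim();
  return override ? override : path.join(homeDir, ".pi", "web-tools.json");
}

console.log(resolveConfigPath());
```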

Example config

{
  "search": {
    "includeBuiltins": true,
    "engines": [
      {
        "id": "searxng",
        "kind": "searxng",
        "baseUrl": "https://searx.example.com/search",
        "queryParams": {
          "format": "json",
          "language": "en-US"
        },
        "headers": {
          "x-api-key": "optional"
        },
        "timeoutMs": 20000
      },
      {
        "id": "duckduckgo",
        "enabled": true
      },
      {
        "id": "google",
        "queryParams": {
          "hl": "en"
        }
      },
      {
        "id": "my-custom-engine",
        "kind": "custom",
        "baseUrl": "https://search.example.com/query",
        "queryParam": "q",
        "queryParams": {
          "api": "v2"
        },
        "responseFormat": "json"
      }
    ],
    "fallbackOrder": ["searxng", "duckduckgo", "google", "my-custom-engine"],
    "maxResults": 8,
    "timeoutMs": 15000
  },
  "fetch": {
    "timeoutMs": 30000,
    "maxBodyChars": 120000,
    "markdownNew": {
      "enabled": true,
      "baseUrl": "https://markdown.new/",
      "timeoutMs": 20000
    }
  },
  "github": {
    "enabled": true,
    "clonePath": "/tmp/pi-web-utils/repos",
    "cloneTimeoutMs": 30000,
    "maxRepoSizeMB": 350,
    "maxTreeEntries": 200,
    "maxInlineFileChars": 100000
  },
  "localSearch": {
    "defaultMaxMatches": 80,
    "maxMatches": 300,
    "previewChars": 220,
    "timeoutMs": 15000
  }
}
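Each engine entry combines a baseUrl, the query-string key that carries the search text (queryParam), fixed queryParams, and, at call time, extraParams. The sketch below shows one way such a request URL could be assembled. It assumes a default key of q and that extraParams override config params with the same name; EngineConfig and buildSearchUrl are hypothetical names, and built-in engines are assumed to supply their own baseUrl:

```typescript
// Hypothetical shape mirroring the engine entries in the example config.
interface EngineConfig {
  id: string;
  kind?: "google" | "duckduckgo" | "searxng" | "custom";
  baseUrl?: string;
  queryParam?: string; // query-string key that carries the search text
  queryParams?: Record<string, string>;
  headers?: Record<string, string>;
  timeoutMs?: number;
}

// Assumption: config queryParams are applied first, then per-call
// extraParams replace any key with the same name.
function buildSearchUrl(
  engine: EngineConfig,
  query: string,
  extraParams: Record<string, string> = {},
): URL {
  if (!engine.baseUrl) throw new Error(`engine ${engine.id} has no baseUrl`);
  const url = new URL(engine.baseUrl);
  url.searchParams.set(engine.queryParam ?? "q", query);
  for (const [key, value] of Object.entries({ ...engine.queryParams, ...extraParams })) {
    url.searchParams.set(key, value);
  }
  return url;
}

const custom: EngineConfig = {
  id: "my-custom-engine",
  kind: "custom",
  baseUrl: "https://search.example.com/query",
  queryParam: "q",
  queryParams: { api: "v2" },
};
console.log(buildSearchUrl(custom, "React suspense", { num: "10" }).toString());
// https://search.example.com/query?q=React+suspense&api=v2&num=10
```

The headers and timeoutMs fields would be passed to the HTTP request itself rather than encoded in the URL.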

Tool details

web_search

Parameters:

  • query (required)
  • engineId (optional)
  • fallbackOrder (optional)
  • maxResults (optional)
  • extraParams (optional key/value params appended to the request URL)
  • allowFallback (optional, default true)
  • rawHtmlFormat (optional: none | markdown | json)
  • includeRawResponse (optional)

Behavior:

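  • Picks an engine from config (engineId, if given) and, if the request fails or parses to no results, tries the next engine in the fallback order.
  • Parses JSON responses when the engine returns JSON.
  • Parses HTML result pages with Google- and DuckDuckGo-specific parsing, or generic anchor extraction for other engines.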
  • Optionally formats raw HTML response via shared webpage formatter.
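A sketch of the fallback behavior described above: the requested engine is tried first, then the rest of fallbackOrder; a thrown error or an empty parse moves on to the next engine. The runner functions stand in for the real request and parsing code, and attemptOrder and searchWithFallback are hypothetical names:

```typescript
interface SearchResult {
  title: string;
  url: string;
  snippet?: string;
}

// Hypothetical per-engine runner: performs the request and parses results.
type EngineRunner = (query: string) => Promise<SearchResult[]>;

// Requested engine first, then the fallback order without duplicates.
// With allowFallback=false only the requested engine is tried.
function attemptOrder(fallbackOrder: string[], engineId?: string, allowFallback = true): string[] {
  if (engineId && !allowFallback) return [engineId];
  const order = engineId ? [engineId, ...fallbackOrder] : [...fallbackOrder];
  return Array.from(new Set(order));
}

async function searchWithFallback(
  query: string,
  runners: Record<string, EngineRunner>,
  order: string[],
  maxResults: number,
): Promise<{ engineId: string; results: SearchResult[] }> {
  const errors: string[] = [];
  for (const id of order) {
    const run = runners[id];
    if (!run) {
      errors.push(`${id}: not configured`);
      continue;
    }
    try {
      const results = await run(query);
      if (results.length > 0) return { engineId: id, results: results.slice(0, maxResults) };
      errors.push(`${id}: no results parsed`); // empty parse counts as failure
    } catch (err) {
      errors.push(`${id}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All engines failed:\n${errors.join("\n")}`);
}
```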

fetch_webpage

Parameters:

  • url (required)
  • output (markdown | json, default markdown)
  • preferMarkdownNew (default true)
  • maxChars
  • includeRawHtml

Behavior:

  1. Try markdown.new endpoint (https://markdown.new/<url>).
  2. If markdown.new is unavailable or returns an invalid response, fetch the page directly.
  3. Convert HTML locally with Readability + Turndown (markdown) or DOM extraction (json).
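These steps can be sketched with an injected fetcher so the logic runs without network access. This assumes the target URL is appended verbatim to the markdown.new base URL and treats a non-2xx status or an empty body as an invalid response; Fetcher, markdownNewUrl, and fetchAsMarkdown are hypothetical names, and convertHtml stands in for the Readability + Turndown step:

```typescript
// Hypothetical fetcher signature so the sketch is testable offline.
type Fetcher = (url: string, timeoutMs: number) => Promise<{ ok: boolean; status: number; body: string }>;

// Assumption: markdown.new takes the target URL appended to its base URL.
function markdownNewUrl(baseUrl: string, target: string): string {
  return baseUrl.endsWith("/") ? baseUrl + target : `${baseUrl}/${target}`;
}

async function fetchAsMarkdown(
  target: string,
  fetcher: Fetcher,
  convertHtml: (html: string) => string, // stand-in for Readability + Turndown
  opts = { preferMarkdownNew: true, markdownNewBase: "https://markdown.new/", timeoutMs: 20000, maxChars: 120000 },
): Promise<{ source: "markdown.new" | "local"; markdown: string }> {
  if (opts.preferMarkdownNew) {
    try {
      const res = await fetcher(markdownNewUrl(opts.markdownNewBase, target), opts.timeoutMs);
      // Non-2xx or empty body is treated as unavailable/invalid.
      if (res.ok && res.body.trim().length > 0) {
        return { source: "markdown.new", markdown: res.body.slice(0, opts.maxChars) };
      }
    } catch {
      // Network error or timeout: fall through to the direct fetch.
    }
  }
  const direct = await fetcher(target, opts.timeoutMs);
  if (!direct.ok) throw new Error(`HTTP ${direct.status} fetching ${target}`);
  return { source: "local", markdown: convertHtml(direct.body).slice(0, opts.maxChars) };
}
```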

clone_github_repo

Parameters:

  • url (required)
  • forceClone
  • refresh
  • maxTreeEntries

Behavior:

  • Handles GitHub root/tree/blob URLs.
  • Clones to configured local cache path.
  • For large repos (over maxRepoSizeMB), returns an API preview instead of cloning, unless forceClone is true.
  • For commit-SHA URLs, returns an API preview.
  • Returns local path and structured preview for follow-up tooling.
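A sketch of parsing root, tree, and blob URLs into a clone key and choosing between cloning and an API preview. It assumes refs are a single path segment (a ref containing "/" is ambiguous from the URL alone), treats only full 40-character hex refs as commit SHAs, and relies on the GitHub REST API reporting repository size in kilobytes. Whether forceClone also overrides the SHA case is not stated, so this sketch assumes it does not. All names are hypothetical:

```typescript
interface GitHubTarget {
  owner: string;
  repo: string;
  kind: "root" | "tree" | "blob";
  ref?: string; // branch, tag, or commit SHA
  path?: string; // path inside the repo for tree/blob URLs
}

function parseGitHubUrl(input: string): GitHubTarget {
  const url = new URL(input);
  if (url.hostname !== "github.com") throw new Error(`not a github.com URL: ${input}`);
  const [owner, rawRepo, kind, ref, ...rest] = url.pathname.split("/").filter(Boolean);
  if (!owner || !rawRepo) throw new Error(`missing owner/repo: ${input}`);
  const repo = rawRepo.replace(/\.git$/, "");
  if (kind === undefined) return { owner, repo, kind: "root" };
  if ((kind === "tree" || kind === "blob") && ref) {
    return { owner, repo, kind, ref, path: rest.length > 0 ? rest.join("/") : undefined };
  }
  throw new Error(`unsupported GitHub URL: ${input}`);
}

// Key in the owner/repo or owner/repo@branch form that search_local_repo accepts.
function cloneKey(t: GitHubTarget): string {
  return t.ref ? `${t.owner}/${t.repo}@${t.ref}` : `${t.owner}/${t.repo}`;
}

const isCommitSha = (ref?: string): boolean => !!ref && /^[0-9a-f]{40}$/i.test(ref);

// repoSizeKB: the `size` field from the GitHub REST API (kilobytes).
function shouldClone(t: GitHubTarget, repoSizeKB: number, maxRepoSizeMB: number, forceClone = false): boolean {
  if (isCommitSha(t.ref)) return false; // API preview only
  return forceClone || repoSizeKB / 1024 <= maxRepoSizeMB;
}
```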

search_local_repo

Parameters:

  • query (required)
  • repo (optional clone key, e.g. owner/repo or owner/repo@branch)
  • path (optional)
  • glob (optional)
  • maxMatches (optional)
  • caseSensitive (optional)

Behavior:

  • Uses rg if available, otherwise grep.
  • Defaults to the most recently cloned repo if neither repo nor path is provided.
  • Returns a list of matches with file, line, and column.
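A sketch of the rg/grep invocation and output parsing. The flags are standard ripgrep and grep options, but which ones the extension actually passes is an assumption, as are the helper names. It parses rg's path:line:column:text output and grep's path:line:text output (grep has no column field, so column is left undefined here):

```typescript
interface SearchOptions {
  query: string;
  root: string;
  glob?: string;
  caseSensitive?: boolean;
}

interface Match {
  file: string;
  line: number;
  column?: number;
  preview: string;
}

function rgArgs(o: SearchOptions): string[] {
  return [
    "--with-filename", "--line-number", "--column", "--no-heading", "--color", "never",
    o.caseSensitive ? "--case-sensitive" : "--ignore-case",
    ...(o.glob ? ["--glob", o.glob] : []),
    "--", o.query, o.root,
  ];
}

function grepArgs(o: SearchOptions): string[] {
  return [
    "-r", "-n", "-H",
    ...(o.caseSensitive ? [] : ["-i"]),
    ...(o.glob ? [`--include=${o.glob}`] : []),
    "--", o.query, o.root,
  ];
}

// Caps at maxMatches and trims each preview to previewChars.
function parseMatches(stdout: string, withColumn: boolean, maxMatches: number, previewChars: number): Match[] {
  const re = withColumn ? /^(.*?):(\d+):(\d+):(.*)$/ : /^(.*?):(\d+):(.*)$/;
  const matches: Match[] = [];
  for (const line of stdout.split("\n")) {
    if (matches.length >= maxMatches) break;
    const m = re.exec(line);
    if (!m) continue;
    const text = withColumn ? m[4] : m[3];
    matches.push({
      file: m[1],
      line: Number(m[2]),
      column: withColumn ? Number(m[3]) : undefined,
      preview: text.trim().slice(0, previewChars),
    });
  }
  return matches;
}
```

Either argument list could be passed to child_process.execFile. Both rg and grep exit with status 1 when nothing matches, so a caller should treat that case as an empty result rather than an error.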

Notes

  • This extension executes network requests and local git/shell commands.
  • Only install from trusted sources.
  • HTML result parsing for public engines can break when providers change their markup, so configure a sensible fallback order.
  • Google HTML scraping may be rate-limited or blocked by bot challenges depending on your region and IP.

Development

bun install
bun run typecheck

Publish to npm

bun run typecheck
bun publish --access public

Before publishing, update these values in package.json:

  • author
  • repository.url
  • homepage
  • bugs.url
  • name (if pi-web-utils is already taken on npm)