pi-web-utils
Configurable web search, markdown-first webpage fetching, GitHub repo cloning, and local repo search tools for the pi coding agent
Package details
Install pi-web-utils from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-web-utils
- Package: pi-web-utils
- Version: 0.1.1
- Published: Feb 25, 2026
- Downloads: 129/mo · 7/wk
- Author: shantanugoel
- License: MIT
- Types: extension
- Size: 69 KB
- Dependencies: 3 dependencies · 2 peers
Pi manifest JSON
{
"extensions": [
"./index.ts"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-web-utils
Configurable web tooling extension for pi-coding-agent.
It adds four tools:
- web_search
- fetch_webpage
- clone_github_repo
- search_local_repo
What it does
- Search with configurable engines (google, duckduckgo, searxng, or custom) with ordered fallback.
- Append engine-specific query params and headers from config, plus per-call overrides.
- If a search engine returns HTML, optionally convert that raw HTML with the same formatter used by webpage fetch (markdown or structured json).
- Fetch webpages as markdown by default.
- Try markdown.new (https://markdown.new/<url>) first, then fall back to local HTML -> markdown/json conversion.
- Clone GitHub repos from root, tree, and blob URLs into a cached local path.
- Search cloned repos (or any local folder) with rg, falling back to grep.
Install
pi install npm:pi-web-utils
Alternatively, package and publish it yourself, then install it via npm or git like any other pi package.
Tool quick examples
// 1) Search with fallback chain from config
web_search({ query: "TypeScript project architecture patterns" })
// 2) Force a specific engine but still allow fallback
web_search({
query: "SearXNG self-host setup",
engineId: "searxng",
allowFallback: true
})
// 3) Pass per-call query params to engine URL
web_search({
query: "React suspense",
engineId: "google",
extraParams: { hl: "en", num: "10" }
})
// 4) Format raw HTML search response as markdown
web_search({
query: "site:github.com pi-coding-agent extensions",
engineId: "duckduckgo",
rawHtmlFormat: "markdown"
})
// 5) Fetch webpage as markdown (markdown.new first)
fetch_webpage({ url: "https://docs.example.com/guide" })
// 6) Fetch webpage as structured JSON
fetch_webpage({
url: "https://docs.example.com/guide",
output: "json"
})
// 7) Clone a GitHub repo from its root URL
clone_github_repo({ url: "https://github.com/owner/repo" })
// 8) Clone tree/blob URLs too
clone_github_repo({ url: "https://github.com/owner/repo/tree/main/src" })
clone_github_repo({ url: "https://github.com/owner/repo/blob/main/README.md" })
// 9) Search the most recently cloned repo
search_local_repo({ query: "registerTool" })
// 10) Search a specific repo by its clone key
search_local_repo({
repo: "owner/repo@main",
query: "fetchWithTimeout",
glob: "*.ts"
})
Configuration
Configuration file path (default):
~/.pi/web-tools.json
You can override the path with an environment variable:
PI_WEB_TOOLS_CONFIG=/path/to/config.json
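A minimal sketch of how that lookup could work, assuming the environment variable simply takes precedence over the default whenever it is set to a non-blank value. `resolveConfigPath` is a hypothetical helper name, not an export of the extension.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: PI_WEB_TOOLS_CONFIG wins when set and non-blank,
// otherwise fall back to ~/.pi/web-tools.json under the user's home directory.
function resolveConfigPath(
  env: Record<string, string | undefined> = process.env,
  home: string = homedir(),
): string {
  const override = env.PI_WEB_TOOLS_CONFIG?.trim();
  return override ? override : join(home, ".pi", "web-tools.json");
}

console.log(resolveConfigPath());
```

Passing `env` and `home` as parameters keeps the function easy to test without touching the real environment.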
Example config
{
"search": {
"includeBuiltins": true,
"engines": [
{
"id": "searxng",
"kind": "searxng",
"baseUrl": "https://searx.example.com/search",
"queryParams": {
"format": "json",
"language": "en-US"
},
"headers": {
"x-api-key": "optional"
},
"timeoutMs": 20000
},
{
"id": "duckduckgo",
"enabled": true
},
{
"id": "google",
"queryParams": {
"hl": "en"
}
},
{
"id": "my-custom-engine",
"kind": "custom",
"baseUrl": "https://search.example.com/query",
"queryParam": "q",
"queryParams": {
"api": "v2"
},
"responseFormat": "json"
}
],
"fallbackOrder": ["searxng", "duckduckgo", "google", "my-custom-engine"],
"maxResults": 8,
"timeoutMs": 15000
},
"fetch": {
"timeoutMs": 30000,
"maxBodyChars": 120000,
"markdownNew": {
"enabled": true,
"baseUrl": "https://markdown.new/",
"timeoutMs": 20000
}
},
"github": {
"enabled": true,
"clonePath": "/tmp/pi-web-utils/repos",
"cloneTimeoutMs": 30000,
"maxRepoSizeMB": 350,
"maxTreeEntries": 200,
"maxInlineFileChars": 100000
},
"localSearch": {
"defaultMaxMatches": 80,
"maxMatches": 300,
"previewChars": 220,
"timeoutMs": 15000
}
}
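The `baseUrl`, `queryParam`, and `queryParams` fields above suggest how an engine's request URL is assembled. The sketch below assumes config `queryParams` are applied first, per-call `extraParams` override them, and the query is set last under `queryParam` (defaulting to `q`). That precedence order and the `buildSearchUrl` name are assumptions for illustration, not confirmed extension behavior.

```typescript
interface EngineConfig {
  id: string;
  baseUrl: string;
  queryParam?: string; // name of the query parameter; "q" assumed when omitted
  queryParams?: Record<string, string>;
}

// Hypothetical URL builder for a configured search engine.
function buildSearchUrl(
  engine: EngineConfig,
  query: string,
  extraParams: Record<string, string> = {},
): string {
  const url = new URL(engine.baseUrl);
  // Config-level params first...
  for (const [k, v] of Object.entries(engine.queryParams ?? {})) url.searchParams.set(k, v);
  // ...then per-call overrides replace any key with the same name.
  for (const [k, v] of Object.entries(extraParams)) url.searchParams.set(k, v);
  // The search query is set last so extraParams cannot clobber it.
  url.searchParams.set(engine.queryParam ?? "q", query);
  return url.toString();
}
```

With the `my-custom-engine` entry above and `extraParams: { hl: "en" }`, a query of `react suspense` would produce `https://search.example.com/query?api=v2&hl=en&q=react+suspense` (`URLSearchParams` encodes spaces as `+`).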
Tool details
web_search
Parameters:
- query (required)
- engineId (optional)
- fallbackOrder (optional)
- maxResults (optional)
- extraParams (optional key/value params appended to the request)
- allowFallback (optional, default true)
- rawHtmlFormat (optional: none | markdown | json)
- includeRawResponse (optional)
Behavior:
- Picks an engine from config and, on failure or an empty parse, tries the next engine in the fallback order.
- Parses JSON result formats when possible.
- Parses HTML result pages for Google, DuckDuckGo, and generic anchor links.
- Optionally formats raw HTML response via shared webpage formatter.
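The fallback behavior above can be sketched as a loop over engine IDs: an explicit `engineId` goes first, then the remaining `fallbackOrder`, and any engine that throws or yields zero parsed results hands off to the next. This assumes each engine is wrapped as an async function returning already-parsed results; `SearchResult`, `EngineFn`, and `searchWithFallback` are hypothetical names, not the extension's API.

```typescript
interface SearchResult {
  title: string;
  url: string;
  snippet?: string;
}

// Hypothetical engine adapter: fetches and parses one engine's results.
type EngineFn = (query: string) => Promise<SearchResult[]>;

interface FallbackOptions {
  engineId?: string;
  fallbackOrder: string[];
  allowFallback?: boolean; // mirrors the tool parameter, default true
  maxResults?: number;
}

async function searchWithFallback(
  query: string,
  engines: Record<string, EngineFn>,
  opts: FallbackOptions,
): Promise<{ engineId: string; results: SearchResult[]; errors: string[] }> {
  // Explicit engine first, then the configured order without duplicates.
  const candidates = opts.engineId
    ? [opts.engineId, ...opts.fallbackOrder.filter((id) => id !== opts.engineId)]
    : opts.fallbackOrder;
  const order = (opts.allowFallback ?? true) ? candidates : candidates.slice(0, 1);
  const errors: string[] = [];
  for (const id of order) {
    const engine = engines[id];
    if (!engine) {
      errors.push(`${id}: not configured`);
      continue;
    }
    try {
      const results = await engine(query);
      if (results.length > 0) {
        return { engineId: id, results: results.slice(0, opts.maxResults ?? results.length), errors };
      }
      errors.push(`${id}: empty parse`); // an empty parse counts as a failure
    } catch (err) {
      errors.push(`${id}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All engines failed:\n${errors.join("\n")}`);
}
```

Collecting per-engine errors makes it easier to see why a chain fell through, which matters when public HTML engines change their markup.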
fetch_webpage
Parameters:
- url (required)
- output (markdown | json, default markdown)
- preferMarkdownNew (default true)
- maxChars
- includeRawHtml
Behavior:
- Try the markdown.new endpoint (https://markdown.new/<url>) first.
- If it is unavailable or returns an invalid response, fetch the page directly.
- Convert HTML locally with Readability + Turndown (markdown) or DOM extraction (json).
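A sketch of the markdown-first fetch, assuming the target URL is appended verbatim to the markdown.new base URL (as the `https://markdown.new/<url>` pattern suggests) and treating a network error, timeout, non-OK status, or empty body as an invalid response. The local Readability + Turndown step is stubbed out as a hypothetical `convertHtml` callback, and `timedFetch`/`fetchMarkdownFirst` are illustrative names; the timeouts default to the values in the example config.

```typescript
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<Response>;

// Abort the request if it has not completed within timeoutMs.
async function timedFetch(fetchImpl: FetchLike, url: string, timeoutMs: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

interface FetchResult {
  source: "markdown.new" | "direct";
  content: string;
}

async function fetchMarkdownFirst(
  url: string,
  convertHtml: (html: string) => string, // hypothetical local HTML -> markdown converter
  fetchImpl: FetchLike = fetch,
  opts = { markdownNewBase: "https://markdown.new/", markdownNewTimeoutMs: 20000, timeoutMs: 30000 },
): Promise<FetchResult> {
  try {
    const res = await timedFetch(fetchImpl, opts.markdownNewBase + url, opts.markdownNewTimeoutMs);
    const body = res.ok ? (await res.text()).trim() : "";
    if (body) return { source: "markdown.new", content: body };
  } catch {
    // Network error or timeout: fall through to a direct fetch.
  }
  const res = await timedFetch(fetchImpl, url, opts.timeoutMs);
  if (!res.ok) throw new Error(`HTTP ${res.status} fetching ${url}`);
  return { source: "direct", content: convertHtml(await res.text()) };
}
```

Injecting `fetchImpl` lets the fallback path be exercised with a fake fetch instead of live network calls.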
clone_github_repo
Parameters:
- url (required)
- forceClone
- refresh
- maxTreeEntries
Behavior:
- Handles GitHub root/tree/blob URLs.
- Clones to configured local cache path.
- For large repos (over maxRepoSizeMB), returns an API preview unless forceClone: true.
- For commit-SHA URLs, returns an API preview.
- Returns local path and structured preview for follow-up tooling.
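To illustrate the root/tree/blob handling, here is a sketch that splits a GitHub URL into owner, repo, ref, and path, and derives a clone key in the `owner/repo@branch` form that `search_local_repo` accepts. It assumes the ref is a single path segment (branch names containing `/` would need an API lookup to disambiguate) and treats a 40-hex-character ref as a commit SHA. The size check assumes the GitHub REST API's repository `size` field, which is reported in kilobytes. `parseGitHubUrl` and `shouldPreviewOnly` are hypothetical names.

```typescript
type GitHubUrlKind = "root" | "tree" | "blob";

interface ParsedGitHubUrl {
  owner: string;
  repo: string;
  kind: GitHubUrlKind;
  ref?: string; // branch, tag, or commit SHA (single path segment assumed)
  path?: string; // file or directory path inside the repo
  isCommitSha: boolean;
  cloneKey: string; // "owner/repo" or "owner/repo@ref"
}

function parseGitHubUrl(input: string): ParsedGitHubUrl {
  const url = new URL(input);
  if (url.hostname !== "github.com" && url.hostname !== "www.github.com") {
    throw new Error(`Not a GitHub URL: ${input}`);
  }
  const [owner, rawRepo, marker, ref, ...rest] = url.pathname.split("/").filter(Boolean);
  if (!owner || !rawRepo) throw new Error(`Missing owner/repo in ${input}`);
  const repo = rawRepo.replace(/\.git$/, "");
  let kind: GitHubUrlKind = "root";
  if (marker === "tree" || marker === "blob") {
    if (!ref) throw new Error(`Missing ref in ${input}`);
    kind = marker;
  } else if (marker !== undefined) {
    throw new Error(`Unsupported GitHub URL form: ${input}`);
  }
  return {
    owner,
    repo,
    kind,
    ref,
    path: rest.length > 0 ? rest.join("/") : undefined,
    isCommitSha: ref !== undefined && /^[0-9a-f]{40}$/i.test(ref),
    cloneKey: ref ? `${owner}/${repo}@${ref}` : `${owner}/${repo}`,
  };
}

// Preview instead of cloning when the repo exceeds maxRepoSizeMB, unless forced.
function shouldPreviewOnly(sizeKb: number, maxRepoSizeMB: number, forceClone = false): boolean {
  return !forceClone && sizeKb / 1024 > maxRepoSizeMB;
}
```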
search_local_repo
Parameters:
- query (required)
- repo (optional clone key, e.g. owner/repo or owner/repo@branch)
- path (optional)
- glob (optional)
- maxMatches (optional)
- caseSensitive (optional)
Behavior:
- Uses rg if available, otherwise grep.
- Defaults to the latest cloned repo if neither repo nor path is provided.
- Returns a list of file/line/column matches.
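A sketch of the rg path, assuming ripgrep is invoked with `--vimgrep` (which prints `file:line:column:text` per match) and that the requested `maxMatches` is clamped by the `localSearch.defaultMaxMatches`/`maxMatches` config values, with previews cut to `previewChars`. The case-insensitive default and the helper names `effectiveMaxMatches`, `buildRgArgs`, and `parseVimgrep` are assumptions; the grep fallback is not shown (plain `grep -n` reports no column).

```typescript
interface LocalMatch {
  file: string;
  line: number;
  column: number;
  preview: string;
}

interface LocalSearchLimits {
  defaultMaxMatches: number; // 80 in the example config
  maxMatches: number; // hard cap, 300 in the example config
  previewChars: number; // 220 in the example config
}

// Clamp the caller's requested match count into [1, hard cap].
function effectiveMaxMatches(requested: number | undefined, limits: LocalSearchLimits): number {
  return Math.max(1, Math.min(requested ?? limits.defaultMaxMatches, limits.maxMatches));
}

function buildRgArgs(query: string, searchPath: string, opts: { glob?: string; caseSensitive?: boolean } = {}): string[] {
  const args = ["--vimgrep", opts.caseSensitive ? "--case-sensitive" : "--ignore-case"];
  if (opts.glob) args.push("--glob", opts.glob);
  // "--" ends option parsing so a query starting with "-" is treated as a pattern.
  args.push("--", query, searchPath);
  return args;
}

// Parse `file:line:column:text` lines; the text part may itself contain colons.
function parseVimgrep(stdout: string, limit: number, previewChars: number): LocalMatch[] {
  const matches: LocalMatch[] = [];
  for (const raw of stdout.split("\n")) {
    const m = /^(.+?):(\d+):(\d+):(.*)$/.exec(raw);
    if (!m) continue;
    matches.push({ file: m[1], line: Number(m[2]), column: Number(m[3]), preview: m[4].trim().slice(0, previewChars) });
    if (matches.length >= limit) break;
  }
  return matches;
}
```

In Node, the arguments could be passed to `execFile("rg", args)` from `node:child_process`. Note that rg exits with status 1 when nothing matches, which should be treated as an empty result rather than an error.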
Notes
- This extension executes network requests and local git/shell commands.
- Only install from trusted sources.
- HTML result parsing for public engines can break when providers change their markup, so the fallback order matters.
- Google HTML scraping may be rate-limited or challenged depending on region and IP.
Development
bun install
bun run typecheck
Publish to npm
bun run typecheck
bun publish --access public
Before publishing, update these values in package.json:
- author
- repository.url
- homepage
- bugs.url
- name (if pi-web-utils is already taken on npm)