pi-web-tools
Web search via Exa, content extraction, and GitHub repo cloning for Pi coding agent
Package details
Install pi-web-tools from npm and Pi will load the resources declared by the package manifest.
```shell
$ pi install npm:pi-web-tools
```

- Package: pi-web-tools
- Version: 0.1.0
- Published: Feb 6, 2026
- Downloads: 36/mo · 2/wk
- Author: coctostan
- License: MIT
- Type: extension
- Size: 113.8 KB
- Dependencies: 4 dependencies · 0 peers
Pi manifest JSON
```json
{
  "extensions": [
    "./index.ts"
  ]
}
```

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-web-tools
Web search, content extraction, and GitHub repo cloning for the Pi coding agent.
A lightweight extension providing three tools:
- `web_search` — Search the web via Exa with snippet extraction
- `fetch_content` — Fetch any URL and extract clean markdown (HTML via Readability, Jina Reader fallback, GitHub via clone)
- `get_search_content` — Retrieve stored results from previous searches/fetches
Install
```shell
pi install npm:pi-web-tools
```
Or install from git:
```shell
pi install github:coctostan/pi-web-tools
```
Setup
Exa API Key (required for web_search)
Get a key at exa.ai and set it via environment variable:
```shell
export EXA_API_KEY="your-key-here"
```
Or add it to the config file ~/.pi/web-tools.json:
```json
{
  "exaApiKey": "your-key-here"
}
```
The environment variable takes precedence over the config file.
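That precedence rule can be sketched as follows (`resolveExaKey` is a hypothetical helper name, not the extension's actual code):

```typescript
// Sketch of the key-resolution order described above: the EXA_API_KEY
// environment variable wins over the exaApiKey field in the config file.
interface WebToolsConfig {
  exaApiKey?: string | null;
}

function resolveExaKey(
  env: Record<string, string | undefined>,
  fileConfig: WebToolsConfig
): string | null {
  return env.EXA_API_KEY ?? fileConfig.exaApiKey ?? null;
}
```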
GitHub CLI (recommended for fetch_content)
For GitHub repo cloning, install the GitHub CLI:
```shell
# Debian/Ubuntu
sudo apt install gh
# Or via conda, brew, etc.
gh auth login
```
Without `gh`, the extension falls back to `git clone` (works for public repos).
Configuration
Config file: ~/.pi/web-tools.json (auto-reloaded every 30 seconds)
```json
{
  "exaApiKey": "your-exa-key",
  "github": {
    "maxRepoSizeMB": 350,
    "cloneTimeoutSeconds": 30,
    "clonePath": "/tmp/pi-github-repos"
  }
}
```
| Option | Default | Description |
|---|---|---|
| `exaApiKey` | `null` | Exa API key (env `EXA_API_KEY` overrides) |
| `github.maxRepoSizeMB` | `350` | Skip cloning repos larger than this |
| `github.cloneTimeoutSeconds` | `30` | Abort clone after this many seconds |
| `github.clonePath` | `/tmp/pi-github-repos` | Where to store cloned repos |
Tools
web_search
Search the web using Exa. Returns results with snippets and source URLs.
| Parameter | Type | Description |
|---|---|---|
| `query` | `string` | Single search query |
| `queries` | `string[]` | Multiple queries (batch) |
| `numResults` | `number` | Results per query (default: 5, max: 20) |
Example:
Search for "TypeScript 5.8 new features"
fetch_content
Fetch URL(s) and extract readable content as markdown.
| Parameter | Type | Description |
|---|---|---|
| `url` | `string` | Single URL to fetch |
| `urls` | `string[]` | Multiple URLs (parallel, max 3 concurrent) |
| `forceClone` | `boolean` | Force cloning large GitHub repos |
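The "max 3 concurrent" behavior for `urls` can be sketched as a small concurrency-limited mapper (illustrative only; `mapWithLimit` is a hypothetical name, not the extension's API):

```typescript
// Sketch: run an async function over a list of items with at most
// `limit` invocations in flight at once, preserving result order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    // Each worker repeatedly claims the next unprocessed index.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```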
Content extraction pipeline:
- GitHub URLs → Clone repo (shallow, depth 1), generate tree + README
- HTML pages → Readability extraction → Markdown conversion
- If Readability fails → Jina Reader fallback (`r.jina.ai`)
- Non-HTML → Return raw text
Content over 30,000 characters is truncated with a pointer to get_search_content.
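That truncation step could look roughly like this (a sketch; `truncateContent` and the exact pointer message are hypothetical, only the 30,000-character limit comes from the description above):

```typescript
// Sketch: cap extracted content at 30,000 characters and append a
// pointer telling the agent how to retrieve the full text later.
const MAX_CHARS = 30_000;

function truncateContent(content: string, responseId: string): string {
  if (content.length <= MAX_CHARS) return content;
  return (
    content.slice(0, MAX_CHARS) +
    `\n\n[Truncated. Use get_search_content with responseId "${responseId}" for the full text.]`
  );
}
```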
get_search_content
Retrieve full content from a previous web_search or fetch_content result.
| Parameter | Type | Description |
|---|---|---|
| `responseId` | `string` | ID from a previous tool result |
| `query` | `string` | Filter by query text |
| `queryIndex` | `number` | Filter by query index |
| `url` | `string` | Filter by URL |
| `urlIndex` | `number` | Filter by URL index |
How GitHub Cloning Works
When fetch_content receives a GitHub URL:
1. Parse — Extract owner, repo, ref, path, and type (root/blob/tree)
2. Size check — Query repo size via `gh api`; skip if over threshold (default 350 MB)
3. Clone — Shallow clone (`--depth 1`) to a temp directory, cached for the session
4. Generate — Based on URL type:
   - Root: full directory tree + README content
   - Tree: directory listing for the specified path
   - Blob: file content (with binary detection and 100K truncation)
Non-code GitHub URLs (issues, PRs, discussions, etc.) are fetched as normal web pages.
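The parse step above can be sketched as follows (illustrative only; the extension's `github-extract.ts` may differ in details, e.g. handling refs that contain slashes):

```typescript
// Sketch: split a github.com URL into owner, repo, type, ref, and path.
// URLs that are not root/blob/tree (issues, PRs, etc.) return null so
// callers can fall back to normal web fetching.
type GitHubUrlType = "root" | "blob" | "tree";

interface ParsedGitHubUrl {
  owner: string;
  repo: string;
  type: GitHubUrlType;
  ref?: string;
  path?: string;
}

function parseGitHubUrl(raw: string): ParsedGitHubUrl | null {
  const url = new URL(raw);
  if (url.hostname !== "github.com") return null;
  const parts = url.pathname.split("/").filter(Boolean);
  if (parts.length < 2) return null;
  const [owner, repo, kind, ref, ...rest] = parts;
  if (kind === undefined) return { owner, repo, type: "root" };
  if (kind === "blob" || kind === "tree") {
    return { owner, repo, type: kind, ref, path: rest.join("/") };
  }
  return null; // issues, PRs, discussions → fetched as normal pages
}
```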
Architecture
index.ts — Extension entry point, 3 tools, session management
├── config.ts — Config with 30s TTL cache, env var overrides
├── storage.ts — LRU storage (max 50 entries, session restore)
├── exa-search.ts — Exa API client
├── extract.ts — Readability + Jina Reader content extraction
└── github-extract.ts — GitHub URL parsing, clone, tree/content generation
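The LRU behavior attributed to `storage.ts` (max 50 entries) could look roughly like this sketch, not the actual implementation:

```typescript
// Sketch of an LRU store capped at maxEntries, evicting the
// least-recently-used entry on overflow. A JS Map preserves insertion
// order, so the first key is always the oldest untouched entry.
class LruStore<V> {
  private map = new Map<string, V>();
  constructor(private maxEntries = 50) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Re-insert to mark this key as most recently used.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}
```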
Development
```shell
# Install dependencies
npm install

# Run tests
npx vitest run

# Run tests in watch mode
npx vitest

# Load in pi for testing
pi -e ./index.ts
```
License
MIT