pi-web-tools

Web search via Exa, content extraction, and GitHub repo cloning for the Pi coding agent

Package details


extension

Install pi-web-tools from npm and Pi will load the resources declared by the package manifest.


$ pi install npm:pi-web-tools

Package: pi-web-tools
Version: 0.1.0
Published: Feb 6, 2026
Downloads: 36/mo · 2/wk
Author: coctostan
License: MIT
Types: extension
Size: 113.8 KB
Dependencies: 4 dependencies · 0 peers

Pi manifest JSON

{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-web-tools

Web search, content extraction, and GitHub repo cloning for the Pi coding agent.

A lightweight extension providing three tools:

web_search — Search the web via Exa with snippet extraction
fetch_content — Fetch any URL and extract clean markdown (HTML via Readability, Jina Reader fallback, GitHub via clone)
get_search_content — Retrieve stored results from previous searches/fetches

Install

pi install npm:pi-web-tools

Or install from git:

pi install github:coctostan/pi-web-tools

Setup

Exa API Key (required for web_search)

Get a key at exa.ai and set it via environment variable:

export EXA_API_KEY="your-key-here"

Or add it to the config file ~/.pi/web-tools.json:

{
  "exaApiKey": "your-key-here"
}

The environment variable takes precedence over the config file.
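
The precedence rule above can be sketched as a small pure function. This is an illustration only, not the extension's actual code: the name `resolveExaApiKey` and the `WebToolsConfig` shape are hypothetical, and treating a blank environment variable as unset is an assumption.

```typescript
// Hypothetical shape of the parsed ~/.pi/web-tools.json (only the key matters here).
interface WebToolsConfig {
  exaApiKey?: string | null;
}

// Resolve the Exa key: environment variable first, config file second.
// Assumption: a blank or whitespace-only value counts as "not set".
function resolveExaApiKey(
  env: Record<string, string | undefined>,
  fileConfig: WebToolsConfig,
): string | null {
  const fromEnv = env["EXA_API_KEY"]?.trim();
  if (fromEnv) return fromEnv;
  const fromFile = fileConfig.exaApiKey?.trim();
  return fromFile ? fromFile : null;
}
```

In a Node process this would typically be called as `resolveExaApiKey(process.env, parsedConfig)`; a `null` result means `web_search` has no key to use.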

GitHub CLI (recommended for fetch_content)

For GitHub repo cloning, install the GitHub CLI:

# Debian/Ubuntu (or install gh via conda, brew, etc.)
sudo apt install gh

# Then authenticate
gh auth login

Without gh, the extension falls back to git clone (works for public repos).

Configuration

Config file: ~/.pi/web-tools.json (auto-reloaded every 30 seconds)

{
  "exaApiKey": "your-exa-key",
  "github": {
    "maxRepoSizeMB": 350,
    "cloneTimeoutSeconds": 30,
    "clonePath": "/tmp/pi-github-repos"
  }
}

| Option | Default | Description |
| --- | --- | --- |
| `exaApiKey` | `null` | Exa API key (env `EXA_API_KEY` overrides) |
| `github.maxRepoSizeMB` | `350` | Skip cloning repos larger than this |
| `github.cloneTimeoutSeconds` | `30` | Abort clone after this many seconds |
| `github.clonePath` | `/tmp/pi-github-repos` | Where to store cloned repos |
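
To make the 30-second reload and the defaults concrete, here is a minimal sketch of a TTL-cached config reader. It is not taken from `config.ts`: `createConfigCache`, `withDefaults`, and the injected `load` and `now` functions are hypothetical names, and file reading is left to the caller so the sketch stays self-contained.

```typescript
interface GithubConfig {
  maxRepoSizeMB: number;
  cloneTimeoutSeconds: number;
  clonePath: string;
}

interface Config {
  exaApiKey: string | null;
  github: GithubConfig;
}

// Whatever the JSON file contains; every field is optional.
type RawConfig = { exaApiKey?: string | null; github?: Partial<GithubConfig> };

// Defaults copied from the option table above.
const DEFAULTS: Config = {
  exaApiKey: null,
  github: { maxRepoSizeMB: 350, cloneTimeoutSeconds: 30, clonePath: "/tmp/pi-github-repos" },
};

function withDefaults(raw: RawConfig): Config {
  return {
    exaApiKey: raw.exaApiKey ?? DEFAULTS.exaApiKey,
    github: { ...DEFAULTS.github, ...raw.github },
  };
}

// Returns a getter that re-invokes `load` only when the cached value is older than ttlMs.
function createConfigCache(
  load: () => RawConfig,
  now: () => number,
  ttlMs: number = 30_000,
): () => Config {
  let cached: Config | null = null;
  let loadedAt = 0;
  return function getConfig(): Config {
    const t = now();
    if (cached === null || t - loadedAt >= ttlMs) {
      cached = withDefaults(load());
      loadedAt = t;
    }
    return cached;
  };
}
```

Injecting the clock and the loader keeps the cache testable; a real caller would presumably pass `Date.now` and a function that reads and parses `~/.pi/web-tools.json`.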

Tools

`web_search`

Search the web using Exa. Returns results with snippets and source URLs.

| Parameter | Type | Description |
| --- | --- | --- |
| `query` | `string` | Single search query |
| `queries` | `string[]` | Multiple queries (batch) |
| `numResults` | `number` | Results per query (default: 5, max: 20) |

Example:

Search for "TypeScript 5.8 new features"
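
As a rough illustration of how the three parameters might be normalized before calling Exa, the sketch below merges `query` and `queries` and applies the documented default and maximum for `numResults`. The function name is hypothetical, and merging both fields and clamping (rather than rejecting) out-of-range values are assumptions, not documented behavior.

```typescript
interface WebSearchParams {
  query?: string;
  queries?: string[];
  numResults?: number;
}

interface NormalizedSearch {
  queries: string[];
  numResults: number;
}

function normalizeSearchParams(params: WebSearchParams): NormalizedSearch {
  // Assumption: `query` and `queries` may both be given and are combined.
  const candidates = [params.query, ...(params.queries ?? [])];
  const queries = candidates
    .map((q) => (q ?? "").trim())
    .filter((q) => q.length > 0);
  if (queries.length === 0) {
    throw new Error("Provide `query` or a non-empty `queries` array");
  }
  // Default 5 and max 20 come from the parameter table; the lower bound of 1 is assumed.
  const requested = Number.isFinite(params.numResults) ? (params.numResults as number) : 5;
  const numResults = Math.min(20, Math.max(1, Math.floor(requested)));
  return { queries, numResults };
}
```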

`fetch_content`

Fetch URL(s) and extract readable content as markdown.

| Parameter | Type | Description |
| --- | --- | --- |
| `url` | `string` | Single URL to fetch |
| `urls` | `string[]` | Multiple URLs (parallel, max 3 concurrent) |
| `forceClone` | `boolean` | Force cloning large GitHub repos |
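
The "parallel, max 3 concurrent" behavior for `urls` can be approximated with a small worker-pool helper. This is a generic sketch of bounded concurrency, not the extension's implementation; `mapWithLimit` is a hypothetical name, and failing the whole batch on the first rejected fetch is an assumption.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight, preserving input order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

A caller could use it as `mapWithLimit(urls, 3, fetchOne)`, where `fetchOne` stands in for whatever function fetches and extracts a single URL.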

Content extraction pipeline:

GitHub URLs → Clone repo (shallow, depth 1), generate tree + README
HTML pages → Readability extraction → Markdown conversion
Readability fails → Jina Reader fallback (r.jina.ai)
Non-HTML → Return raw text

Content over 30,000 characters is truncated with a pointer to get_search_content.
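
The pipeline order and the truncation rule can be sketched as follows. Every name here is hypothetical (`PipelineDeps`, `extractContent`, `truncateWithPointer`), the extractors are injected as plain functions instead of calling Readability or Jina Reader directly, and the wording of the truncation notice is invented for illustration.

```typescript
interface FetchedPage {
  contentType: string; // e.g. "text/html; charset=utf-8"
  body: string;
}

// Stand-ins for the real extractors; all are supplied by the caller.
interface PipelineDeps {
  isGitHubCodeUrl: (url: string) => boolean;
  cloneAndSummarize: (url: string) => Promise<string>; // tree + README
  fetchPage: (url: string) => Promise<FetchedPage>;
  readabilityToMarkdown: (html: string) => string | null; // null signals failure
  jinaReader: (url: string) => Promise<string>;
}

const MAX_INLINE_CHARS = 30_000;

async function extractContent(url: string, deps: PipelineDeps): Promise<string> {
  // 1. GitHub code URLs go through the clone path.
  if (deps.isGitHubCodeUrl(url)) return deps.cloneAndSummarize(url);
  const page = await deps.fetchPage(url);
  // 4. Non-HTML responses are returned as raw text.
  if (!page.contentType.toLowerCase().includes("html")) return page.body;
  // 2. HTML goes through Readability; 3. fall back to Jina Reader if that fails.
  const markdown = deps.readabilityToMarkdown(page.body);
  return markdown ?? deps.jinaReader(url);
}

// Cut long content and tell the agent how to retrieve the rest.
function truncateWithPointer(content: string, responseId: string): string {
  if (content.length <= MAX_INLINE_CHARS) return content;
  const notice =
    `\n\n[Truncated at ${MAX_INLINE_CHARS} characters. ` +
    `Use get_search_content with responseId "${responseId}" for the full text.]`;
  return content.slice(0, MAX_INLINE_CHARS) + notice;
}
```

Note that `content.length` counts UTF-16 code units, so "characters" in this sketch is only an approximation for text outside the Basic Multilingual Plane.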

`get_search_content`

Retrieve full content from a previous web_search or fetch_content result.

| Parameter | Type | Description |
| --- | --- | --- |
| `responseId` | `string` | ID from a previous tool result |
| `query` | `string` | Filter by query text |
| `queryIndex` | `number` | Filter by query index |
| `url` | `string` | Filter by URL |
| `urlIndex` | `number` | Filter by URL index |
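
One plausible reading of these filters is sketched below: look up the stored response by `responseId`, then narrow it by index or by exact text. The storage shape (`StoredResponse`, `StoredEntry`), the function name, zero-based indexing, and exact-match filtering are all assumptions made for the example; the README does not specify them.

```typescript
// Hypothetical storage model: one entry per query (search) or per URL (fetch).
interface StoredEntry {
  key: string; // the query text or the URL
  content: string;
}

interface StoredResponse {
  kind: "search" | "fetch";
  entries: StoredEntry[];
}

interface ContentSelector {
  responseId: string;
  query?: string;
  queryIndex?: number;
  url?: string;
  urlIndex?: number;
}

function selectStoredContent(
  store: Map<string, StoredResponse>,
  sel: ContentSelector,
): StoredEntry[] {
  const response = store.get(sel.responseId);
  if (!response) throw new Error(`No stored result for responseId "${sel.responseId}"`);

  // Search results are filtered by query fields, fetch results by URL fields.
  const text = response.kind === "search" ? sel.query : sel.url;
  const index = response.kind === "search" ? sel.queryIndex : sel.urlIndex;

  if (index !== undefined) {
    const entry = response.entries[index]; // assumed zero-based
    if (!entry) throw new Error(`Index ${index} is out of range`);
    return [entry];
  }
  if (text !== undefined) return response.entries.filter((e) => e.key === text);
  return response.entries; // no filter: return everything stored for this response
}
```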

How GitHub Cloning Works

When fetch_content receives a GitHub URL:

Parse — Extracts owner, repo, ref, path, type (root/blob/tree)
Size check — Queries repo size via gh api. Skips if over threshold (default 350MB)
Clone — Shallow clone (--depth 1) to temp directory, cached for the session
Generate — Based on URL type:
- Root: Full directory tree + README content
- Tree: Directory listing for the specified path
- Blob: File content (with binary detection and 100K truncation)

Non-code GitHub URLs (issues, PRs, discussions, etc.) are fetched as normal web pages.
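
The parse and size-check steps might look roughly like the sketch below. It is not the code in `github-extract.ts`: `parseGitHubUrl` and `shouldClone` are hypothetical names, the parser assumes the ref is a single path segment (branch names containing `/` would need extra handling), and the size gate assumes the repository size arrives in kilobytes and that `forceClone` bypasses the threshold.

```typescript
type GitHubUrlType = "root" | "blob" | "tree";

interface GitHubTarget {
  owner: string;
  repo: string;
  type: GitHubUrlType;
  ref?: string;
  path?: string;
}

// Returns null for anything that is not a code URL (issues, PRs, discussions, other hosts),
// which the caller would then fetch as a normal web page.
function parseGitHubUrl(input: string): GitHubTarget | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null;
  }
  if (url.hostname !== "github.com" && url.hostname !== "www.github.com") return null;

  const [owner, repoRaw, kind, ref, ...rest] = url.pathname.split("/").filter(Boolean);
  if (!owner || !repoRaw) return null;
  const repo = repoRaw.replace(/\.git$/, "");

  if (kind === undefined) return { owner, repo, type: "root" };
  if ((kind === "blob" || kind === "tree") && ref) {
    // Assumption: the ref is exactly one segment; the remainder is the path.
    return { owner, repo, type: kind, ref, path: rest.join("/") };
  }
  return null;
}

// Size gate: clone when the repo is within the threshold or the caller forces it.
// Assumptions: size is given in kilobytes and 1 MB = 1024 KB.
function shouldClone(
  repoSizeKB: number,
  maxRepoSizeMB: number = 350,
  forceClone: boolean = false,
): boolean {
  return forceClone || repoSizeKB <= maxRepoSizeMB * 1024;
}
```

When `shouldClone` returns true, the clone step described above would follow with a shallow `git clone --depth 1` into the configured clone path.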

Architecture

index.ts          — Extension entry point, 3 tools, session management
├── config.ts     — Config with 30s TTL cache, env var overrides
├── storage.ts    — LRU storage (max 50 entries, session restore)
├── exa-search.ts — Exa API client
├── extract.ts    — Readability + Jina Reader content extraction
└── github-extract.ts — GitHub URL parsing, clone, tree/content generation
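
For readers unfamiliar with the LRU storage mentioned for `storage.ts`, here is a minimal sketch of a 50-entry least-recently-used store built on `Map` insertion order. The class name `LruStore` and its methods are hypothetical, and session restore is left out.

```typescript
// Minimal LRU store: the Map keeps keys in insertion order, so the first key is the oldest.
class LruStore<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 50) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (the first key in iteration order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Both reads and writes refresh an entry's position, so a result that keeps being retrieved with `get_search_content` would survive longer than one that is stored and never read again.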

Development

# Install dependencies
npm install

# Run tests
npx vitest run

# Run tests in watch mode
npx vitest

# Load in pi for testing
pi -e ./index.ts

License

MIT