@coctostan/pi-exa-gh-web-tools
Web search via Exa, content extraction, and GitHub repo cloning for Pi coding agent
Package details
Install @coctostan/pi-exa-gh-web-tools from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@coctostan/pi-exa-gh-web-tools- Package
@coctostan/pi-exa-gh-web-tools- Version
3.0.0- Published
- Mar 25, 2026
- Downloads
- 73/mo · 20/wk
- Author
- coctostan
- License
- MIT
- Types
- extension
- Size
- 254.6 KB
- Dependencies
- 5 dependencies · 3 peers
Pi manifest JSON
{
"extensions": [
"./index.ts"
]
}Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
@coctostan/pi-exa-gh-web-tools
Web search, code search, content extraction, and GitHub repo cloning for the Pi coding agent, powered by Exa.
This package gives Pi four tools:
web_search— search the web and return compact resultscode_search— find code examples from docs, GitHub, and Stack Overflowfetch_content— fetch a URL, GitHub repo/file, or PDF and extract readable contentget_search_content— retrieve stored content from an earlier tool call
Why this exists
Most web pages are too large and noisy to drop directly into an agent's context window. This extension is designed to keep Pi focused:
web_searchreturns short summaries by defaultfetch_contentcan answer a specific question instead of returning a whole page- raw fetched content is written to a temp file instead of flooding context
- previous results are stored and can be retrieved later
If you're new to Pi, the simplest mental model is:
- Search for a good source
- Fetch only the page you need
- Ask a focused question when possible
- Read the saved file only if you need the raw content
Quick start
1) Install the extension in Pi
From npm:
pi install npm:@coctostan/pi-exa-gh-web-tools
Or directly from GitHub:
pi install github:coctostan/pi-web-tools
2) Configure your Exa API key
web_search and code_search require an Exa API key.
Set it as an environment variable:
export EXA_API_KEY="your-key-here"
Or put it in ~/.pi/web-tools.json:
{
"exaApiKey": "your-key-here"
}
Environment variables take precedence over the config file.
3) Start using the tools
Typical beginner flow:
web_search({ query: "vitest mock fetch" })
fetch_content({
url: "https://vitest.dev/guide/mocking.html",
prompt: "How do I mock a function in Vitest?"
})
Standalone CLI
The package also ships a standalone exa-tools binary that works outside of Pi.
Install globally
npm install -g @coctostan/pi-exa-gh-web-tools
Set your API key
search and code commands require an Exa API key:
export EXA_API_KEY="your-key-here"
Commands
Web search:
exa-tools search "vitest mock fetch" --n 3
Code search:
exa-tools code "vitest mock fetch" --tokens 800
Fetch a page (raw markdown):
exa-tools fetch "https://vitest.dev/guide/mocking.html"
Fetch with a focused question:
exa-tools fetch "https://vitest.dev/guide/mocking.html" --prompt "How do I mock a function?"
Output behavior
- Successful output goes to stdout
- Errors and warnings go to stderr
- When
--promptis used but no filter model is available, the CLI prints a warning to stderr and falls back to raw markdown on stdout
30-second example
If you've never used Pi tools before, this is the shortest useful workflow:
// 1) Find a good source
web_search({ query: "vitest retry failed test" })
// 2) Ask one page a focused question
fetch_content({
url: "https://vitest.dev/guide/",
prompt: "How do I retry a failed test?"
})
Rule of thumb:
- use
web_searchto choose a source - use
fetch_content({ prompt })to get an answer - use
fetch_content({ url })withoutpromptonly when you really need the raw page
What each tool does
Which tool should I use?
| If you want to... | Use this |
|---|---|
| Find a relevant page or article | web_search |
| Find a working code snippet | code_search |
| Ask one URL a specific question | fetch_content({ url, prompt }) |
| Read the full raw content of a page | fetch_content({ url }) |
| Re-open an earlier result without refetching | get_search_content |
For most Pi sessions, this is the best default path:
web_searchfetch_content({ prompt })get_search_contentorreadonly if you need more detail
web_search
Search the web and return 1-line summaries by default.
Use it when you want to decide which URL is worth reading next.
Parameters
| Parameter | Type | Description |
|---|---|---|
query |
string |
Single search query |
queries |
string[] |
Multiple search queries |
numResults |
number |
Results per query, default 5, max 20 |
type |
string |
"auto" (default), "instant", or "deep" |
detail |
string |
"summary" (default) or "highlights" |
freshness |
string |
"realtime", "day", "week", or "any" |
category |
string |
Content category filter |
includeDomains |
string[] |
Only include these domains |
excludeDomains |
string[] |
Exclude these domains |
similarUrl |
string |
Find pages similar to a URL |
Examples
// Basic search
web_search({ query: "vitest snapshot testing" })
// Get more detail before fetching
web_search({ query: "rust async runtime comparison", detail: "highlights" })
// Restrict results to specific sites
web_search({ query: "useEffect cleanup", includeDomains: ["react.dev", "github.com"] })
// Batch search
web_search({ queries: ["vitest mocking", "vitest coverage", "vitest browser mode"] })
// Find related pages
web_search({ similarUrl: "https://vitest.dev/guide/" })
Smart search behavior
The tool automatically improves certain queries before sending them to Exa:
- stack traces and error messages switch to keyword search
- short vague coding queries may expand to include
docs example - duplicate URLs are removed
- snippet noise like breadcrumbs and tracking params is cleaned up
fetch_content
Fetch a page, GitHub repo/file, or PDF and return readable content.
Use it when you already know which source you want to inspect.
Parameters
| Parameter | Type | Description |
|---|---|---|
url |
string |
Single URL to fetch |
urls |
string[] |
Multiple URLs to fetch |
prompt |
string |
Ask a question about the content instead of returning the whole page |
forceClone |
boolean |
Force clone for large GitHub repos |
noCache |
boolean |
Skip research cache and fetch fresh (still updates cache) |
Best practice for Pi beginners
Prefer prompt whenever you can.
fetch_content({
url: "https://vitest.dev/guide/",
prompt: "How do I run only one test file?"
})
That returns a focused answer instead of dumping a large page into context.
Raw fetch behavior
Without prompt, content is written to a temp file and the tool returns:
- a short preview
- the temp file path
- the total content size
This keeps Pi's context smaller while preserving access to the full content.
GitHub support
GitHub URLs are detected automatically.
// Repo tree + README summary
fetch_content({ url: "https://github.com/facebook/react" })
// Specific file
fetch_content({ url: "https://github.com/facebook/react/blob/main/packages/react/src/React.js" })
The tool tries gh repo clone first, then falls back to git clone.
PDF support
fetch_content({ url: "https://arxiv.org/pdf/2312.00752" })
PDF text is extracted with pdf-parse. Corrupt, encrypted, empty, or oversized PDFs return a clear error.
code_search
Search for working code examples from docs, GitHub repositories, and Stack Overflow.
Use it when you want code patterns, not general web pages.
Parameters
| Parameter | Type | Description |
|---|---|---|
query |
string |
Describe what code you want |
tokensNum |
number |
Response size in tokens |
Examples
code_search({ query: "vitest mock fetch with MSW" })
code_search({ query: "React Server Components with Next.js app router", tokensNum: 5000 })
get_search_content
Retrieve stored content from an earlier web_search, fetch_content, or code_search call.
This is useful when you want to revisit a result without repeating the network request.
Parameters
| Parameter | Type | Description |
|---|---|---|
responseId |
string |
ID returned by an earlier tool call |
query |
string |
Retrieve a web_search result by query |
queryIndex |
number |
Retrieve a web_search result by position |
url |
string |
Retrieve a fetch_content result by URL |
urlIndex |
number |
Retrieve a fetch_content result by position |
maxChars |
number |
Maximum response size, default 30000, max 100000 |
Examples
get_search_content({ responseId: "abc123", queryIndex: 0 })
get_search_content({ responseId: "xyz789", url: "https://vitest.dev/api/" })
Configuration
The package reads config from ~/.pi/web-tools.json and hot-reloads it every 30 seconds.
Full config example
{ "exaApiKey": "your-exa-key", "filterModel": "anthropic/claude-haiku-4-5", "cacheTTLMinutes": 1440, "github": { "maxRepoSizeMB": 350, "cloneTimeoutSeconds": 30, "clonePath": "/tmp/pi-github-repos" }, "tools": { "web_search": true, "code_search": true, "fetch_content": true, "get_search_content": true } }
### Config options
| Setting | Description |
|---------|-------------|
| `exaApiKey` | Exa API key used by `web_search` and `code_search` |
| `filterModel` | Cheap model used by `fetch_content({ prompt })` |
| `github.maxRepoSizeMB` | Max GitHub repo size before refusing or requiring force clone |
| `github.cloneTimeoutSeconds` | Clone timeout |
| `github.clonePath` | Cache directory for cloned repos |
| `tools.*` | Enable or disable individual tools |
| `cacheTTLMinutes` | TTL in minutes for the persistent research cache (default: `1440` = 24h) |
To use a different config path:
```bash
export PI_WEB_TOOLS_CONFIG="$HOME/.pi/web-tools.json"
How this package protects context
This package is opinionated about token efficiency.
1. Summary-first search
web_search returns short summaries by default so the main model only sees enough to choose a source.
2. Question-guided fetching
fetch_content({ prompt }) lets a cheaper model read the full page and return only the answer to your question.
3. File-first raw content
Raw fetched content is offloaded to a temp file instead of being pasted inline.
4. Stored results
Search and fetch results stay available for the session through get_search_content.
Network resilience
All Exa API requests use retry logic for transient failures.
- retries: max 2
- backoff:
1s -> 2s - retried:
429,500,502,503,504, and network errors - not retried:
400,401,403,404, and abort signals
The package also:
- deduplicates repeated URL fetches within a session
- runs multi-URL fetches with
p-limit(3) - runs batch web searches with
p-limit(3)
Development
Clone the repo:
git clone git@github.com:coctostan/pi-web-tools.git
cd pi-web-tools
npm install
Run tests:
npm test
Watch tests while developing:
npm run test:watch
Load the extension in Pi for manual testing:
pi -e ./index.ts
Tests use mocked network calls, so they do not require an Exa API key.
Troubleshooting
web_search or code_search fails immediately
Usually this means your Exa API key is missing or invalid.
Check:
echo "$EXA_API_KEY"
Or verify ~/.pi/web-tools.json contains:
{
"exaApiKey": "your-key-here"
}
fetch_content returned a file path instead of an answer
That is expected when you do not provide prompt, or when no cheap filter model is available.
Use:
fetch_content({
url: "https://example.com",
prompt: "What does this page say about X?"
})
I got too much text back
Try this order:
- use
web_searchfirst - use
fetch_content({ prompt })instead of raw fetch - only read the saved temp file if you need the original page
GitHub fetches are slow or fail on large repos
Try:
fetch_content({
url: "https://github.com/owner/repo",
forceClone: true
})
Also make sure gh or git is available on your machine.
Maintainer release checklist
The repo is currently at package version 2.0.0. If npm still shows an older version, use this checklist before publishing:
npm test
npm pack --dry-run
npm publish --access public
Before publishing, confirm:
package.jsonversion is correct- repository, homepage, and bugs URLs point to the live repo
README.mdreflects the current feature set- the dry-run tarball only contains the intended files
Current package metadata points to this repo:
- Repository:
https://github.com/coctostan/pi-web-tools - Issues:
https://github.com/coctostan/pi-web-tools/issues - README/Homepage:
https://github.com/coctostan/pi-web-tools#readme
Project structure
index.ts Pi extension entry point and tool registration
exa-search.ts Exa web search integration
exa-context.ts Exa code/context search integration
extract.ts HTML/PDF content extraction
github-extract.ts GitHub repo and file handling
filter.ts Cheap-model filtering for focused answers
research-cache.ts Persistent TTL-based research cache
storage.ts Session result storage
config.ts Config loading and hot reload
tool-params.ts Tool input normalization and validation
retry.ts Retry and backoff helpers
offload.ts Temp-file offload for raw content
smart-search.ts Query enhancement and deduplication
truncation.ts Response truncation helpers
constants.ts Shared constants (timeouts, TTLs)
Changelog
3.0.0
fetch_contentgained persistent research cache — repeated prompt+URL lookups return instant cached answersfetch_contentgainednoCacheparam to bypass cachecacheTTLMinutesconfig option (default 24h)details.ptcValueon all 4 tools for PTC interop- multi-URL+prompt ptcValue shape cleaned up
2.0.0
fetch_contentgainedpromptfor focused question answeringweb_searchnow returns summary-first results by default- raw fetches are always offloaded to temp files
web_searchgainedfreshness,similarUrl, anddetail- smart query enhancement and result deduplication were added
- retry logic, URL caching, and parallel batch processing were improved
1.2.0
- PDF extraction in
fetch_content get_search_content.maxChars- dynamic file offloading for large content
1.1.0
- initial release of
web_search,code_search,fetch_content, andget_search_content
License
This project is licensed under the MIT License. See LICENSE.