@coctostan/pi-exa-gh-web-tools

Web search via Exa, content extraction, and GitHub repo cloning for the Pi coding agent

Package details


extension

Install @coctostan/pi-exa-gh-web-tools from npm and Pi will load the resources declared by the package manifest.


$ pi install npm:@coctostan/pi-exa-gh-web-tools

Package: @coctostan/pi-exa-gh-web-tools
Version: 3.0.0
Published: Mar 25, 2026
Downloads: 73/mo · 20/wk
Author: coctostan
License: MIT
Types: extension
Size: 254.6 KB
Dependencies: 5 dependencies · 3 peers

Pi manifest JSON

{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

@coctostan/pi-exa-gh-web-tools

Web search, code search, content extraction, and GitHub repo cloning for the Pi coding agent, powered by Exa.

This package gives Pi four tools:

web_search — search the web and return compact results
code_search — find code examples from docs, GitHub, and Stack Overflow
fetch_content — fetch a URL, GitHub repo/file, or PDF and extract readable content
get_search_content — retrieve stored content from an earlier tool call

Why this exists

Most web pages are too large and noisy to drop directly into an agent's context window. This extension is designed to keep Pi focused:

web_search returns short summaries by default
fetch_content can answer a specific question instead of returning a whole page
raw fetched content is written to a temp file instead of flooding context
previous results are stored and can be retrieved later

If you're new to Pi, the simplest mental model is:

Search for a good source
Fetch only the page you need
Ask a focused question when possible
Read the saved file only if you need the raw content

Quick start

1) Install the extension in Pi

From npm:

pi install npm:@coctostan/pi-exa-gh-web-tools

Or directly from GitHub:

pi install github:coctostan/pi-web-tools

2) Configure your Exa API key

web_search and code_search require an Exa API key.

Set it as an environment variable:

export EXA_API_KEY="your-key-here"

Or put it in ~/.pi/web-tools.json:

{
  "exaApiKey": "your-key-here"
}

Environment variables take precedence over the config file.
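
The precedence rule can be sketched as a small pure function. This is only an illustration: `resolveExaApiKey`, the `WebToolsConfig` shape, and the choice to treat an empty string as "not set" are assumptions, not the package's actual code.

```typescript
// Hypothetical config shape; only the documented `exaApiKey` field is modeled.
interface WebToolsConfig {
  exaApiKey?: string;
}

type Env = Record<string, string | undefined>;

// Prefer EXA_API_KEY from the environment, then fall back to the config file.
// Treating blank values as unset is an assumption made for this sketch.
function resolveExaApiKey(env: Env, config: WebToolsConfig): string | undefined {
  const fromEnv = (env["EXA_API_KEY"] ?? "").trim();
  if (fromEnv.length > 0) return fromEnv;

  const fromFile = (config.exaApiKey ?? "").trim();
  return fromFile.length > 0 ? fromFile : undefined;
}
```

In a real Node process the first argument would be `process.env` and the second the parsed contents of ~/.pi/web-tools.json.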

3) Start using the tools

Typical beginner flow:

web_search({ query: "vitest mock fetch" })
fetch_content({
  url: "https://vitest.dev/guide/mocking.html",
  prompt: "How do I mock a function in Vitest?"
})

Standalone CLI

The package also ships a standalone exa-tools binary that works outside of Pi.

Install globally

npm install -g @coctostan/pi-exa-gh-web-tools

Set your API key

The search and code commands require an Exa API key:

export EXA_API_KEY="your-key-here"

Commands

Web search:

exa-tools search "vitest mock fetch" --n 3

Code search:

exa-tools code "vitest mock fetch" --tokens 800

Fetch a page (raw markdown):

exa-tools fetch "https://vitest.dev/guide/mocking.html"

Fetch with a focused question:

exa-tools fetch "https://vitest.dev/guide/mocking.html" --prompt "How do I mock a function?"

Output behavior

Successful output goes to stdout
Errors and warnings go to stderr
When --prompt is used but no filter model is available, the CLI prints a warning to stderr and falls back to raw markdown on stdout
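
The three output rules above can be modeled as one decision function. This is a hedged sketch: `routeFetchOutput`, the `FetchOutcome` shape, and the warning text are hypothetical, and an `answer` of `undefined` stands in for "no filter model was available".

```typescript
interface FetchOutcome {
  stdout: string;
  stderr: string;
}

// `markdown` is the raw page; `answer` is the filter model's reply, if one ran.
function routeFetchOutput(markdown: string, prompt?: string, answer?: string): FetchOutcome {
  if (prompt === undefined) {
    // Plain fetch: raw markdown goes to stdout.
    return { stdout: markdown, stderr: "" };
  }
  if (answer === undefined) {
    // --prompt was given but nothing could answer it: warn and fall back.
    // The warning wording is invented for this sketch.
    return { stdout: markdown, stderr: "warning: no filter model available, returning raw markdown" };
  }
  return { stdout: answer, stderr: "" };
}
```

Keeping the fallback on stdout means a shell pipeline still receives usable content, while the warning stays out of the piped data.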

30-second example

If you've never used Pi tools before, this is the shortest useful workflow:

// 1) Find a good source
web_search({ query: "vitest retry failed test" })

// 2) Ask one page a focused question
fetch_content({
  url: "https://vitest.dev/guide/",
  prompt: "How do I retry a failed test?"
})

Rule of thumb:

use web_search to choose a source
use fetch_content({ prompt }) to get an answer
use fetch_content({ url }) without prompt only when you really need the raw page

What each tool does

Which tool should I use?

If you want to...	Use this
Find a relevant page or article	`web_search`
Find a working code snippet	`code_search`
Ask one URL a specific question	`fetch_content({ url, prompt })`
Read the full raw content of a page	`fetch_content({ url })`
Re-open an earlier result without refetching	`get_search_content`

For most Pi sessions, this is the best default path:

web_search
fetch_content({ prompt })
get_search_content or read only if you need more detail

`web_search`

Search the web and return 1-line summaries by default.

Use it when you want to decide which URL is worth reading next.

Parameters

Parameter	Type	Description
`query`	`string`	Single search query
`queries`	`string[]`	Multiple search queries
`numResults`	`number`	Results per query, default `5`, max `20`
`type`	`string`	`"auto"` (default), `"instant"`, or `"deep"`
`detail`	`string`	`"summary"` (default) or `"highlights"`
`freshness`	`string`	`"realtime"`, `"day"`, `"week"`, or `"any"`
`category`	`string`	Content category filter
`includeDomains`	`string[]`	Only include these domains
`excludeDomains`	`string[]`	Exclude these domains
`similarUrl`	`string`	Find pages similar to a URL

Examples

// Basic search
web_search({ query: "vitest snapshot testing" })

// Get more detail before fetching
web_search({ query: "rust async runtime comparison", detail: "highlights" })

// Restrict results to specific sites
web_search({ query: "useEffect cleanup", includeDomains: ["react.dev", "github.com"] })

// Batch search
web_search({ queries: ["vitest mocking", "vitest coverage", "vitest browser mode"] })

// Find related pages
web_search({ similarUrl: "https://vitest.dev/guide/" })
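
The defaults and limits from the parameter table can be sketched as a normalization step. `normalizeWebSearchParams` and its types are hypothetical, only a subset of parameters is modeled, and clamping an out-of-range `numResults` (rather than rejecting it) is an assumption.

```typescript
type SearchType = "auto" | "instant" | "deep";
type Detail = "summary" | "highlights";

// Subset of the documented parameters, for illustration only.
interface WebSearchParams {
  query?: string;
  queries?: string[];
  numResults?: number;
  type?: SearchType;
  detail?: Detail;
}

interface NormalizedSearch {
  queries: string[];
  numResults: number;
  type: SearchType;
  detail: Detail;
}

function normalizeWebSearchParams(p: WebSearchParams): NormalizedSearch {
  // Merge `query` and `queries` into one list and drop blank entries.
  const single: string[] = p.query !== undefined ? [p.query] : [];
  const queries = single
    .concat(p.queries ?? [])
    .map((q) => q.trim())
    .filter((q) => q.length > 0);

  // Documented default is 5 and documented maximum is 20.
  const numResults = Math.min(20, Math.max(1, Math.floor(p.numResults ?? 5)));

  return {
    queries,
    numResults,
    type: p.type ?? "auto",
    detail: p.detail ?? "summary",
  };
}
```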

Smart search behavior

The tool automatically adjusts certain queries before sending them to Exa and cleans up the results that come back:

stack traces and error messages switch to keyword search
short, vague coding queries may be expanded with the words "docs example"
duplicate URLs are removed
snippet noise like breadcrumbs and tracking params is cleaned up
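
As an illustration of the deduplication step, the sketch below normalizes each URL before comparing. The parameter list (`utm_*`, `fbclid`, `gclid`), the `SearchResult` shape, and the idea of stripping tracking parameters from URLs before deduplicating are all assumptions; the package may clean results differently.

```typescript
// Assumed tracking parameters; the package's real list is not documented here.
const TRACKING_PARAMS = ["fbclid", "gclid"];

function isTrackingParam(name: string): boolean {
  return name.indexOf("utm_") === 0 || TRACKING_PARAMS.indexOf(name) !== -1;
}

// Remove tracking parameters while keeping other query params and the fragment.
function stripTrackingParams(url: string): string {
  const hashIndex = url.indexOf("#");
  const hash = hashIndex === -1 ? "" : url.slice(hashIndex);
  const withoutHash = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const queryIndex = withoutHash.indexOf("?");
  if (queryIndex === -1) return url;

  const base = withoutHash.slice(0, queryIndex);
  const kept = withoutHash
    .slice(queryIndex + 1)
    .split("&")
    .filter((pair) => pair.length > 0 && !isTrackingParam(pair.split("=")[0] ?? ""));
  return base + (kept.length > 0 ? "?" + kept.join("&") : "") + hash;
}

interface SearchResult {
  url: string;
  title: string;
}

// Keep the first result for each normalized URL.
function dedupeByUrl(results: SearchResult[]): SearchResult[] {
  const seen: Record<string, boolean> = {};
  const unique: SearchResult[] = [];
  for (const r of results) {
    const url = stripTrackingParams(r.url);
    if (!seen[url]) {
      seen[url] = true;
      unique.push({ url, title: r.title });
    }
  }
  return unique;
}
```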

`fetch_content`

Fetch a page, GitHub repo/file, or PDF and return readable content.

Use it when you already know which source you want to inspect.

Parameters

Parameter	Type	Description
`url`	`string`	Single URL to fetch
`urls`	`string[]`	Multiple URLs to fetch
`prompt`	`string`	Ask a question about the content instead of returning the whole page
`forceClone`	`boolean`	Force clone for large GitHub repos
`noCache`	`boolean`	Skip research cache and fetch fresh (still updates cache)
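
The `noCache` behavior ("skip the cache, but still update it") can be sketched with a small TTL cache keyed by URL and prompt. This is a simplified model: `ResearchCache` and `answerWithCache` are hypothetical names, the real cache is persistent rather than in-memory, and the fetch is synchronous here only to keep the example short.

```typescript
interface CacheEntry {
  answer: string;
  storedAt: number; // epoch milliseconds
}

// In-memory stand-in for the persistent research cache.
class ResearchCache {
  private entries: Record<string, CacheEntry | undefined> = {};

  constructor(private ttlMinutes: number, private now: () => number) {}

  private key(url: string, prompt: string): string {
    return JSON.stringify([url, prompt]);
  }

  get(url: string, prompt: string): string | undefined {
    const entry = this.entries[this.key(url, prompt)];
    if (entry === undefined) return undefined;
    const ageMs = this.now() - entry.storedAt;
    return ageMs < this.ttlMinutes * 60 * 1000 ? entry.answer : undefined;
  }

  set(url: string, prompt: string, answer: string): void {
    this.entries[this.key(url, prompt)] = { answer, storedAt: this.now() };
  }
}

function answerWithCache(
  cache: ResearchCache,
  url: string,
  prompt: string,
  noCache: boolean,
  fetchFresh: () => string,
): string {
  if (!noCache) {
    const hit = cache.get(url, prompt);
    if (hit !== undefined) return hit;
  }
  const answer = fetchFresh();
  cache.set(url, prompt, answer); // noCache bypasses the read, not the write
  return answer;
}
```

Injecting `now` makes the expiry logic testable without waiting for the TTL to pass.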

Best practice for Pi beginners

Prefer prompt whenever you can.

fetch_content({
  url: "https://vitest.dev/guide/",
  prompt: "How do I run only one test file?"
})

That returns a focused answer instead of dumping a large page into context.

Raw fetch behavior

Without prompt, content is written to a temp file and the tool returns:

a short preview
the temp file path
the total content size

This keeps Pi's context smaller while preserving access to the full content.
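
A minimal sketch of that offload step, assuming a hypothetical `offloadRawContent` helper. The field names, the 500-character preview default, and measuring size in characters rather than bytes are all assumptions.

```typescript
interface RawFetchSummary {
  preview: string;
  filePath: string;
  totalChars: number;
}

// `writeFile` is injected so the sketch does not depend on a file-system API.
function offloadRawContent(
  content: string,
  filePath: string,
  writeFile: (path: string, data: string) => void,
  previewChars = 500,
): RawFetchSummary {
  // The full content goes to disk; only the summary is returned to the agent.
  writeFile(filePath, content);
  return {
    preview: content.slice(0, previewChars),
    filePath,
    totalChars: content.length,
  };
}
```

The agent sees a few hundred characters plus a path, and can read the file later only if the preview suggests it is worth the tokens.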

GitHub support

GitHub URLs are detected automatically.

// Repo tree + README summary
fetch_content({ url: "https://github.com/facebook/react" })

// Specific file
fetch_content({ url: "https://github.com/facebook/react/blob/main/packages/react/src/React.js" })

The tool tries gh repo clone first, then falls back to git clone.
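
The fallback order can be sketched as follows. `cloneRepo` and the injected `RunCommand` runner are hypothetical; only the two commands themselves (`gh repo clone` and `git clone`) come from the text above, and the HTTPS remote URL is an assumption.

```typescript
// Runs a command and reports whether it exited successfully.
type RunCommand = (command: string, args: string[]) => boolean;

// Try the GitHub CLI first, then plain git. Returns which tool succeeded.
function cloneRepo(owner: string, repo: string, dest: string, run: RunCommand): "gh" | "git" {
  if (run("gh", ["repo", "clone", `${owner}/${repo}`, dest])) return "gh";
  if (run("git", ["clone", `https://github.com/${owner}/${repo}.git`, dest])) return "git";
  throw new Error(`Could not clone ${owner}/${repo} with gh or git`);
}
```

In a Node process the runner would typically wrap a child-process call and apply the configured clone timeout.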

PDF support

fetch_content({ url: "https://arxiv.org/pdf/2312.00752" })

PDF text is extracted with pdf-parse. Corrupt, encrypted, empty, or oversized PDFs return a clear error.

`code_search`

Search for working code examples from docs, GitHub repositories, and Stack Overflow.

Use it when you want code patterns, not general web pages.

Parameters

Parameter	Type	Description
`query`	`string`	Describe what code you want
`tokensNum`	`number`	Response size in tokens

Examples

code_search({ query: "vitest mock fetch with MSW" })
code_search({ query: "React Server Components with Next.js app router", tokensNum: 5000 })

`get_search_content`

Retrieve stored content from an earlier web_search, fetch_content, or code_search call.

This is useful when you want to revisit a result without repeating the network request.

Parameters

Parameter	Type	Description
`responseId`	`string`	ID returned by an earlier tool call
`query`	`string`	Retrieve a `web_search` result by query
`queryIndex`	`number`	Retrieve a `web_search` result by position
`url`	`string`	Retrieve a `fetch_content` result by URL
`urlIndex`	`number`	Retrieve a `fetch_content` result by position
`maxChars`	`number`	Maximum response size, default `30000`, max `100000`

Examples

get_search_content({ responseId: "abc123", queryIndex: 0 })
get_search_content({ responseId: "xyz789", url: "https://vitest.dev/api/" })
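
A rough model of the lookup for stored `web_search` results, assuming a hypothetical in-memory `StoredSearch` list. Defaulting `queryIndex` to `0` and capping `maxChars` silently are assumptions; the `30000` default and `100000` maximum come from the table above.

```typescript
interface StoredQuery {
  query: string;
  content: string;
}

interface StoredSearch {
  responseId: string;
  queries: StoredQuery[];
}

interface Lookup {
  responseId: string;
  query?: string;
  queryIndex?: number;
  maxChars?: number;
}

function getStoredContent(store: StoredSearch[], lookup: Lookup): string {
  const response: StoredSearch | undefined = store.filter(
    (r) => r.responseId === lookup.responseId,
  )[0];
  if (response === undefined) throw new Error(`Unknown responseId: ${lookup.responseId}`);

  // Select by query text when given, otherwise by position.
  let entry: StoredQuery | undefined;
  if (lookup.query !== undefined) {
    entry = response.queries.filter((q) => q.query === lookup.query)[0];
  } else {
    entry = response.queries[lookup.queryIndex ?? 0];
  }
  if (entry === undefined) throw new Error("No stored result matches that query or index");

  const maxChars = Math.min(100000, lookup.maxChars ?? 30000);
  return entry.content.slice(0, maxChars);
}
```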

Configuration

The package reads config from ~/.pi/web-tools.json and hot-reloads it every 30 seconds.
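
One way to picture the 30-second hot reload is a reader that re-loads the file once its cached copy is old enough. `createConfigReader` is a hypothetical helper, and whether the package reloads on access (as here) or on a timer is an assumption.

```typescript
interface ToolsConfig {
  exaApiKey?: string;
}

// `load` reads and parses the config file; `now` returns epoch milliseconds.
function createConfigReader(
  load: () => ToolsConfig,
  now: () => number,
  reloadEveryMs = 30 * 1000,
): () => ToolsConfig {
  let cached = load();
  let loadedAt = now();

  return () => {
    // Re-read the file only when the cached copy is at least 30 seconds old.
    if (now() - loadedAt >= reloadEveryMs) {
      cached = load();
      loadedAt = now();
    }
    return cached;
  };
}
```

With this shape, an edit to the config file takes effect within 30 seconds without restarting Pi, and the file is not re-read on every tool call.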

Full config example

{
  "exaApiKey": "your-exa-key",
  "filterModel": "anthropic/claude-haiku-4-5",
  "cacheTTLMinutes": 1440,
  "github": {
    "maxRepoSizeMB": 350,
    "cloneTimeoutSeconds": 30,
    "clonePath": "/tmp/pi-github-repos"
  },
  "tools": {
    "web_search": true,
    "code_search": true,
    "fetch_content": true,
    "get_search_content": true
  }
}


Config options

Setting	Description
`exaApiKey`	Exa API key used by `web_search` and `code_search`
`filterModel`	Cheap model used by `fetch_content({ prompt })`
`github.maxRepoSizeMB`	Max GitHub repo size before refusing or requiring force clone
`github.cloneTimeoutSeconds`	Clone timeout
`github.clonePath`	Cache directory for cloned repos
`tools.*`	Enable or disable individual tools
`cacheTTLMinutes`	TTL in minutes for the persistent research cache (default: `1440` = 24h)

To use a different config path:

export PI_WEB_TOOLS_CONFIG="$HOME/.pi/web-tools.json"

How this package protects context

This package is opinionated about token efficiency.

1. Summary-first search

web_search returns short summaries by default so the main model only sees enough to choose a source.

2. Question-guided fetching

fetch_content({ prompt }) lets a cheaper model read the full page and return only the answer to your question.

3. File-first raw content

Raw fetched content is offloaded to a temp file instead of being pasted inline.

4. Stored results

Search and fetch results stay available for the session through get_search_content.

Network resilience

All Exa API requests use retry logic for transient failures.

retries: max 2
backoff: 1s -> 2s
retried: 429, 500, 502, 503, 504, and network errors
not retried: 400, 401, 403, 404, and abort signals
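
The retry rules above can be expressed as two small pure functions. `shouldRetry` and `backoffDelayMs` are hypothetical names; the loop that actually waits between attempts is left out, and doubling the delay is simply the pattern that matches the documented 1s and 2s steps.

```typescript
const MAX_RETRIES = 2;
const RETRYABLE_STATUS = [429, 500, 502, 503, 504];

// `status` is undefined when the request failed with a network error
// and no HTTP response was received.
function shouldRetry(status: number | undefined, retriesSoFar: number, aborted = false): boolean {
  if (aborted || retriesSoFar >= MAX_RETRIES) return false;
  return status === undefined || RETRYABLE_STATUS.indexOf(status) !== -1;
}

// Delay before retry number `retry` (1-based): 1000 ms, then 2000 ms.
function backoffDelayMs(retry: number): number {
  return 1000 * Math.pow(2, retry - 1);
}
```

Client errors such as 401 or 404 fail fast under these rules because repeating the same request would not change the outcome.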

The package also:

deduplicates repeated URL fetches within a session
runs multi-URL fetches with p-limit(3)
runs batch web searches with p-limit(3)

Development

Clone the repo:

git clone git@github.com:coctostan/pi-web-tools.git
cd pi-web-tools
npm install

Run tests:

npm test

Watch tests while developing:

npm run test:watch

Load the extension in Pi for manual testing:

pi -e ./index.ts

Tests use mocked network calls, so they do not require an Exa API key.

Troubleshooting

`web_search` or `code_search` fails immediately

Usually this means your Exa API key is missing or invalid.

Check:

echo "$EXA_API_KEY"

Or verify ~/.pi/web-tools.json contains:

{
  "exaApiKey": "your-key-here"
}

`fetch_content` returned a file path instead of an answer

That is expected when you do not provide prompt, or when no cheap filter model is available.

Use:

fetch_content({
  url: "https://example.com",
  prompt: "What does this page say about X?"
})

I got too much text back

Try this order:

use web_search first
use fetch_content({ prompt }) instead of raw fetch
only read the saved temp file if you need the original page

GitHub fetches are slow or fail on large repos

Try:

fetch_content({
  url: "https://github.com/owner/repo",
  forceClone: true
})

Also make sure gh or git is available on your machine.

Maintainer release checklist

The repo is currently at package version 3.0.0. If npm still shows an older version, use this checklist before publishing:

npm test
npm pack --dry-run
npm publish --access public

Before publishing, confirm:

package.json version is correct
repository, homepage, and bugs URLs point to the live repo
README.md reflects the current feature set
the dry-run tarball only contains the intended files

Current package metadata points to this repo:

Repository: https://github.com/coctostan/pi-web-tools
Issues: https://github.com/coctostan/pi-web-tools/issues
README/Homepage: https://github.com/coctostan/pi-web-tools#readme

Project structure

index.ts           Pi extension entry point and tool registration
exa-search.ts      Exa web search integration
exa-context.ts     Exa code/context search integration
extract.ts         HTML/PDF content extraction
github-extract.ts  GitHub repo and file handling
filter.ts          Cheap-model filtering for focused answers
research-cache.ts  Persistent TTL-based research cache
storage.ts         Session result storage
config.ts          Config loading and hot reload
tool-params.ts     Tool input normalization and validation
retry.ts           Retry and backoff helpers
offload.ts         Temp-file offload for raw content
smart-search.ts    Query enhancement and deduplication
truncation.ts      Response truncation helpers
constants.ts       Shared constants (timeouts, TTLs)

Changelog

3.0.0

fetch_content gained persistent research cache — repeated prompt+URL lookups return instant cached answers
fetch_content gained noCache param to bypass cache
cacheTTLMinutes config option (default 24h)
details.ptcValue on all 4 tools for PTC interop
multi-URL+prompt ptcValue shape cleaned up

2.0.0

fetch_content gained prompt for focused question answering
web_search now returns summary-first results by default
raw fetches are always offloaded to temp files
web_search gained freshness, similarUrl, and detail
smart query enhancement and result deduplication were added
retry logic, URL caching, and parallel batch processing were improved

1.2.0

PDF extraction in fetch_content
get_search_content.maxChars
dynamic file offloading for large content

1.1.0

initial release of web_search, code_search, fetch_content, and get_search_content

License

This project is licensed under the MIT License. See LICENSE.
