pi-search-hub
Unified web search + content extraction extension for pi with 12 backends (DuckDuckGo, Jina AI, Tavily, Brave, Exa, Serper, Firecrawl, Marginalia, LangSearch, WebSearchAPI, Perplexity Sonar, SearXNG). Auto-fallback, RRF combine mode, web_read tool, secure
Package details
Install pi-search-hub from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-search-hub
- Package: pi-search-hub
- Version: 1.4.2
- Published: May 14, 2026
- Downloads: not available
- Author: ronnieops.dev
- License: MIT
- Types: extension
- Size: 96.8 KB
- Dependencies: 1 dependency · 2 peers
Pi manifest JSON
{
"extensions": [
"./extensions/search-hub.ts"
],
"image": "https://pi.dev/assets/packages/pi-search-multi.png"
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-search-hub
Unified web search + content extraction extension for pi with 12 backend providers (all working). One web_search tool, one web_read tool, auto-fallback, RRF-ranked combine mode, and credential resolution via env/shell/literal.
Installation
pi install npm:pi-search-hub
Note for DuckDuckGo backend: Requires the ddgs Python package. Install with:
- Linux/macOS: pip3 install ddgs
- Windows: pip install ddgs
Usage
Web Search
After installing, just ask naturally:
Search for recent AI agent frameworks.
What's the latest news on Llama 4?
Or use the tools directly — the agent picks the best configured backend automatically:
- web_search — search the web with auto-fallback or parallel combine mode
- web_read — fetch any URL as clean markdown
Combine Mode
Set combine=true to query ALL enabled backends in parallel with Reciprocal Rank Fusion (RRF) ranking:
Search for "Rust vs Go performance benchmarks" with combine=true to get results from all backends
Combine mode benefits:
- Broader coverage across multiple search indexes
- Results ranked by RRF — position-based scoring across all backends
- Each result shows which backend found it
- URL deduplication with content-aware merge (prefers richest result)
- Useful for comprehensive research or when you want diverse sources
Tradeoff: Uses more API quota per query (all backends are called), but you get more comprehensive results.
Read Web Pages
Fetch any URL as clean markdown — great for extracting article content, docs, or reference pages:
Read https://docs.example.com/api-reference
The web_read tool supports:
- objective — specific question to focus extraction
- keywords — relevant terms to highlight on long pages
- mode — rush for speed (returns innerText) or smart (markdown extraction)
- fresh — bypass cache when freshness matters
Supported Backends
| # | Backend | Free Tier | API Key? | How to get key |
|---|---|---|---|---|
| 1 | DuckDuckGo | Unlimited (rate-limited) | No | pip install ddgs (Linux/macOS: pip3) |
| 2 | Jina AI | Free tier (API key req.) | Yes | jina.ai |
| 3 | Marginalia Search | Unlimited (rate-limited) | No† | marginalia.nu |
| 4 | Tavily | 1,000 calls/month | Yes | tavily.com |
| 5 | Serper (Google) | 2,500 queries/month | Yes | serper.dev |
| 6 | Brave | 2,000 queries/month | Yes | brave.com/search/api |
| 7 | Firecrawl | 500 free credits | Yes | firecrawl.dev |
| 8 | Exa | 10 QPS rate-limited | Yes | exa.ai |
| 9 | LangSearch | Genuinely free, no CC | Yes | langsearch.com |
| 10 | WebSearchAPI.ai | 2,000 free credits | Yes | websearchapi.ai |
| 11 | Perplexity Sonar | Unlimited free queries | Yes | perplexity.ai |
| 12 | SearXNG | Self-hosted, unlimited | No | docs.searxng.org |
† Marginalia Search uses public as a shared API key — no registration required, but subject to a shared rate limit.
Jina AI (s.jina.ai) returns full markdown content. Free tier requires a free API key from jina.ai.
SearXNG is a self-hosted metasearch engine. Run your own instance (or use a public one), no API key required. Configure the instance URL in .pi/search.json.
Removed: Stract, UnSearch, BoardReader, EntireWeb, Search1API, FreeAPITools.dev — no longer viable (public API removed, requires payment, or endpoint not implemented).
Configuration
Configure backends globally (all projects) or per-project:
Global: ~/.pi/agent/extensions/search.json
Project: .pi/search.json (project takes precedence)
{
"defaultBackend": "auto",
"backends": {
"duckduckgo": { "enabled": true },
"jina": { "enabled": true, "apiKey": "JINA_API_KEY" },
"marginalia": { "enabled": true },
"serper": { "enabled": true, "apiKey": "SERPER_API_KEY" },
"tavily": { "enabled": true, "apiKey": "TAVILY_API_KEY" },
"brave": { "enabled": true, "apiKey": "BRAVE_API_KEY" },
"exa": { "enabled": true, "apiKey": "EXA_API_KEY" },
"firecrawl": { "enabled": true, "apiKey": "FIRECRAWL_API_KEY" },
"langsearch": { "enabled": true, "apiKey": "LANGSEARCH_API_KEY" },
"websearchapi":{ "enabled": true, "apiKey": "WEBSEARCHAPI_API_KEY" },
"perplexity": { "enabled": true, "apiKey": "PERPLEXITY_API_KEY" },
"searxng": { "enabled": true, "instanceUrl": "http://localhost:8888" }
}
}
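The precedence rule can be pictured as a merge of the two files. The sketch below is only an illustration: the README states that the project file takes precedence but not how the two are combined, so the field-by-field merge, the mergeConfigs name, and the SearchConfig and BackendConfig types are all assumptions.

```typescript
// Hypothetical types: only the fields shown in the example config are modelled.
interface BackendConfig {
  enabled?: boolean;
  apiKey?: string;
  instanceUrl?: string;
}

interface SearchConfig {
  defaultBackend?: string;
  backends?: Record<string, BackendConfig>;
}

// Assumption: project settings override global ones field by field.
// The extension may instead replace whole backend entries or the whole file.
function mergeConfigs(globalCfg: SearchConfig, projectCfg: SearchConfig): SearchConfig {
  const backends: Record<string, BackendConfig> = {};
  const names = Object.keys({ ...globalCfg.backends, ...projectCfg.backends });
  for (const name of names) {
    backends[name] = { ...globalCfg.backends?.[name], ...projectCfg.backends?.[name] };
  }
  return {
    defaultBackend: projectCfg.defaultBackend ?? globalCfg.defaultBackend ?? "auto",
    backends,
  };
}

// Global file enables Serper; the project file disables it and adds SearXNG.
const merged = mergeConfigs(
  { defaultBackend: "auto", backends: { serper: { enabled: true, apiKey: "SERPER_API_KEY" } } },
  { backends: { serper: { enabled: false }, searxng: { enabled: true, instanceUrl: "http://localhost:8888" } } },
);
console.log(JSON.stringify(merged.backends, null, 2));
```

Under this assumption a project can switch a backend off without repeating its key reference, because the global apiKey entry survives the merge.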
Credential Resolution
The apiKey field supports four formats (following pi-web-providers convention):
| apiKey value | Resolved from | Example |
|---|---|---|
| "SERPER_API_KEY" | process.env.SERPER_API_KEY | ALL_CAPS → env var |
| "!pass show api/serper" | stdout of shell command (cached) | ! prefix → exec |
| "sk-abc123..." | Used as-is | Literal key (backwards compatible) |
| (unset) | SEARCH_<BACKEND>_API_KEY env fallback | Auto-enables backend |
Env var references: Any ALL_CAPS string is treated as an environment variable name (not a literal). If the referenced env var is unset, a warning is printed (your literal key is not silently discarded).
Shell commands: Commands prefixed with ! are executed via execSync with a 5s timeout. Results are cached and invalidated when config is reloaded (editing the config file clears the cache).
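The four formats can be read as a small decision procedure. The following is a minimal sketch, not the extension's code: resolveApiKey, the Env and ShellRunner types, and the exact ALL_CAPS pattern are hypothetical, and the environment and shell runner are passed in as parameters so the logic can be exercised without real secrets. A real implementation would presumably pass process.env and a wrapper around execSync with the 5s timeout.

```typescript
type Env = Record<string, string | undefined>;
type ShellRunner = (command: string) => string;

// Cache of shell-command results, keyed by command string.
// Per the README this would be cleared whenever the config is reloaded.
const shellCache = new Map<string, string>();

function resolveApiKey(
  backend: string,
  apiKey: string | undefined,
  env: Env,
  runShell: ShellRunner,
): string | undefined {
  if (apiKey === undefined || apiKey === "") {
    // Unset: fall back to the SEARCH_<BACKEND>_API_KEY convenience variable.
    return env[`SEARCH_${backend.toUpperCase()}_API_KEY`];
  }
  if (apiKey.startsWith("!")) {
    // "!" prefix: run the rest as a shell command and cache its stdout.
    const command = apiKey.slice(1);
    let cached = shellCache.get(command);
    if (cached === undefined) {
      cached = runShell(command).trim();
      shellCache.set(command, cached);
    }
    return cached;
  }
  // Assumed pattern for "ALL_CAPS": uppercase letters, digits, underscores.
  if (/^[A-Z][A-Z0-9_]*$/.test(apiKey)) {
    const value = env[apiKey];
    if (value === undefined) console.warn(`Environment variable ${apiKey} is not set`);
    return value;
  }
  // Anything else is treated as a literal key.
  return apiKey;
}
```

Checking the ! prefix before the ALL_CAPS test keeps the two cases unambiguous, since a shell command can never match the env-var pattern.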
Convenience env vars: Backends are auto-enabled when these env vars are set (even with no config entry):
export SEARCH_SERPER_API_KEY="sk-..."
export SEARCH_TAVILY_API_KEY="sk-..."
export SEARCH_EXA_API_KEY="sk-..."
# ...
{
"backends": {
"serper": { "enabled": true, "apiKey": "SERPER_API_KEY" }
}
}
To rotate a shell-command key: Update the secret in your password manager, then trigger a config reload (edit the config file, or wait 10s for automatic refresh).
Or use the interactive setup:
/search-setup
Commands
| Command | Description |
|---|---|
| /search-setup | Interactive prompt to configure API keys for any backend |
| /search-status | Show which backends are active and which have keys |
How auto mode works
Fallback Mode (default, combine=false)
- Tries each enabled backend in order from your config
- If a backend fails (rate limit, auth error, etc.), moves to the next one
- DuckDuckGo requires no API key; Jina AI needs a free API key. Both serve as safety nets
- Returns results from the first backend that succeeds
- If all backends fail, reports the collected errors
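The fallback steps above can be sketched as a simple loop. This is an illustrative outline only: the Backend and SearchResult shapes and the searchWithFallback name are hypothetical, and it assumes that any call that returns without throwing counts as a success, which the README does not specify.

```typescript
interface SearchResult {
  title: string;
  url: string;
  snippet?: string;
}

interface Backend {
  name: string;
  search: (query: string) => Promise<SearchResult[]>;
}

// Try each backend in config order; return the first success,
// or throw one error that lists every failure.
async function searchWithFallback(
  query: string,
  backends: Backend[],
): Promise<{ backend: string; results: SearchResult[] }> {
  const errors: string[] = [];
  for (const backend of backends) {
    try {
      const results = await backend.search(query);
      return { backend: backend.name, results };
    } catch (err) {
      // Rate limits, auth errors, and timeouts all land here.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${backend.name}: ${message}`);
    }
  }
  throw new Error(`All backends failed:\n${errors.join("\n")}`);
}
```

Because the loop is sequential, a query costs quota only on the backends that were actually tried, which is the main contrast with combine mode.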
Combine Mode (combine=true)
- Queries ALL enabled backends in parallel
- Each backend receives numResults / numBackends as a target
- Results are merged using Reciprocal Rank Fusion (RRF) — position-based scoring that works across incompatible ranking systems
- Each result shows its source backend (e.g., *Source: Tavily*)
- URL dedup prefers the result with the richest content (content > snippet)
- Backend statistics are displayed (which succeeded, result counts, errors)
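The deduplication rule in the list above (content > snippet) could look roughly like this. It is a sketch under assumptions: the Hit shape and the dedupeByUrl and richness names are hypothetical, and URLs are compared as exact strings, whereas the extension may normalize them first.

```typescript
interface Hit {
  url: string;
  title: string;
  snippet?: string;
  content?: string;
  source: string; // backend that returned this hit
}

// Richness rank: full content beats a snippet, which beats neither.
function richness(hit: Hit): number {
  if (hit.content) return 2;
  if (hit.snippet) return 1;
  return 0;
}

// Keep one hit per URL, preferring the richest. Ties keep the first seen.
function dedupeByUrl(hits: Hit[]): Hit[] {
  const best = new Map<string, Hit>();
  for (const hit of hits) {
    const current = best.get(hit.url);
    if (current === undefined || richness(hit) > richness(current)) {
      best.set(hit.url, hit);
    }
  }
  return Array.from(best.values());
}

const deduped = dedupeByUrl([
  { url: "https://example.com/a", title: "A", snippet: "short", source: "Brave" },
  { url: "https://example.com/a", title: "A", content: "full page text", source: "Jina AI" },
  { url: "https://example.com/b", title: "B", source: "DuckDuckGo" },
]);
console.log(deduped.map((hit) => `${hit.url} (${hit.source})`));
```

Here the Jina AI hit replaces the Brave hit for the same URL because it carries full content rather than only a snippet.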
RRF Scoring
RRF assigns each result a score of Σ(1 / (60 + rank_i)) across all backends that returned it. Results are ranked by score, then by number of backends that found them. This means a result ranked #1 by one backend and #5 by another beats a result ranked #4 by two backends.
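The claim above can be checked numerically with a short sketch. The rrfScores function and its input shape are hypothetical and not taken from the extension. It assumes ranks start at 1 and that each URL appears at most once per backend.

```typescript
const RRF_K = 60; // constant from the formula above

interface RrfEntry {
  url: string;
  score: number;
  backends: number;
}

// rankings maps a backend name to its URLs in rank order (index 0 is rank 1).
function rrfScores(rankings: Record<string, string[]>): RrfEntry[] {
  const table = new Map<string, { score: number; backends: number }>();
  for (const name of Object.keys(rankings)) {
    const urls = rankings[name] ?? [];
    urls.forEach((url, index) => {
      const entry = table.get(url) ?? { score: 0, backends: 0 };
      entry.score += 1 / (RRF_K + index + 1);
      entry.backends += 1;
      table.set(url, entry);
    });
  }
  // Sort by score, then by how many backends returned the URL.
  return Array.from(table.entries())
    .map(([url, entry]) => ({ url, ...entry }))
    .sort((a, b) => b.score - a.score || b.backends - a.backends);
}

// "x" is ranked #1 by one backend and #5 by the other; "y" is #4 in both.
const ranked = rrfScores({
  tavily: ["x", "p1", "p2", "y"],
  brave: ["q1", "q2", "q3", "y", "x"],
});
// x: 1/61 + 1/65 ≈ 0.03178, y: 2/64 = 0.03125, so x ranks first.
console.log(ranked.slice(0, 2));
```

The gap is small, which shows how the constant 60 dampens the advantage of a single top rank relative to consistent mid-list agreement.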
Security
- API keys are stored in local config files only (~/.pi/agent/extensions/search.json or .pi/search.json), never sent to any third party besides the chosen backend
- Env vars and shell commands are supported for credential resolution — the config file is trusted (you own it), but never commit plain API keys to version control
- DuckDuckGo queries use a spawned Python subprocess (abortable via signal)
- All HTTP backends have a 30-second timeout; shell commands for credentials have a 5-second timeout
- Error messages are sanitized — API response bodies are truncated and key-like patterns are redacted
- The .pi/ directory is in .gitignore — never commit API keys to version control
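To make the error sanitization point concrete, here is a rough sketch of redaction followed by truncation. Everything specific in it is an assumption: the sanitizeError name, the 200-character limit, and the two patterns (Bearer tokens and sk- style keys) are illustrative, and the extension's actual rules may differ.

```typescript
const MAX_ERROR_LENGTH = 200; // hypothetical limit

// Redact first, then truncate, so a key cannot survive by being cut in half.
function sanitizeError(body: string): string {
  const redacted = body
    // Tokens following "Bearer " in echoed Authorization headers.
    .replace(/(Bearer\s+)[A-Za-z0-9._-]+/g, "$1[REDACTED]")
    // Keys with an "sk-" prefix, as in the README's examples.
    .replace(/\bsk-[A-Za-z0-9_-]{8,}/g, "[REDACTED]");
  return redacted.length > MAX_ERROR_LENGTH
    ? redacted.slice(0, MAX_ERROR_LENGTH) + "..."
    : redacted;
}

console.log(sanitizeError("401 Unauthorized: invalid key sk-abc123def456ghi789"));
```

Pattern-based redaction is best effort: keys without a recognizable prefix would pass through, so truncation remains a useful second layer.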
Testing
# Run the full benchmark against all backends
node benchmark/benchmark.mjs
# Quick test Jina AI (with your free API key)
curl -s -H "Authorization: Bearer $JINA_API_KEY" "https://s.jina.ai/?q=test&format=json" | jq .
# Quick test Exa (set $KEY to your Exa API key)
curl -X POST "https://api.exa.ai/search" \
-H "Content-Type: application/json" \
-H "x-api-key: $KEY" \
-d '{"query": "test", "numResults": 3, "contents": {"text": true}}'
# Quick test Perplexity Sonar
curl -X POST "https://api.perplexity.ai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $KEY" \
-d '{"model": "sonar", "messages": [{"role": "user", "content": "test"}], "search_context_size": "low"}'
# Quick test SearXNG (replace URL with your instance)
curl "http://localhost:8888/search?q=test&format=json&count=3"
Adding a new backend
Backends are registered via the BACKEND_DEFS registry in extensions/search-hub.ts. Define a search function and add one entry to the registry:
const BACKEND_DEFS: Record<string, BackendRunner> = {
// ... existing entries
mybackend: {
needsKey: true,
needsKeyFromConfig: false,
needsInstanceUrl: false,
label: "My Backend",
setupLabel: "My Backend (free tier description)",
search: async (query, numResults, { key, signal }) => {
const result = await searchMyBackend(query, numResults, key!, signal);
return { results: result.results };
},
},
};
The registry handles dispatching, key resolution, formatting labels, and setup menu — no other edits needed.
License
MIT
