pi-search-hub

Unified web search + content extraction extension for pi with 12 backends (DuckDuckGo, Jina AI, Tavily, Brave, Exa, Serper, Firecrawl, Marginalia, LangSearch, WebSearchAPI, Perplexity Sonar, SearXNG). Auto-fallback, RRF combine mode, web_read tool, secure

Install pi-search-hub from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-search-hub
Package
pi-search-hub
Version
1.4.2
Published
May 14, 2026
Downloads
not available
Author
ronnieops.dev
License
MIT
Types
extension
Size
96.8 KB
Dependencies
1 dependency · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions/search-hub.ts"
  ],
  "image": "https://pi.dev/assets/packages/pi-search-multi.png"
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-search-hub

Unified web search + content extraction extension for pi with 12 backend providers (all working). One web_search tool, one web_read tool, auto-fallback, RRF-ranked combine mode, and credential resolution via env/shell/literal.

Installation

pi install npm:pi-search-hub

Note for DuckDuckGo backend: Requires the ddgs Python package. Install with:

  • Linux/macOS: pip3 install ddgs
  • Windows: pip install ddgs

Usage

Web Search

After installing, just ask naturally:

Search for recent AI agent frameworks.
What's the latest news on Llama 4?

Or use the tools directly — the agent picks the best configured backend automatically:

  • web_search — search the web with auto-fallback or parallel combine mode
  • web_read — fetch any URL as clean markdown

Combine Mode

Set combine=true to query ALL enabled backends in parallel with Reciprocal Rank Fusion (RRF) ranking:

Search for "Rust vs Go performance benchmarks" with combine=true to get results from all backends

Combine mode benefits:

  • Broader coverage across multiple search indexes
  • Results ranked by RRF — position-based scoring across all backends
  • Each result shows which backend found it
  • URL deduplication with content-aware merge (prefers richest result)
  • Useful for comprehensive research or when you want diverse sources

Tradeoff: Uses more API quota per query (all backends are called), but you get more comprehensive results.

Read Web Pages

Fetch any URL as clean markdown — great for extracting article content, docs, or reference pages:

Read https://docs.example.com/api-reference

The web_read tool supports:

  • objective — specific question to focus extraction
  • keywords — relevant terms to highlight on long pages
  • mode — rush for speed (returns innerText) or smart (markdown extraction)
  • fresh — bypass cache when freshness matters

Supported Backends

| #  | Backend             | Free Tier                 | API Key? | How to get key                         |
|----|---------------------|---------------------------|----------|----------------------------------------|
| 1  | DuckDuckGo          | Unlimited (rate-limited)  | No       | pip install ddgs (Linux/macOS: pip3)   |
| 2  | Jina AI             | Free tier (API key req.)  | Yes      | jina.ai                                |
| 3  | Marginalia Search † | Unlimited (rate-limited)  | No       | marginalia.nu                          |
| 4  | Tavily              | 1,000 calls/month         | Yes      | tavily.com                             |
| 5  | Serper (Google)     | 2,500 queries/month       | Yes      | serper.dev                             |
| 6  | Brave               | 2,000 queries/month       | Yes      | brave.com/search/api                   |
| 7  | Firecrawl           | 500 free credits          | Yes      | firecrawl.dev                          |
| 8  | Exa                 | 10 QPS rate-limited       | Yes      | exa.ai                                 |
| 9  | LangSearch          | Genuinely free, no CC     | Yes      | langsearch.com                         |
| 10 | WebSearchAPI.ai     | 2,000 free credits        | Yes      | websearchapi.ai                        |
| 11 | Perplexity Sonar    | Unlimited free queries    | Yes      | perplexity.ai                          |
| 12 | SearXNG             | Self-hosted, unlimited    | No       | docs.searxng.org                       |

† Marginalia Search uses public as a shared API key — no registration required, but subject to a shared rate limit.

Jina AI (s.jina.ai) returns full markdown content. Free tier requires a free API key from jina.ai.

SearXNG is a self-hosted metasearch engine. Run your own instance (or use a public one); no API key is required. Configure the instance URL in .pi/search.json.

Removed: Stract, UnSearch, BoardReader, EntireWeb, Search1API, FreeAPITools.dev — no longer viable (public API removed, requires payment, or endpoint not implemented).

Configuration

Configure backends globally (all projects) or per-project:

Global: ~/.pi/agent/extensions/search.json
Project: .pi/search.json (project takes precedence)

{
  "defaultBackend": "auto",
  "backends": {
    "duckduckgo": { "enabled": true },
    "jina":       { "enabled": true, "apiKey": "JINA_API_KEY" },
    "marginalia": { "enabled": true },
    "serper":     { "enabled": true, "apiKey": "SERPER_API_KEY" },
    "tavily":     { "enabled": true, "apiKey": "TAVILY_API_KEY" },
    "brave":      { "enabled": true, "apiKey": "BRAVE_API_KEY" },
    "exa":        { "enabled": true, "apiKey": "EXA_API_KEY" },
    "firecrawl":  { "enabled": true, "apiKey": "FIRECRAWL_API_KEY" },
    "langsearch": { "enabled": true, "apiKey": "LANGSEARCH_API_KEY" },
    "websearchapi":{ "enabled": true, "apiKey": "WEBSEARCHAPI_API_KEY" },
    "perplexity": { "enabled": true, "apiKey": "PERPLEXITY_API_KEY" },
    "searxng":    { "enabled": true, "instanceUrl": "http://localhost:8888" }
  }
}
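
The precedence rule can be pictured as a per-backend merge of the two files. This is a minimal sketch, assuming the project file overrides the global file field by field within each backend; the merge granularity is not documented here, and the type and function names are illustrative rather than the extension's API.

```typescript
// Hypothetical shapes; only the field names come from the config example above.
interface BackendConfig {
  enabled?: boolean;
  apiKey?: string;
  instanceUrl?: string;
}

interface SearchConfig {
  defaultBackend?: string;
  backends?: Record<string, BackendConfig>;
}

// Merge two configs so that values from the project file win.
// Assumption: backends are merged field by field, not replaced wholesale.
function mergeConfigs(globalCfg: SearchConfig, projectCfg: SearchConfig): SearchConfig {
  const backends: Record<string, BackendConfig> = {};
  const names = Object.keys(globalCfg.backends ?? {}).concat(
    Object.keys(projectCfg.backends ?? {}),
  );
  for (const name of names) {
    backends[name] = {
      ...globalCfg.backends?.[name],
      ...projectCfg.backends?.[name],
    };
  }
  return {
    defaultBackend: projectCfg.defaultBackend ?? globalCfg.defaultBackend ?? "auto",
    backends,
  };
}
```

Under this assumption, a project file that only sets "enabled": false for one backend would disable it for that project while keeping the key configured globally.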

Credential Resolution

The apiKey field supports four formats (following pi-web-providers convention):

| apiKey value            | Resolved from                          | Notes                               |
|-------------------------|----------------------------------------|-------------------------------------|
| "SERPER_API_KEY"        | process.env.SERPER_API_KEY             | ALL_CAPS → env var                  |
| "!pass show api/serper" | stdout of shell command (cached)       | ! prefix → exec                     |
| "sk-abc123..."          | Used as-is                             | Literal key (backwards compatible)  |
| (unset)                 | SEARCH_<BACKEND>_API_KEY env fallback  | Auto-enables backend                |

Env var references: Any ALL_CAPS string is treated as an environment variable name (not a literal). If the referenced env var is unset, a warning is printed (your literal key is not silently discarded).

Shell commands: Commands prefixed with ! are executed via execSync with a 5s timeout. Results are cached and invalidated when config is reloaded (editing the config file clears the cache).
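
The four formats can be sketched as a single resolver. This is an illustration of the rules described here, not the extension's code: the function names, the ALL_CAPS regular expression, and the injected env and run parameters are assumptions. In practice env would be process.env and run a thin wrapper around execSync with a 5s timeout.

```typescript
// Injected dependencies keep the sketch self-contained and testable.
type Env = Record<string, string | undefined>;
type RunCommand = (command: string) => string;

// Cache of shell-command results, keyed by command string.
const commandCache = new Map<string, string>();

// Call this when the config file is reloaded.
function clearCommandCache(): void {
  commandCache.clear();
}

function resolveApiKey(
  backend: string,
  apiKey: string | undefined,
  env: Env,
  run: RunCommand,
): string | undefined {
  // (unset): fall back to SEARCH_<BACKEND>_API_KEY
  if (apiKey === undefined || apiKey === "") {
    return env[`SEARCH_${backend.toUpperCase()}_API_KEY`];
  }
  // "!command": stdout of the shell command, cached until cleared
  if (apiKey.startsWith("!")) {
    const command = apiKey.slice(1);
    const cached = commandCache.get(command);
    if (cached !== undefined) return cached;
    const value = run(command).trim();
    commandCache.set(command, value);
    return value;
  }
  // ALL_CAPS: treated as an environment variable name (assumed pattern)
  if (/^[A-Z][A-Z0-9_]*$/.test(apiKey)) {
    // A real implementation would print a warning when the variable is unset.
    return env[apiKey];
  }
  // Anything else is used as a literal key
  return apiKey;
}
```

Checking the ! prefix before the ALL_CAPS pattern keeps the two forms from overlapping, since a command string always starts with a character the pattern rejects.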

Convenience env vars: Backends are auto-enabled when these env vars are set (even with no config entry):

export SEARCH_SERPER_API_KEY="sk-..."
export SEARCH_TAVILY_API_KEY="sk-..."
export SEARCH_EXA_API_KEY="sk-..."
# ...

A config entry can also reference an environment variable by name:

{
  "backends": {
    "serper": { "enabled": true, "apiKey": "SERPER_API_KEY" }
  }
}

To rotate a shell-command key: Update the secret in your password manager, then trigger a config reload (edit the config file, or wait 10s for automatic refresh).

Or use the interactive setup:

/search-setup

Commands

| Command        | Description                                                |
|----------------|------------------------------------------------------------|
| /search-setup  | Interactive prompt to configure API keys for any backend   |
| /search-status | Show which backends are active and which have keys         |

How auto mode works

Fallback Mode (default, combine=false)

  1. Tries each enabled backend in order from your config
  2. If a backend fails (rate limit, auth error, etc.), moves to the next one
  3. DuckDuckGo requires no API key and Jina AI needs only a free API key, so both serve as safety nets
  4. Returns results from the first backend that succeeds
  5. If all backends fail, reports the collected errors
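
The fallback steps above amount to a loop that stops at the first success and collects errors along the way. A minimal sketch, assuming any call that does not throw counts as success (whether an empty result list triggers a fallback is not stated here); the Backend shape and function name are hypothetical.

```typescript
// Hypothetical shapes for illustration; not the extension's actual types.
interface SearchResult {
  title: string;
  url: string;
  snippet?: string;
}

interface Backend {
  name: string;
  search: (query: string, numResults: number) => Promise<SearchResult[]>;
}

async function searchWithFallback(
  backends: Backend[],
  query: string,
  numResults: number,
): Promise<{ backend: string; results: SearchResult[] }> {
  const errors: string[] = [];
  // Try each enabled backend in config order.
  for (const backend of backends) {
    try {
      const results = await backend.search(query, numResults);
      // First backend that succeeds wins.
      return { backend: backend.name, results };
    } catch (err) {
      // Rate limit, auth error, etc.: record it and move on.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${backend.name}: ${message}`);
    }
  }
  // Every backend failed: report the collected errors.
  throw new Error(`All backends failed:\n${errors.join("\n")}`);
}
```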

Combine Mode (combine=true)

  1. Queries ALL enabled backends in parallel
  2. Each backend receives numResults / numBackends as a target
  3. Results are merged using Reciprocal Rank Fusion (RRF) — position-based scoring that works across incompatible ranking systems
  4. Each result shows its source backend (e.g., *Source: Tavily*)
  5. URL dedup prefers the result with the richest content (content > snippet)
  6. Backend statistics are displayed (which succeeded, result counts, errors)
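
The fan-out, source tagging, deduplication, and statistics steps can be sketched as follows. This is an illustration under stated assumptions: the per-backend target is rounded up, URLs are compared after stripping trailing slashes only, and "richest" means full content beats a snippet with longer text breaking ties. All names are hypothetical, and RRF ranking is left out here.

```typescript
// Hypothetical shapes for illustration; not the extension's actual types.
interface SearchResult {
  title: string;
  url: string;
  snippet?: string;
  content?: string;
  source?: string;
}

interface Backend {
  name: string;
  search: (query: string, numResults: number) => Promise<SearchResult[]>;
}

interface BackendStat {
  backend: string;
  ok: boolean;
  count: number;
  error?: string;
}

// Assumed richness order: any full content beats any snippet.
function richness(r: SearchResult): number {
  if (r.content) return 1e6 + r.content.length;
  return r.snippet ? r.snippet.length : 0;
}

// Keep one result per URL, preferring the richest one.
function dedupeByUrl(results: SearchResult[]): SearchResult[] {
  const best = new Map<string, SearchResult>();
  for (const r of results) {
    const key = r.url.replace(/\/+$/, ""); // deliberately simple normalization
    const existing = best.get(key);
    if (existing === undefined || richness(r) > richness(existing)) {
      best.set(key, r); // a Map keeps the first-seen position for a key
    }
  }
  return Array.from(best.values());
}

async function queryAll(
  backends: Backend[],
  query: string,
  numResults: number,
): Promise<{ results: SearchResult[]; stats: BackendStat[] }> {
  // Each backend gets an equal share of the requested total.
  const target = Math.max(1, Math.ceil(numResults / Math.max(1, backends.length)));
  const outcomes = await Promise.all(
    backends.map(async (b) => {
      try {
        const found = await b.search(query, target);
        const results: SearchResult[] = found.map((r) => ({ ...r, source: b.name }));
        const stat: BackendStat = { backend: b.name, ok: true, count: results.length };
        return { results, stat };
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        const stat: BackendStat = { backend: b.name, ok: false, count: 0, error };
        return { results: [] as SearchResult[], stat };
      }
    }),
  );
  const all: SearchResult[] = [];
  for (const o of outcomes) all.push(...o.results);
  return { results: dedupeByUrl(all), stats: outcomes.map((o) => o.stat) };
}
```

Catching errors inside each mapped call means one failing backend cannot reject the whole batch; its failure only shows up in the statistics.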

RRF Scoring

RRF assigns each result a score of Σ(1 / (60 + rank_i)) across all backends that returned it. Results are ranked by score, then by number of backends that found them. This means a result ranked #1 by one backend and #5 by another beats a result ranked #4 by two backends (1/61 + 1/65 ≈ 0.03178 versus 2/64 = 0.03125).
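
The scoring rule is small enough to write out directly. A minimal sketch, assuming 1-based ranks and that each backend returns a given URL at most once; the function and type names are illustrative.

```typescript
const RRF_K = 60; // the constant from the formula above

interface Ranked {
  url: string;
  score: number;
  backends: number; // how many backends returned this URL
}

// `rankings` maps a backend name to the URLs it returned, best first.
function rrf(rankings: Record<string, string[]>): Ranked[] {
  const byUrl = new Map<string, Ranked>();
  for (const backend of Object.keys(rankings)) {
    rankings[backend].forEach((url, index) => {
      let entry = byUrl.get(url);
      if (entry === undefined) {
        entry = { url, score: 0, backends: 0 };
        byUrl.set(url, entry);
      }
      entry.score += 1 / (RRF_K + index + 1); // index is 0-based, rank is 1-based
      entry.backends += 1;
    });
  }
  // Sort by score, then by number of backends that found the result.
  return Array.from(byUrl.values()).sort(
    (a, b) => b.score - a.score || b.backends - a.backends,
  );
}
```

Because only positions enter the score, backends with incompatible relevance scales can be fused without any normalization step.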

Security

  • API keys are stored in local config files only (~/.pi/agent/extensions/search.json or .pi/search.json), never sent to any third party besides the chosen backend
  • Env vars and shell commands are supported for credential resolution — the config file is trusted (you own it), but never commit plain API keys to version control
  • DuckDuckGo queries use a spawned Python subprocess (abortable via signal)
  • All HTTP backends have a 30-second timeout; shell commands for credentials have a 5-second timeout
  • Error messages are sanitized — API response bodies are truncated and key-like patterns are redacted
  • The .pi/ directory is in .gitignore; never commit API keys to version control
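
Error sanitization of the kind described above could look roughly like this. It is a sketch only: the 200-character limit, the "20 or more token characters" heuristic, and the function name are assumptions, and the extension's actual patterns may differ.

```typescript
const MAX_ERROR_LENGTH = 200; // assumed truncation limit

// Redact known keys and key-like tokens, then truncate the message.
function sanitizeError(message: string, knownKeys: string[] = []): string {
  let out = message;
  // Exact matches for keys we already know about.
  for (const key of knownKeys) {
    if (key.length > 0) out = out.split(key).join("[REDACTED]");
  }
  // Heuristic: long runs of letters, digits, "_" or "-" look like secrets.
  out = out.replace(/\b[A-Za-z0-9_-]{20,}\b/g, "[REDACTED]");
  // Truncate long API response bodies.
  if (out.length > MAX_ERROR_LENGTH) {
    out = out.slice(0, MAX_ERROR_LENGTH) + "...";
  }
  return out;
}
```

Redacting before truncating avoids leaving a partial key at the cut-off point.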

Testing

# Run the full benchmark against all backends
node benchmark/benchmark.mjs

# Quick test Jina AI (with your free API key)
curl -s -H "Authorization: Bearer $JINA_API_KEY" "https://s.jina.ai/?q=test&format=json" | jq .

# Quick test Exa (set KEY to your Exa API key)
curl -X POST "https://api.exa.ai/search" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $KEY" \
  -d '{"query": "test", "numResults": 3, "contents": {"text": true}}'

# Quick test Perplexity Sonar
curl -X POST "https://api.perplexity.ai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KEY" \
  -d '{"model": "sonar", "messages": [{"role": "user", "content": "test"}], "search_context_size": "low"}'

# Quick test SearXNG (replace URL with your instance)
curl "http://localhost:8888/search?q=test&format=json&count=3"

Adding a new backend

Backends are registered via the BACKEND_DEFS registry in extensions/search-hub.ts. Define a search function and add one entry to the registry:

const BACKEND_DEFS: Record<string, BackendRunner> = {
  // ... existing entries
  mybackend: {
    needsKey: true,
    needsKeyFromConfig: false,
    needsInstanceUrl: false,
    label: "My Backend",
    setupLabel: "My Backend (free tier description)",
    search: async (query, numResults, { key, signal }) => {
      const result = await searchMyBackend(query, numResults, key!, signal);
      return { results: result.results };
    },
  },
};

The registry handles dispatching, key resolution, label formatting, and the setup menu, so no other edits are needed.

License

MIT