@arpagon/pi-web-providers

Configurable web access extension for pi with per-tool provider routing for search, contents, answers, and research across Claude, Cloudflare, Codex, Custom CLI, Exa, Gemini, Perplexity, Parallel, and Valyu.

Package details

extension

Install @arpagon/pi-web-providers from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@arpagon/pi-web-providers
Package
@arpagon/pi-web-providers
Version
1.2.0
Published
Mar 19, 2026
Downloads
38/mo Β· 11/wk
Author
arpagon
License
MIT
Types
extension
Size
299.8 KB
Dependencies
8 dependencies Β· 0 peers
Pi manifest JSON
{
  "extensions": [
    "./dist/index.js"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

🌍 pi-web-providers

A meta web extension for pi that routes search, content extraction, answers, and research through configurable per-tool providers.

Why?

Most web extensions hard-wire a single backend. pi-web-providers lets you mix and match providers per tool instead, so web_search, web_contents, web_answer, and web_research can each use a different backend or be turned off entirely.

✨ Features

  • Multiple providers β€” Claude, Cloudflare, Codex, Custom CLI, Exa, Gemini, Perplexity, Parallel, Valyu
  • Batched search and answers β€” run several related queries in a single web_search or web_answer call and get grouped results back in one response
  • Async contents prefetch β€” optionally start background web_contents extraction from web_search results and reuse the cached pages later

πŸ“¦ Install

pi install npm:pi-web-providers

βš™οΈ Configure

Run:

/web-providers

This edits the global config file ~/.pi/agent/web-providers.json. The settings UI mirrors the three sections below: tools, providers, and generic settings.

Each tool can be routed to any compatible provider:

Provider search contents answer research Auth
Claude βœ” βœ” Local Claude Code auth
Cloudflare βœ” CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID
Codex βœ” Local Codex CLI auth
Exa βœ” βœ” βœ” βœ” EXA_API_KEY
Gemini βœ” βœ” βœ” GOOGLE_API_KEY
Perplexity βœ” βœ” βœ” PERPLEXITY_API_KEY
Parallel βœ” βœ” PARALLEL_API_KEY
Valyu βœ” βœ” βœ” βœ” VALYU_API_KEY

Advanced option: custom-cli is a configurable adapter provider that can route any managed tool through a local wrapper command using a JSON stdin/stdout contract.

See example-config.json for a full default configuration.

Tools

Each managed tool maps to one provider id or null for off under the top-level tools key. A tool is only exposed when it is mapped to a compatible provider and that provider is currently available. Tool-specific settings live under toolSettings; today this covers toolSettings.search.prefetch.

web_search

Search the public web for up to 10 queries in one call. It returns grouped titles, URLs, and snippets for each query.

Parameter Type Default Description
queries string[] required One or more search queries to run (max 10)
maxResults integer 5 Result count per query, clamped to 1–20
options object β€” Provider-specific search options and local prefetch settings

web_search.options.prefetch is local-only and not forwarded into the provider SDK. It accepts provider, maxUrls, ttlMs, and contentsOptions, and starts a background page-extraction workflow only when prefetch.provider is set. /web-providers can also persist default search prefetch settings under toolSettings.search.prefetch.

web_contents

Read the main text from one or more web pages. It reuses cached pages when they match and fetches only missing or stale URLs.

Parameter Type Default Description
urls string[] required One or more URLs to extract
options object β€” Provider-specific extraction options

web_contents reuses any matching cached pages already present in the local content storeβ€”whether they came from prefetch or an earlier readβ€”and only fetches missing or stale URLs.

web_answer

Answer one or more questions using web-grounded evidence. When you ask more than one question, the response is grouped into per-question sections.

Parameter Type Default Description
queries string[] required One or more questions to answer in one call (max 10)
options object β€” Provider-specific options

Responses are grouped into per-question sections when more than one question is provided.

web_research

Investigate a topic across web sources and produce a longer report. The provider-specific options stay native to each SDK, and runtime options override provider configuration when both are set.

Parameter Type Default Description
input string required Research brief or question
options object β€” Provider-specific options

options are provider-native and provider-specific. Equivalent concepts can use different field names across SDKsβ€”for example Perplexity uses country, Exa uses userLocation, and Valyu uses countryCode. Runtime options override provider-native config, but managed tool inputs and tool wiring stay fixed.

The extension accepts local control fields for robustness: requestTimeoutMs, retryCount, and retryDelayMs on request/response tools, plus pollIntervalMs, timeoutMs, maxConsecutivePollErrors, and resumeId on web_research for lifecycle-based research providers. These fields are handled by the extension and are not forwarded into the provider SDK call.

  • Exa and Valyu research support polling, overall deadlines, and resume IDs but reject requestTimeoutMs and do not retry non-idempotent job creation.
  • Perplexity research runs in streaming foreground mode and only supports requestTimeoutMs, retryCount, and retryDelayMs.

Providers deliver results in one of three modes:

  • Silent foreground β€” no intermediate output; result returned when done.
  • Streaming foreground β€” progress updates while running, but the result is still only usable after the tool finishes.
  • Background research β€” the provider runs in the background; if interrupted, the run can be resumed later via resumeId.

Providers

The built-in providers below are thin adapters around official SDKs.

  • SDK: @anthropic-ai/claude-agent-sdk
  • Uses Claude Code's built-in WebSearch and WebFetch tools behind a structured JSON adapter
  • Runs in silent foreground mode
  • Supports request-shaping options such as model, thinking, effort, and maxTurns
  • Great for search plus grounded answers if you already use Claude Code locally
  • API: Cloudflare Browser Rendering REST API
  • Uses the /markdown endpoint to render pages in headless Chrome on Cloudflare's edge network and return clean Markdown
  • Runs in silent foreground mode
  • No external SDK dependency β€” uses fetch against the REST API
  • Handles JavaScript-rendered pages, dynamic content, and complex layouts
  • Reports per-URL success/failure with structured content entries
  • Requires a Cloudflare API token with Browser Rendering permissions and an account ID
  • Free tier: 10 min/day browser time, 6 req/min; Workers Paid ($5/mo): 10 hrs/month, 600 req/min
  • SDK: @openai/codex-sdk
  • Runs in read-only mode with web search enabled
  • Runs in silent foreground mode
  • Supports request-shaping web_search.options such as model, modelReasoningEffort, and webSearchMode
  • Best if you already use the local Codex CLI and auth flow
  • SDK: exa-js
  • Search, contents, and answer run in silent foreground mode
  • Research runs in background research mode and supports resumeId
  • Neural, keyword, hybrid, and deep-research search modes
  • Inline text-content extraction on search results
  • SDK: @google/genai
  • Search and answer run in silent foreground mode
  • Research runs in background research mode and supports resumeId
  • Google Search grounding for answers
  • Deep-research agents via Google's Gemini API
  • Supports provider-native request options such as model, config, generation_config, and agent_config depending on the tool
  • SDK: @perplexity-ai/perplexity_ai
  • web_search and web_answer run in silent foreground mode
  • web_research runs in streaming foreground mode (no resumeId support)
  • Uses Perplexity Search for web_search
  • Uses Sonar for web_answer and sonar-deep-research for web_research
  • Supports provider-specific web_search.options such as country, search_mode, search_domain_filter, and search_recency_filter
  • SDK: parallel-web
  • Runs in silent foreground mode
  • Agentic and one-shot search modes
  • Page content extraction with excerpt and full-content toggles
  • Supports provider-native search and extraction options from the Parallel SDK
  • SDK: valyu-js
  • Search, contents, and answer run in silent foreground mode
  • Research runs in background research mode and supports resumeId
  • Web, proprietary, and news search types
  • Supports provider-native options such as countryCode, responseLength, and search/source filters
  • Configurable response length for answers and research

Custom CLI provider

The custom-cli provider lets you bring your own wrapper command for any managed tool. Each capability can point at a different local command under providers["custom-cli"].native.

The repo includes actual wrapper examples under examples/custom-cli/wrappers/. They are small bash scripts that use jq for JSON handling. Each one uses a different backend pattern:

  • codex --search exec for web_search
  • Gemini API via curl for web_contents
  • claude -p for web_answer
  • Perplexity API via curl for web_research

Copy the example wrappers into a local ./wrappers/ directory, then configure:

{
  "tools": {
    "search": "custom-cli",
    "contents": "custom-cli",
    "answer": "custom-cli",
    "research": "custom-cli"
  },
  "providers": {
    "custom-cli": {
      "enabled": true,
      "native": {
        "search": {
          "argv": ["bash", "./wrappers/codex-search.sh"]
        },
        "contents": {
          "argv": ["bash", "./wrappers/gemini-contents.sh"]
        },
        "answer": {
          "argv": ["bash", "./wrappers/claude-answer.sh"]
        },
        "research": {
          "argv": ["bash", "./wrappers/perplexity-research.sh"]
        }
      }
    }
  }
}

Those example wrappers deliberately use different local CLIs and APIs so you can see several wrapper styles in one setup without extra glue code.

Each capability can also set an optional cwd and env block. Use cwd when one wrapper must run from a specific directory. Use env for per-command variables; each value can be a literal string, an environment variable name, or !command.

web_research runs as a foreground wrapper command, so local polling controls (pollIntervalMs, timeoutMs, maxConsecutivePollErrors) and resumeId do not apply to custom-cli.

Wrapper contract:

  • stdin: one JSON request object with capability plus the per-call managed inputs (query, urls, input, maxResults, options, cwd)
  • stdout: one JSON response object
    • search: { "results": [{ "title", "url", "snippet" }] }
    • contents / answer / research: { "text": "...", "summary"?: "...", "itemCount"?: 1, "metadata"?: {} }
  • stderr: optional progress lines
  • exit code 0: success
  • non-zero exit code: failure

See examples/custom-cli/README.md for a copy-and-pasteable setup, and see examples/custom-cli/wrappers/ for the actual wrapper files.

Generic settings

The genericSettings block sets shared execution defaults that apply to all providers unless overridden in a provider's policy block:

Field Default Description
requestTimeoutMs 30000 Maximum time for a single provider request
retryCount 3 Retries for transient failures
retryDelayMs 2000 Initial delay before retrying
researchPollIntervalMs 3000 How often to poll long-running research jobs
researchTimeoutMs 21600000 Overall deadline for research before returning
researchMaxConsecutivePollErrors 3 Consecutive poll failures before stopping

πŸ“„ License

MIT