@arpagon/pi-web-providers

Configurable web access extension for pi with per-tool provider routing for search, contents, answers, and research across Claude, Cloudflare, Codex, Custom CLI, Exa, Gemini, Perplexity, Parallel, and Valyu.

Package details

← Back

extension

Install @arpagon/pi-web-providers from npm and Pi will load the resources declared by the package manifest.

npm repo home report

$ pi install npm:@arpagon/pi-web-providers

Package: @arpagon/pi-web-providers
Version: 1.2.0
Published: Mar 19, 2026
Downloads: 38/mo · 11/wk
Author: arpagon
License: MIT
Types: extension
Size: 299.8 KB
Dependencies: 8 dependencies · 0 peers

Pi manifest JSON

{
  "extensions": [
    "./dist/index.js"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

🌍 pi-web-providers

A meta web extension for pi that routes search, content extraction, answers, and research through configurable per-tool providers.

Why?

Most web extensions hard-wire a single backend. pi-web-providers lets you mix and match providers per tool instead, so web_search, web_contents, web_answer, and web_research can each use a different backend or be turned off entirely.

✨ Features

Multiple providers — Claude, Cloudflare, Codex, Custom CLI, Exa, Gemini, Perplexity, Parallel, Valyu
Batched search and answers — run several related queries in a single web_search or web_answer call and get grouped results back in one response
Async contents prefetch — optionally start background web_contents extraction from web_search results and reuse the cached pages later

📦 Install

pi install npm:pi-web-providers

⚙️ Configure

Run:

/web-providers

This edits the global config file ~/.pi/agent/web-providers.json. The settings UI mirrors the three sections below: tools, providers, and generic settings.

Each tool can be routed to any compatible provider:

Provider	search	contents	answer	research	Auth
Claude	✔		✔		Local Claude Code auth
Cloudflare		✔			`CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID`
Codex	✔				Local Codex CLI auth
Exa	✔	✔	✔	✔	`EXA_API_KEY`
Gemini	✔		✔	✔	`GOOGLE_API_KEY`
Perplexity	✔		✔	✔	`PERPLEXITY_API_KEY`
Parallel	✔	✔			`PARALLEL_API_KEY`
Valyu	✔	✔	✔	✔	`VALYU_API_KEY`

Advanced option: custom-cli is a configurable adapter provider that can route any managed tool through a local wrapper command using a JSON stdin/stdout contract.

See example-config.json for a full default configuration.

Tools

Each managed tool maps to one provider id or null for off under the top-level tools key. A tool is only exposed when it is mapped to a compatible provider and that provider is currently available. Tool-specific settings live under toolSettings; today this covers toolSettings.search.prefetch.

`web_search`

Search the public web for up to 10 queries in one call. It returns grouped titles, URLs, and snippets for each query.

Parameter	Type	Default	Description
`queries`	string[]	required	One or more search queries to run (max 10)
`maxResults`	integer	`5`	Result count per query, clamped to `1–20`
`options`	object	—	Provider-specific search options and local `prefetch` settings

web_search.options.prefetch is local-only and not forwarded into the provider SDK. It accepts provider, maxUrls, ttlMs, and contentsOptions, and starts a background page-extraction workflow only when prefetch.provider is set. /web-providers can also persist default search prefetch settings under toolSettings.search.prefetch.

`web_contents`

Read the main text from one or more web pages. It reuses cached pages when they match and fetches only missing or stale URLs.

Parameter	Type	Default	Description
`urls`	string[]	required	One or more URLs to extract
`options`	object	—	Provider-specific extraction options

web_contents reuses any matching cached pages already present in the local content store—whether they came from prefetch or an earlier read—and only fetches missing or stale URLs.

`web_answer`

Answer one or more questions using web-grounded evidence. When you ask more than one question, the response is grouped into per-question sections.

Parameter	Type	Default	Description
`queries`	string[]	required	One or more questions to answer in one call (max 10)
`options`	object	—	Provider-specific options

Responses are grouped into per-question sections when more than one question is provided.

`web_research`

Investigate a topic across web sources and produce a longer report. The provider-specific options stay native to each SDK, and runtime options override provider configuration when both are set.

Parameter	Type	Default	Description
`input`	string	required	Research brief or question
`options`	object	—	Provider-specific options

options are provider-native and provider-specific. Equivalent concepts can use different field names across SDKs—for example Perplexity uses country, Exa uses userLocation, and Valyu uses countryCode. Runtime options override provider-native config, but managed tool inputs and tool wiring stay fixed.

The extension accepts local control fields for robustness: requestTimeoutMs, retryCount, and retryDelayMs on request/response tools, plus pollIntervalMs, timeoutMs, maxConsecutivePollErrors, and resumeId on web_research for lifecycle-based research providers. These fields are handled by the extension and are not forwarded into the provider SDK call.

Exa and Valyu research support polling, overall deadlines, and resume IDs but reject requestTimeoutMs and do not retry non-idempotent job creation.
Perplexity research runs in streaming foreground mode and only supports requestTimeoutMs, retryCount, and retryDelayMs.

Providers deliver results in one of three modes:

Silent foreground — no intermediate output; result returned when done.
Streaming foreground — progress updates while running, but the result is still only usable after the tool finishes.
Background research — the provider runs in the background; if interrupted, the run can be resumed later via resumeId.

Providers

The built-in providers below are thin adapters around official SDKs.

SDK: @anthropic-ai/claude-agent-sdk
Uses Claude Code's built-in WebSearch and WebFetch tools behind a structured JSON adapter
Runs in silent foreground mode
Supports request-shaping options such as model, thinking, effort, and maxTurns
Great for search plus grounded answers if you already use Claude Code locally

API: Cloudflare Browser Rendering REST API
Uses the /markdown endpoint to render pages in headless Chrome on Cloudflare's edge network and return clean Markdown
Runs in silent foreground mode
No external SDK dependency — uses fetch against the REST API
Handles JavaScript-rendered pages, dynamic content, and complex layouts
Reports per-URL success/failure with structured content entries
Requires a Cloudflare API token with Browser Rendering permissions and an account ID
Free tier: 10 min/day browser time, 6 req/min; Workers Paid ($5/mo): 10 hrs/month, 600 req/min

SDK: @openai/codex-sdk
Runs in read-only mode with web search enabled
Runs in silent foreground mode
Supports request-shaping web_search.options such as model, modelReasoningEffort, and webSearchMode
Best if you already use the local Codex CLI and auth flow

SDK: exa-js
Search, contents, and answer run in silent foreground mode
Research runs in background research mode and supports resumeId
Neural, keyword, hybrid, and deep-research search modes
Inline text-content extraction on search results

SDK: @google/genai
Search and answer run in silent foreground mode
Research runs in background research mode and supports resumeId
Google Search grounding for answers
Deep-research agents via Google's Gemini API
Supports provider-native request options such as model, config, generation_config, and agent_config depending on the tool

SDK: @perplexity-ai/perplexity_ai
web_search and web_answer run in silent foreground mode
web_research runs in streaming foreground mode (no resumeId support)
Uses Perplexity Search for web_search
Uses Sonar for web_answer and sonar-deep-research for web_research
Supports provider-specific web_search.options such as country, search_mode, search_domain_filter, and search_recency_filter

SDK: parallel-web
Runs in silent foreground mode
Agentic and one-shot search modes
Page content extraction with excerpt and full-content toggles
Supports provider-native search and extraction options from the Parallel SDK

SDK: valyu-js
Search, contents, and answer run in silent foreground mode
Research runs in background research mode and supports resumeId
Web, proprietary, and news search types
Supports provider-native options such as countryCode, responseLength, and search/source filters
Configurable response length for answers and research

Custom CLI provider

The custom-cli provider lets you bring your own wrapper command for any managed tool. Each capability can point at a different local command under providers["custom-cli"].native.

The repo includes actual wrapper examples under examples/custom-cli/wrappers/. They are small bash scripts that use jq for JSON handling. Each one uses a different backend pattern:

codex --search exec for web_search
Gemini API via curl for web_contents
claude -p for web_answer
Perplexity API via curl for web_research

Copy the example wrappers into a local ./wrappers/ directory, then configure:

{
  "tools": {
    "search": "custom-cli",
    "contents": "custom-cli",
    "answer": "custom-cli",
    "research": "custom-cli"
  },
  "providers": {
    "custom-cli": {
      "enabled": true,
      "native": {
        "search": {
          "argv": ["bash", "./wrappers/codex-search.sh"]
        },
        "contents": {
          "argv": ["bash", "./wrappers/gemini-contents.sh"]
        },
        "answer": {
          "argv": ["bash", "./wrappers/claude-answer.sh"]
        },
        "research": {
          "argv": ["bash", "./wrappers/perplexity-research.sh"]
        }
      }
    }
  }
}

Those example wrappers deliberately use different local CLIs and APIs so you can see several wrapper styles in one setup without extra glue code.

Each capability can also set an optional cwd and env block. Use cwd when one wrapper must run from a specific directory. Use env for per-command variables; each value can be a literal string, an environment variable name, or !command.

web_research runs as a foreground wrapper command, so local polling controls (pollIntervalMs, timeoutMs, maxConsecutivePollErrors) and resumeId do not apply to custom-cli.

Wrapper contract:

stdin: one JSON request object with capability plus the per-call managed inputs (query, urls, input, maxResults, options, cwd)
stdout: one JSON response object
- search: { "results": [{ "title", "url", "snippet" }] }
- contents / answer / research: { "text": "...", "summary"?: "...", "itemCount"?: 1, "metadata"?: {} }
stderr: optional progress lines
exit code 0: success
non-zero exit code: failure

See examples/custom-cli/README.md for a copy-and-pasteable setup, and see examples/custom-cli/wrappers/ for the actual wrapper files.

Generic settings

The genericSettings block sets shared execution defaults that apply to all providers unless overridden in a provider's policy block:

Field	Default	Description
`requestTimeoutMs`	`30000`	Maximum time for a single provider request
`retryCount`	`3`	Retries for transient failures
`retryDelayMs`	`2000`	Initial delay before retrying
`researchPollIntervalMs`	`3000`	How often to poll long-running research jobs
`researchTimeoutMs`	`21600000`	Overall deadline for research before returning
`researchMaxConsecutivePollErrors`	`3`	Consecutive poll failures before stopping

📄 License

MIT