@bacnh85/pi-web

Pi extension for web search, page extraction, and Firecrawl scraping/crawling.

Install @bacnh85/pi-web from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@bacnh85/pi-web

Package: @bacnh85/pi-web
Version: 0.1.2
Published: Jun 26, 2026
Downloads: not available
Author: bacnh85
License: MIT
Types: extension
Size: 35.4 KB
Dependencies: 4 dependencies · 2 peers

Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-web

Pi extension for web search, readable page extraction, and Firecrawl scraping/crawling.

Install

Install the published package from npm:

pi install npm:@bacnh85/pi-web

From a checkout of this repository, install only this extension package:

cd extensions/pi-web
npm install
cd ../..

pi install ./extensions/pi-web
# or test directly
pi -e ./extensions/pi-web

The package manifest points Pi directly at ./index.ts, so published npm installs and local installs load the same extension entrypoint.

If you copy this directory manually instead of using pi install, run npm install --omit=dev in the copied pi-web directory so that the readable-content dependencies, such as @mozilla/readability, are present.

Configuration

Each variable is looked up in the following sources, in this order:

  1. Process environment
  2. Current working directory .env.local
  3. Current working directory .env
  4. Pi global config .env.local: $PI_CODING_AGENT_DIR/.env.local when $PI_CODING_AGENT_DIR is set; otherwise ~/.pi/agent/.env.local, then ~/.pi/agents/.env.local
  5. Pi global config .env: $PI_CODING_AGENT_DIR/.env when $PI_CODING_AGENT_DIR is set; otherwise ~/.pi/agent/.env, then ~/.pi/agents/.env
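
The lookup order above can be sketched as a first-match search over an ordered list of sources. This is a minimal illustration, not the extension's actual code: the names EnvSource, parseDotenv, and resolveEnv are hypothetical, the parser handles only simple KEY=VALUE lines, and treating an empty value as "not set" is an assumption.

```typescript
// Hypothetical helper types; not taken from the pi-web source.
type EnvSource = { name: string; values: Record<string, string | undefined> };

interface Resolved {
  value: string;
  source: string;
}

// Minimal KEY=VALUE parser: skips blank lines and "#" comments and
// strips one pair of matching surrounding quotes from the value.
function parseDotenv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const doubleQuoted = value.startsWith('"') && value.endsWith('"');
    const singleQuoted = value.startsWith("'") && value.endsWith("'");
    if (value.length >= 2 && (doubleQuoted || singleQuoted)) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// Walks the sources in priority order and returns the first non-empty value
// together with the name of the source that supplied it.
// Assumption: an empty string counts as "not set".
function resolveEnv(key: string, sources: EnvSource[]): Resolved | undefined {
  for (const source of sources) {
    const value = source.values[key];
    if (value !== undefined && value !== "") {
      return { value, source: source.name };
    }
  }
  return undefined;
}

// Example: the process environment is empty, so .env.local wins over .env.
const exampleSources: EnvSource[] = [
  { name: "process", values: {} },
  { name: "cwd/.env.local", values: parseDotenv("BRAVE_API_KEY=local-key\n") },
  { name: "cwd/.env", values: parseDotenv("BRAVE_API_KEY=base-key\n") },
];
const exampleResult = resolveEnv("BRAVE_API_KEY", exampleSources);
```

Returning the source name alongside the value is what would let a status check report where a setting came from without printing the setting itself.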

Variables:

  • BRAVE_API_KEY — required for Brave Search.
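  • SEARXNG_BASE_URL — optional; defaults to http://172.30.55.22:8888. That default is a private-network address, so set this variable to point at your own SearXNG instance.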
  • FIRECRAWL_API_URL — optional; defaults to https://api.firecrawl.dev/v2.
  • FIRECRAWL_API_KEY — required for hosted Firecrawl, optional for self-hosted instances without auth.
  • FIRECRAWL_TIMEOUT_MS — optional request timeout, default 60000.

Secrets are never printed; status reports show only whether each value is set and which source supplied it.
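
As a rough sketch of how the documented defaults and the presence-only status reporting could fit together: the function names loadConfig and describeSecret, the shape of the returned object, the status line format, and the fallback to 60000 on an invalid timeout are all assumptions for illustration. Only the variable names and default values come from the list above.

```typescript
// Hypothetical sketch; only the variable names and defaults come from the README.
type Env = Record<string, string | undefined>;

const DEFAULT_SEARXNG_BASE_URL = "http://172.30.55.22:8888";
const DEFAULT_FIRECRAWL_API_URL = "https://api.firecrawl.dev/v2";
const DEFAULT_FIRECRAWL_TIMEOUT_MS = 60000;

// Applies the documented defaults to whatever the environment provides.
function loadConfig(env: Env) {
  const rawTimeout = Number(env["FIRECRAWL_TIMEOUT_MS"]);
  // Assumption: a missing, non-numeric, or non-positive timeout falls back to the default.
  const timeoutOk = Number.isFinite(rawTimeout) && rawTimeout > 0;
  return {
    braveApiKey: env["BRAVE_API_KEY"],
    searxngBaseUrl: env["SEARXNG_BASE_URL"] ?? DEFAULT_SEARXNG_BASE_URL,
    firecrawlApiUrl: env["FIRECRAWL_API_URL"] ?? DEFAULT_FIRECRAWL_API_URL,
    firecrawlApiKey: env["FIRECRAWL_API_KEY"],
    firecrawlTimeoutMs: timeoutOk ? rawTimeout : DEFAULT_FIRECRAWL_TIMEOUT_MS,
  };
}

// Builds a status line that reveals presence and source, never the secret value.
function describeSecret(name: string, env: Env, source: string): string {
  return env[name] ? `${name}: set (${source})` : `${name}: missing`;
}
```

For example, describeSecret("BRAVE_API_KEY", { BRAVE_API_KEY: "abc" }, "cwd/.env") yields "BRAVE_API_KEY: set (cwd/.env)", with the key itself left out of the output.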

Tools

Brave:

  • brave_search — search web results, optionally fetch readable result content.
  • brave_content — fetch a URL and extract readable markdown.

SearXNG:

  • searxng_search — search web results through a configured self-hosted SearXNG metasearch instance.
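
For orientation, a SearXNG instance exposes a /search endpoint that accepts q, format, and pageno query parameters, and JSON output has to be enabled in the instance's settings. The sketch below only builds such a URL. The helper name buildSearxngUrl is hypothetical, and whether searxng_search constructs its requests this way is an assumption.

```typescript
// Hypothetical helper: builds a SearXNG JSON search URL from a base URL and a query.
// It does not perform the request and is not taken from the pi-web source.
function buildSearxngUrl(baseUrl: string, query: string, page: number = 1): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const params = [
    `q=${encodeURIComponent(query)}`,
    "format=json",
    `pageno=${page}`,
  ];
  return `${base}/search?${params.join("&")}`;
}
```

With the default base URL from the Configuration section, buildSearxngUrl("http://172.30.55.22:8888", "pi web") produces http://172.30.55.22:8888/search?q=pi%20web&format=json&pageno=1.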

Firecrawl:

  • firecrawl_search — search web/news/images, optionally scrape result markdown.
  • firecrawl_scrape — scrape a URL as markdown/html/links/summary/json.
  • firecrawl_map — discover URLs for a site.
  • firecrawl_crawl — start a conservative crawl, optionally polling for results.

Utility:

  • web_status — show configured provider status without secrets.
  • /web-status — command version of the status check.

Practical Guidance

  • Use searxng_search first for general web search, docs lookup, current facts, and source discovery; it is fast, self-hosted, and avoids hosted API costs/rate limits.
  • Use brave_search as a fast hosted fallback when SearXNG results are weak/unavailable, or when an independent search index is useful. Use include_content sparingly.
  • Use brave_content for fast known-URL article/docs extraction when simple readable markdown is enough.
  • Use firecrawl_search when SearXNG/Brave are unavailable, when Firecrawl search is preferred, or when search results should include scraped markdown.
  • Use firecrawl_scrape for known URLs when extraction quality matters, pages are dynamic, links are needed, or structured JSON extraction is requested.
  • Use firecrawl_map before crawling to discover candidate URLs and keep crawls targeted.
  • Use firecrawl_crawl only when multiple pages are required; keep limits conservative and use include/exclude paths.
  • When answering from web content, cite source URLs from result links or metadata.
  • Use web_status when provider configuration is uncertain or a web tool fails due to credentials/config.
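
The search-related guidance above amounts to a preference order with fallbacks. A minimal sketch of that decision, assuming a hypothetical Availability record (for instance, derived from a web_status check) and a hypothetical pickSearchTool helper, might look like this; the extension itself does not necessarily expose such a function.

```typescript
// Hypothetical types and helper illustrating the preference order in the guidance above.
interface Availability {
  searxng: boolean;
  brave: boolean;
  firecrawl: boolean;
}

interface SearchNeeds {
  // True when search results should include scraped markdown.
  wantScrapedMarkdown?: boolean;
}

type SearchTool = "searxng_search" | "brave_search" | "firecrawl_search";

function pickSearchTool(available: Availability, needs: SearchNeeds = {}): SearchTool | undefined {
  // Firecrawl search is the one described as returning scraped markdown with results.
  if (needs.wantScrapedMarkdown && available.firecrawl) return "firecrawl_search";
  // Otherwise prefer the self-hosted instance, then the hosted fallbacks.
  if (available.searxng) return "searxng_search";
  if (available.brave) return "brave_search";
  if (available.firecrawl) return "firecrawl_search";
  return undefined; // nothing configured: run web_status to diagnose
}
```

The sketch covers only the search tools; the choice between brave_content and firecrawl_scrape for known URLs depends on page characteristics (dynamic content, link or JSON extraction) that are not modelled here.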