@codingcoffee/pi-websearch-crawl4ai

a pi extension to let your LLM crawl & see the web

Package details

Install @codingcoffee/pi-websearch-crawl4ai from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@codingcoffee/pi-websearch-crawl4ai
Package: @codingcoffee/pi-websearch-crawl4ai
Version: 0.2.1
Published: Apr 26, 2026
Downloads: 676/mo · 252/wk
Author: codingcoffee
License: MIT
Type: extension
Size: 23.3 KB
Dependencies: 0 · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ],
  "image": "https://pi-websearch-crawl4ai.codingcoffee.dev/og.png"
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-websearch-crawl4ai

A pi extension that lets the agent fetch content from the web via a running Crawl4AI server.

Intended use: running pi with the bash tool disabled (so curl / wget are unavailable) while still letting the model read and crawl web pages.

What it gives the LLM

Six tools, all talking to a Crawl4AI server:

Tool            Endpoint            Purpose
web_fetch       POST /md            Fetch a URL → clean Markdown (filters: fit/raw/bm25/llm)
web_fetch_html  POST /html          Sanitized HTML for DOM-aware tasks
web_crawl       POST /crawl         Multi-URL crawl with typed BrowserConfig/CrawlerRunConfig
web_execute_js  POST /execute_js    Run JS snippets on a page and read back JSON
web_screenshot  POST /screenshot    Full-page PNG screenshot (returned inline)
web_ask         GET /ask            Query the Crawl4AI library's own docs (for configuring it)
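
Each tool is a thin wrapper over the corresponding HTTP endpoint. A rough manual equivalent of a web_fetch call (request field names follow the Crawl4AI server docs rather than this extension's source; verify them against your server version):

curl -s -X POST http://localhost:11235/md \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com", "f": "fit"}'  # "f" selects the markdown filter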

Plus commands: /crawl4ai-status, /crawl4ai-url <url>, /crawl4ai-token <tok>.

Prerequisites

You need a Crawl4AI server reachable from where pi runs. The fastest path:

docker run -d \
  -p 11235:11235 \
  --name crawl4ai \
  --shm-size=1g \
  unclecode/crawl4ai:latest

# Sanity check
curl http://localhost:11235/health

See the Crawl4AI Docker guide for GPU, LLM keys, config.yml, JWT auth, etc.
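
For example, a custom config.yml can be mounted over the server's default (the /app/config.yml path comes from that guide; confirm it for your image version):

docker run -d \
  -p 11235:11235 \
  --name crawl4ai \
  --shm-size=1g \
  -v "$(pwd)/config.yml:/app/config.yml" \
  unclecode/crawl4ai:latest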

Install as a pi extension

# project-local
mkdir -p .pi/extensions
ln -s "$(pwd)/pi-websearch-crawl4ai" .pi/extensions/crawl4ai

# or global
mkdir -p ~/.pi/agent/extensions
ln -s "$(pwd)/pi-websearch-crawl4ai" ~/.pi/agent/extensions/crawl4ai

pi auto-discovers index.ts via the extensions array under the "pi" key in package.json.
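
For reference, the relevant fragment of this package's package.json looks roughly like this (mirroring the manifest shown above; unrelated fields omitted):

{
  "name": "@codingcoffee/pi-websearch-crawl4ai",
  "pi": {
    "extensions": ["./index.ts"]
  }
}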

Alternatively, for a one-off test:

pi -e ./pi-websearch-crawl4ai/index.ts

Configuration

Precedence: CLI flag > env var > default.

Setting     Env                 Flag                     Default
Base URL    CRAWL4AI_BASE_URL   --crawl4ai-url <url>     http://localhost:11235
Auth token  CRAWL4AI_TOKEN      --crawl4ai-token <tok>   (none)

At runtime you can also:

  • /crawl4ai-status — show current config + /health
  • /crawl4ai-url http://host:11235 — change base URL for this session
  • /crawl4ai-token <jwt> — set bearer token (empty clears it)
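
A quick sketch of the precedence (hostnames are placeholders):

# Env var sets the base URL for the session...
export CRAWL4AI_BASE_URL=http://crawler.internal:11235
pi

# ...but the CLI flag wins when both are set.
pi --crawl4ai-url http://localhost:11235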

Example use

Running pi without the bash tool, using this extension for web access:

pi --tools read,write,edit,web_fetch,web_crawl
> "Read https://example.com and summarize it."

The model will call web_fetch instead of reaching for bash/curl.

How web_crawl typed configs work

Crawl4AI accepts configuration objects shaped as {"type":"ClassName","params":{...}}. For example, you (or the model) can pass:

{
  "urls": ["https://example.com", "https://httpbin.org/html"],
  "browser_config": { "type": "BrowserConfig", "params": { "headless": true } },
  "crawler_config": {
    "type": "CrawlerRunConfig",
    "params": { "cache_mode": "bypass", "stream": false }
  }
}
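
The same body can be tested against the server directly before handing it to the model (a sketch: crawl-request.json is the payload above saved to a file, and the Authorization header is only needed if your server has JWT auth enabled):

curl -s -X POST http://localhost:11235/crawl \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $CRAWL4AI_TOKEN" \
  -d @crawl-request.json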

If you need to remind the model what's available, it can call web_ask with a query like "CrawlerRunConfig parameters" to pull the Crawl4AI library docs.
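
The direct-HTTP equivalent would look something like this (the query parameter name is an assumption; check your server's /ask docs):

curl -s "http://localhost:11235/ask?query=CrawlerRunConfig+parameters"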

Security note

Extensions run with your user's full permissions. The tools here can fetch arbitrary URLs via your Crawl4AI server. If that's a problem, run Crawl4AI with rate limiting / allowlists configured in its config.yml, and/or restrict which tools pi activates via --tools.