pi-web-access-lean

Lean web search, URL fetching, code search, GitHub repo cloning, and PDF extraction for Pi coding agent

Install pi-web-access-lean from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-web-access-lean

Package: pi-web-access-lean
Version: 0.1.1
Published: May 29, 2026
Downloads: not available
Author: nabsku_
License: MIT
Types: extension
Size: 141.5 KB
Dependencies: 5 dependencies · 0 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

Pi Web Access Lean

Web search, code search, URL fetching, GitHub repository cloning, and PDF extraction for Pi Coding Agent.

Fork of nicobailon/pi-web-access. Full credit for the original extension, feature design, and implementation goes to the upstream project and its author.

This fork keeps the web-access tools that are most often useful in coding-agent sessions while keeping the tool surface small.

Upstream credit

This repository is a lean fork of the original pi-web-access extension by Nico Bailon.

The original extension includes the broader feature set: web search, content extraction, curator workflow, Gemini/Web fallback paths, YouTube/video understanding, and related tooling. This fork intentionally removes parts of that surface for a smaller Pi Coding Agent footprint; it is not a replacement for the full upstream package.

Features

  • web_search: web search through Exa or Perplexity
  • code_search: code, documentation, and API search through Exa MCP code context, with fallback search
  • fetch_content: readable markdown extraction for URLs
  • GitHub repository handling: clone repositories locally instead of scraping rendered HTML
  • PDF extraction: extract text PDFs and save markdown output
  • HTML extraction: Readability, Next.js RSC parsing, and Jina Reader fallback
  • Activity widget for request/response visibility

Design

Pi loads extension tool schemas into the agent context. Large schemas and rarely used workflows add input tokens to every prompt, including prompts that never use web access.

This package is built around a smaller default surface:

  • three tools: search, code search, fetch content
  • no interactive search-review workflow
  • no browser-cookie handling
  • no video processing pipeline
  • no provider paths that require unrelated credentials
  • concise tool descriptions

The result is a smaller extension footprint while preserving the main web and code-research paths used by a coding agent.

Benchmarks

Measurement command shape:

PI_CODING_AGENT_DIR="$TMP" \
pi -p --no-session \
  --no-skills --no-context-files --no-prompt-templates --no-themes \
  --mode json \
  --model openai-codex/gpt-5.5 \
  --thinking minimal \
  "Reply exactly OK"

Measured extension input-token footprint:

  • Original pi-web-access: +1,180 input tokens
  • pi-web-access-lean: +302 input tokens
  • Reduction: 878 input tokens (about 74%)

Install

From npm:

pi install npm:pi-web-access-lean

Or add it to Pi settings:

{
  "packages": ["npm:pi-web-access-lean"]
}

From GitHub:

pi install git:github.com/Nabsku/pi-web-access-lean

From a local checkout, useful while developing:

git clone https://github.com/Nabsku/pi-web-access-lean.git
pi install /path/to/pi-web-access-lean

Requires Pi Coding Agent with extension support.

Configuration

Configuration is read from ~/.pi/web-search.json. All fields are optional.

{
  "exaApiKey": "exa-...",
  "perplexityApiKey": "pplx-...",
  "provider": "auto",
  "githubClone": {
    "enabled": true,
    "maxRepoSizeMB": 350,
    "cloneTimeoutSeconds": 30,
    "clonePath": "/tmp/pi-github-repos"
  },
  "shortcuts": {
    "activity": "ctrl+shift+w"
  }
}

Provider selection:

  • auto: Exa first, then Perplexity when configured
  • exa: Exa only
  • perplexity: Perplexity only

Environment variables take precedence where supported:

  • EXA_API_KEY
  • PERPLEXITY_API_KEY
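
The provider and precedence rules above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the names `WebSearchConfig`, `resolveKeys`, and `providerOrder` are hypothetical, and the sketch assumes that `provider` defaults to `auto`, that both keys honor the environment override, and that Exa is always tried first in `auto` mode.

```typescript
// Hypothetical types; field names mirror the documented config file.
type Provider = "auto" | "exa" | "perplexity";

interface WebSearchConfig {
  exaApiKey?: string;
  perplexityApiKey?: string;
  provider?: Provider;
}

// The environment is passed in explicitly so the sketch stays testable.
type Env = Record<string, string | undefined>;

// Assumption: an environment variable, when set, wins over the config file.
function resolveKeys(config: WebSearchConfig, env: Env) {
  return {
    exa: env["EXA_API_KEY"] || config.exaApiKey,
    perplexity: env["PERPLEXITY_API_KEY"] || config.perplexityApiKey,
  };
}

// Returns the providers to try, in order.
function providerOrder(
  config: WebSearchConfig,
  env: Env,
): Array<"exa" | "perplexity"> {
  const provider = config.provider ?? "auto"; // assumed default
  if (provider === "exa") return ["exa"];
  if (provider === "perplexity") return ["perplexity"];
  // auto: Exa first, then Perplexity only when a key is configured.
  const keys = resolveKeys(config, env);
  return keys.perplexity ? ["exa", "perplexity"] : ["exa"];
}
```

In a real extension the `env` argument would most likely be `process.env`; passing it explicitly keeps the example self-contained.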

Tools

web_search

Search the web and return an answer with sources.

web_search({ query: "TypeScript best practices 2026" })
web_search({ queries: ["query 1", "query 2"] })
web_search({ query: "AI agent observability", recencyFilter: "week" })
web_search({ query: "React Server Components", domainFilter: ["react.dev"] })
web_search({ query: "Pi Coding Agent extensions", provider: "exa" })
web_search({ query: "benchmark result", includeContent: true })

Parameters:

  • query / queries: single query or batch of queries
  • numResults: results per query, default 5, max 20
  • recencyFilter: day, week, month, or year
  • domainFilter: domains to include; prefix with - to exclude
  • provider: auto, exa, or perplexity
  • includeContent: fetch page content for results in the background
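
As an illustration of the parameter rules above, the sketch below splits a `domainFilter` list into included and excluded domains and clamps `numResults` to the documented default and maximum. The helper names `splitDomainFilter` and `clampNumResults` are hypothetical, and the lower bound of 1 is an assumption; the extension may normalize these values differently.

```typescript
interface DomainFilter {
  include: string[];
  exclude: string[];
}

// A leading "-" marks a domain to exclude; everything else is included.
function splitDomainFilter(domains: string[]): DomainFilter {
  const result: DomainFilter = { include: [], exclude: [] };
  for (const raw of domains) {
    const domain = raw.trim();
    if (domain.startsWith("-")) {
      const name = domain.slice(1).trim();
      if (name) result.exclude.push(name);
    } else if (domain) {
      result.include.push(domain);
    }
  }
  return result;
}

// Documented: default 5, max 20. The minimum of 1 is an assumption.
function clampNumResults(value?: number): number {
  const DEFAULT = 5;
  const MAX = 20;
  if (value === undefined || !Number.isFinite(value)) return DEFAULT;
  return Math.min(MAX, Math.max(1, Math.floor(value)));
}
```

For example, `splitDomainFilter(["react.dev", "-example.com"])` would include `react.dev` and exclude `example.com`.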

code_search

Search for code examples, documentation, APIs, and debugging references.

Uses Exa MCP code-context when available. Falls back to code-focused web search.

code_search({ query: "React useEffect cleanup pattern" })
code_search({ query: "Express middleware error handling", maxTokens: 10000 })

Parameters:

  • query: programming question, API, library, or debugging topic
  • maxTokens: context budget, default 5000, max 50000

fetch_content

Fetch URLs and extract readable content as markdown.

fetch_content({ url: "https://example.com/article" })
fetch_content({ urls: ["https://a.example", "https://b.example"] })
fetch_content({ url: "https://github.com/owner/repo" })
fetch_content({ url: "https://example.com/report.pdf" })

Parameters:

  • url / urls: single URL/path or multiple URLs
  • forceClone: clone GitHub repositories that exceed the size threshold
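
The interaction between `forceClone` and the `maxRepoSizeMB` setting can be pictured with a small decision function. This is a sketch under stated assumptions: `planGitHubFetch` is a hypothetical name, the repository size is assumed to be known before cloning, and oversized repositories are assumed to go to the GitHub API fallback described under Files.

```typescript
type GitHubPlan = "clone" | "api-fallback";

// Decide how to handle a GitHub repository URL.
// repoSizeMB: repository size, assumed to be known up front.
// maxRepoSizeMB: the configured threshold (350 in the example config).
function planGitHubFetch(
  repoSizeMB: number,
  maxRepoSizeMB: number,
  forceClone = false,
): GitHubPlan {
  // forceClone overrides the size threshold.
  if (forceClone || repoSizeMB <= maxRepoSizeMB) return "clone";
  return "api-fallback";
}
```

Under these assumptions, a 500 MB repository with a 350 MB threshold takes the API fallback unless `forceClone` is set.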

Extraction flow

web_search(query)
  → Exa direct API or MCP
  → Perplexity, when configured

fetch_content(url)
  → GitHub URL? clone repository or use GitHub API fallback
  → HTTP fetch
      → PDF? extract text, save markdown to ~/Downloads/
      → HTML? Readability → RSC parser → Jina Reader fallback
      → text/json/markdown? return directly
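
The `fetch_content` flow above amounts to a routing step followed by an ordered fallback chain. The sketch below shows one way to express that; `routeFor` and `extractWithFallback` are hypothetical names, the content-type checks are assumptions, and the extractors are synchronous here for brevity even though a remote fallback such as Jina Reader would be asynchronous in practice.

```typescript
type Route = "github" | "pdf" | "html" | "text" | "unsupported";

// Pick a handler from the URL and the response Content-Type header.
function routeFor(url: string, contentType: string): Route {
  // GitHub URLs are handled before any generic HTTP extraction.
  if (/^https?:\/\/(www\.)?github\.com\//i.test(url)) return "github";
  // Drop parameters such as "; charset=utf-8".
  const type = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  if (type === "application/pdf") return "pdf";
  if (type === "text/html" || type === "application/xhtml+xml") return "html";
  if (type.startsWith("text/") || type === "application/json") return "text";
  return "unsupported";
}

// An extractor returns markdown, or null when it cannot handle the page.
type Extractor = (html: string) => string | null;

// Try each extractor in order and return the first non-empty result,
// mirroring the Readability → RSC parser → Jina Reader ordering.
function extractWithFallback(
  html: string,
  extractors: Extractor[],
): string | null {
  for (const extract of extractors) {
    const result = extract(html);
    if (result && result.trim().length > 0) return result;
  }
  return null;
}
```

Keeping the chain as an ordered list makes the fallback order explicit and easy to change without touching the routing logic.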

Commands

Activity monitor

Toggle with Ctrl+Shift+W to see live request/response activity:

─── Web Search Activity ────────────────────────────────────
  API  "typescript best practices"     200    2.1s ✓
  GET  docs.example.com/article        200    0.8s ✓
  GET  blog.example.com/post           404    0.3s ✗
────────────────────────────────────────────────────────────

Development

npm install
npm test

Files

  • index.ts: extension entry, tools, activity widget
  • search.ts: search routing for Exa and Perplexity
  • code-search.ts: code/docs search via Exa MCP
  • extract.ts: URL/path routing, HTTP extraction, fallback orchestration
  • github-extract.ts: GitHub URL parsing, clone cache, content generation
  • github-api.ts: GitHub API fallback for large repositories and commit SHAs
  • exa.ts: Exa search provider, direct API and MCP proxy
  • perplexity.ts: Perplexity API client with rate limiting
  • pdf-extract.ts: PDF text extraction, saves markdown output
  • rsc-extract.ts: RSC flight data parser for Next.js pages
  • utils.ts: shared formatting and error helpers
  • activity.ts: activity tracking widget