pikit-web-access

Web search and content extraction for pi

Packages

Package details

extension

Install pikit-web-access from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pikit-web-access
Package
pikit-web-access
Version
1.0.1
Published
Jul 5, 2026
Downloads
not available
Author
adrianapan
License
MIT
Types
extension
Size
21.6 KB
Dependencies
5 dependencies · 1 peer
Pi manifest JSON
{
  "extensions": [
    "./src/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pikit-web-access — web search & content extraction for pi.dev

Web search and content extraction for pi. Search the web via Gemini AI, fetch and read web pages, and extract text from PDFs — all from within the agent.

Install

pi install npm:pikit-web-access

[!TIP] Or grab the entire pikit setup, an opinionated pi.dev configuration that includes this extension.

Features

  • Web search: Queries Gemini AI with Google Search grounding — returns a synthesized answer with source citations. Uses gemini-2.5-flash-lite (hardcoded in src/search.ts), the cheapest model in the 2.5 family that supports the google_search grounding tool.
  • Page fetching: Fetches any URL and extracts clean readable markdown via Readability + Turndown
  • PDF extraction: Detects PDFs by URL or content-type and extracts their text — no API key required
  • Multi-URL support: Fetch several URLs in parallel in a single call
  • Result storage: Large responses are stored and retrievable in full via get_search_content
  • Session persistence: Stored results survive /reload and are restored on session start

Structure

web-access/
├── package.json
├── README.md
└── src/
    ├── index.ts     — tool registration and session_start restore
    ├── types.ts     — shared interfaces (SearchResult, ExtractedContent, StoredData)
    ├── config.ts    — reads GEMINI_API_KEY from process.env, clear error if missing
    ├── storage.ts   — in-memory result store with session persistence via appendEntry
    ├── search.ts    — web_search via Gemini API with google_search grounding (model: gemini-2.5-flash-lite)
    ├── extract.ts   — fetch pipeline: Readability → Turndown, PDF detection and routing
    ├── pdf.ts       — PDF text extraction via unpdf
    └── utils.ts     — shared helpers (truncate, errorMessage, abort check, PDF detection)

Configuration

web_search requires a Gemini API key. fetch_content (including PDF) works without one.

Option 1 — reuse your existing pi Gemini key

If you already have GEMINI_API_KEY set in your environment for Gemini models, the extension picks it up automatically — no extra config needed.

Option 2 — add to your shell profile (e.g. ~/.zshrc)

export GEMINI_API_KEY="AIza...your-key-here"

Security note: the key will be in the shell environment, visible by the bash tool. Avoid running env or commands that print the full environment when a model is watching.

Get a free key at: https://aistudio.google.com/apikey

If GEMINI_API_KEY is not set and web_search is called, the tool returns a clear error message with setup instructions.

Tools

web_search

Search the web via Gemini AI with Google Search grounding. Returns a synthesized answer with source citations.

web_search({ query: "TypeScript 5.5 new features" })
web_search({ queries: ["React 19 changes", "Next.js 15 release notes"] })
Parameter Description
query Single search query
queries Multiple queries run in parallel

fetch_content

Fetch one or more URLs and extract readable content as markdown. Automatically handles PDFs.

fetch_content({ url: "https://example.com/article" })
fetch_content({ urls: ["https://url1.com", "https://url2.com"] })
fetch_content({ url: "https://example.com/doc.pdf" })
Parameter Description
url Single URL to fetch
urls Multiple URLs fetched in parallel (3 at a time)

get_search_content

Retrieve full stored content from a previous web_search or fetch_content call. Use when the original response was truncated.

get_search_content({ responseId: "abc123" })
get_search_content({ responseId: "abc123", queryIndex: 1 })
get_search_content({ responseId: "abc123", url: "https://example.com" })
get_search_content({ responseId: "abc123", urlIndex: 2 })
Parameter Description
responseId The responseId from a previous web_search or fetch_content call
queryIndex Query to retrieve by index (web_search, default: 0)
urlIndex URL to retrieve by index (fetch_content, default: 0)
url Retrieve content for a specific URL (fetch_content)

Stored results expire after 1 hour or when a new session starts.