pi-knowledge-search

Semantic search over local files for pi. Indexes a directory of text files, watches for changes, and exposes a knowledge_search tool to the LLM.

Package details

extension

Install pi-knowledge-search from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-knowledge-search
Package
pi-knowledge-search
Version
1.3.0
Published
May 3, 2026
Downloads
773/mo · 324/wk
Author
samfp
License
MIT
Types
extension
Size
215.2 KB
Dependencies
5 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./src/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-knowledge-search

Hybrid search over local files for pi. Indexes directories of text/markdown files using vector embeddings and SQLite FTS5 keyword search, watches for changes in real time, and exposes a knowledge_search tool the LLM can call.

How search works

Every query runs against two backends in parallel and fuses the results via Reciprocal Rank Fusion (k=60):

  • Vector cosine similarity — good for conceptual/fuzzy queries ("how did we handle X")
  • BM25 full-text via SQLite FTS5 — good for exact matches, proper nouns, error strings, file paths, code identifiers

Docs that both backends agree on get boosted; either backend alone still surfaces relevant hits. If the embedder fails transiently, search falls back to pure BM25; if the FTS side-car is empty, it falls back to pure vector. Existing users upgrade seamlessly — the FTS side-car is backfilled from the vector index on first load with no re-embedding needed.
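The fusion step can be sketched as follows. This is a minimal illustration, not the extension's actual code; rrfFuse and the sample rankings are hypothetical, but the k=60 constant matches the value stated above:

```typescript
// Reciprocal Rank Fusion: score(doc) = sum over each ranking of 1 / (k + rank).
// Docs ranked highly by both backends accumulate score from both terms.
type Ranked = string[]; // document ids, best first

function rrfFuse(rankings: Ranked[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      const rank = i + 1; // 1-based rank within this backend's list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

const vector = ["a", "b", "c"]; // vector-similarity ranking, best first
const bm25 = ["b", "d", "a"];   // BM25 ranking, best first
const fused = [...rrfFuse([vector, bm25]).entries()]
  .sort((x, y) => y[1] - x[1])
  .map(([id]) => id);
// "b" wins: it sits near the top of BOTH lists → ["b", "a", "d", "c"]
```

Note how the fallback behavior drops out naturally: if one backend returns nothing, the other backend's ranking passes through unchanged.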

Install

Recommended: Install pi-total-recall to get the complete context stack — persistent memory, session history search, and local knowledge search in one package:

pi install pi-total-recall

Or install pi-knowledge-search standalone:

pi install git:github.com/samfoy/pi-knowledge-search

Or try without installing:

pi -e git:github.com/samfoy/pi-knowledge-search

Setup

Run the interactive setup command inside pi:

/knowledge-search-setup

This walks you through:

  1. Directories to index (comma-separated paths)
  2. File extensions to include (default: .md, .txt)
  3. Directories to exclude (default: node_modules, .git, .obsidian, .trash)
  4. Embedding provider — OpenAI, OpenAI-compatible (local/self-hosted), AWS Bedrock, or Ollama

Config is saved to ~/.pi/knowledge-search.json. Run /reload to activate.

Config file

You can also edit the config file directly:

{
  "dirs": ["~/notes", "~/docs"],
  "fileExtensions": [".md", ".txt"],
  "excludeDirs": ["node_modules", ".git", ".obsidian", ".trash"],
  "provider": {
    "type": "openai",
    "model": "text-embedding-3-small"
  }
}

The API key for OpenAI can be set in the config file ("apiKey": "sk-...") or via the OPENAI_API_KEY environment variable.

Bedrock example:

{
  "dirs": ["~/vault"],
  "provider": {
    "type": "bedrock",
    "profile": "my-aws-profile",
    "region": "us-west-2",
    "model": "amazon.titan-embed-text-v2:0"
  }
}

Requires the AWS SDK and valid credentials for the specified profile.

Ollama example:

{
  "dirs": ["~/notes"],
  "provider": {
    "type": "ollama",
    "url": "http://localhost:11434",
    "model": "nomic-embed-text"
  }
}

Requires Ollama running locally:

ollama serve
ollama pull nomic-embed-text

Any server that exposes an OpenAI-compatible /v1/embeddings endpoint works: llama.cpp, vLLM, litellm, Ollama's OpenAI-compatibility mode, etc.

OpenAI-compatible example:

{
  "dirs": ["~/notes"],
  "provider": {
    "type": "openai-compatible",
    "baseUrl": "http://127.0.0.1:8080",
    "apiKey": "your-local-key",
    "model": "qwen3-embeddings"
  }
}

The baseUrl should be your server root without a trailing /v1 path — the embedder appends /v1/embeddings automatically.

For example with llama-cpp-python:

python -m llama_cpp.server --model ./models/qwen3-embedding.gguf --port 8080

Then configure knowledge-search to point at http://127.0.0.1:8080 as shown above.

The apiKey field is optional; omit it if your runner doesn't require authentication.
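The URL handling described above can be sketched like this. embeddingsUrl is a hypothetical helper, not the extension's actual code; the /v1/embeddings path follows the OpenAI embeddings API shape:

```typescript
// Derive the request URL from the configured baseUrl.
// A trailing slash is stripped so "http://host:8080/" and
// "http://host:8080" resolve to the same endpoint.
function embeddingsUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/v1/embeddings";
}

// embeddingsUrl("http://127.0.0.1:8080")
//   → "http://127.0.0.1:8080/v1/embeddings"
```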

Bedrock Knowledge Bases

You can add Amazon Bedrock Knowledge Bases as additional search sources. These are managed RAG services — Amazon handles chunking, embedding, and vector storage. pi-knowledge-search queries them at search time and merges results with local file results.

Add via command:

/knowledge-add-kb

Or add directly to the config file:

{
  "dirs": ["~/notes"],
  "provider": { "type": "openai" },
  "knowledgeBases": [
    {
      "id": "XXXXXXXXXX",
      "region": "us-east-1",
      "profile": "default",
      "label": "Team docs"
    }
  ]
}

You can use Knowledge Bases alongside local file indexing, or on their own (omit dirs and provider for KB-only mode).

KB-only config:

{
  "knowledgeBases": [
    {
      "id": "XXXXXXXXXX",
      "region": "us-east-1",
      "profile": "my-work-profile",
      "label": "Engineering wiki"
    }
  ]
}

Requires the AWS SDK and valid credentials with bedrock:Retrieve permissions.

Environment variable overrides

Every config field can be overridden via environment variables. This is useful for CI or when you want different settings per shell session. See env-vars.md for the full list.
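For example, to point a single shell session at a project-local config and index (KNOWLEDGE_SEARCH_CONFIG and KNOWLEDGE_SEARCH_INDEX_DIR are the two variables named in the resolution order later in this README; see env-vars.md for the rest):

```shell
# Use a project-local config and index for this session only
export KNOWLEDGE_SEARCH_CONFIG="$PWD/.pi/knowledge-search.json"
export KNOWLEDGE_SEARCH_INDEX_DIR="$PWD/.pi/knowledge-search"
```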

How it works

  1. On session start, loads the index from disk and incrementally syncs — only re-embeds new or modified files
  2. Starts a file watcher for real-time updates (debounced, 2s)
  3. Registers a knowledge_search tool the LLM calls with natural language queries
  4. Returns ranked results with file paths, relevance scores, and content excerpts

The index is stored at ~/.pi/knowledge-search/index.json.
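The incremental-sync decision in step 1 can be sketched as follows. This is an illustration under assumptions: needsReembed and the mtime comparison are hypothetical, and the real extension may also compare content hashes:

```typescript
// What the index remembers about each file from the last sync.
interface IndexEntry {
  path: string;
  mtimeMs: number; // modification time recorded when the file was embedded
}

// Decide whether a file needs (re-)embedding on session start.
function needsReembed(
  entry: IndexEntry | undefined,
  currentMtimeMs: number,
): boolean {
  if (!entry) return true;               // new file: embed it
  return currentMtimeMs > entry.mtimeMs; // modified since last sync: re-embed
}
```

Unchanged files short-circuit to false, which is why a no-change sync is orders of magnitude faster than a full index build (see the Performance table below).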

Commands

Command Description
/knowledge-search-setup Interactive setup wizard
/knowledge-add-kb Add a Bedrock Knowledge Base as a search source
/knowledge-reindex Force a full re-index

Performance

Typical numbers for ~500 markdown files (~20MB):

Operation Time
Full index build ~7s
Incremental sync (no changes) ~12ms
File re-embed (watcher) ~200ms
Search query ~250ms
Index file size ~5MB

Project-local storage

By default, config lives at ~/.pi/knowledge-search.json and the index at ~/.pi/knowledge-search/. To relocate per-project, add one of the following to {project}/.pi/settings.json:

{
  "pi-knowledge-search": {
    "localPath": ".pi/knowledge-search"   // config.json + index/ under this path
  }
}

Or via the pi-total-recall cascade:

{
  "pi-total-recall": {
    "localPath": ".pi/total-recall"
    // pi-knowledge-search → {project}/.pi/total-recall/knowledge-search/
  }
}

Resolution order (highest priority first):

  1. KNOWLEDGE_SEARCH_CONFIG / KNOWLEDGE_SEARCH_INDEX_DIR env vars
  2. pi-knowledge-search.localPath in {cwd}/.pi/settings.json
  3. pi-total-recall.localPath cascade → {localPath}/knowledge-search/
  4. Global default: ~/.pi/knowledge-search.json + ~/.pi/knowledge-search/
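The four-step resolution order can be sketched like this. The function and option names are hypothetical; the env var, settings keys, and default paths come from this README:

```typescript
// Resolve the config path, highest priority first.
function resolveConfigPath(opts: {
  envConfig?: string;       // KNOWLEDGE_SEARCH_CONFIG
  localPath?: string;       // pi-knowledge-search.localPath in {cwd}/.pi/settings.json
  totalRecallPath?: string; // pi-total-recall.localPath cascade
  cwd: string;
  home: string;
}): string {
  if (opts.envConfig) return opts.envConfig;                // 1. env var
  if (opts.localPath)                                       // 2. project-local
    return `${opts.cwd}/${opts.localPath}/config.json`;
  if (opts.totalRecallPath)                                 // 3. total-recall cascade
    return `${opts.cwd}/${opts.totalRecallPath}/knowledge-search/config.json`;
  return `${opts.home}/.pi/knowledge-search.json`;          // 4. global default
}
```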

Per-project indexes are particularly useful for vault- or doc-tree-scoped embeddings where you don't want cross-project bleed.

License

MIT