pi-knowledge-search
Semantic search over local files for pi. Indexes a directory of text files, watches for changes, and exposes a knowledge_search tool to the LLM.
Package details
Install pi-knowledge-search from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-knowledge-search

- Package: pi-knowledge-search
- Version: 1.3.0
- Published: May 3, 2026
- Downloads: 773/mo · 324/wk
- Author: samfp
- License: MIT
- Types: extension
- Size: 215.2 KB
- Dependencies: 5 dependencies · 2 peers
Pi manifest JSON
{
"extensions": [
"./src/index.ts"
]
}

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-knowledge-search
Hybrid search over local files for pi. Indexes directories of text/markdown files using vector embeddings and SQLite FTS5 keyword search, watches for changes in real-time, and exposes a knowledge_search tool the LLM can call.
How search works
Every query runs against two backends in parallel and fuses the results via Reciprocal Rank Fusion (k=60):
- Vector cosine similarity — good for conceptual/fuzzy queries ("how did we handle X")
- BM25 full-text via SQLite FTS5 — good for exact matches, proper nouns, error strings, file paths, code identifiers
Docs that both backends agree on get boosted; either backend alone still surfaces relevant hits. If the embedder fails transiently, search falls back to pure BM25; if the FTS side-car is empty, it falls back to pure vector. Existing users upgrade seamlessly — the FTS side-car is backfilled from the vector index on first load with no re-embedding needed.
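The fusion step described above can be sketched in a few lines. This is an illustrative implementation of Reciprocal Rank Fusion with k=60, not the package's internal code; the `Hit` shape and function name are assumptions for the example.

```typescript
type Hit = { id: string };

// Fuse any number of ranked result lists with Reciprocal Rank Fusion.
// Each list contributes 1 / (k + rank) per document; documents that
// appear in multiple lists accumulate score, which is the "boost"
// for docs both backends agree on.
function reciprocalRankFusion(rankings: Hit[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((hit, i) => {
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

A document ranked second by both backends will outscore one ranked first by only a single backend, while results unique to either backend still appear in the fused list.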
Install
Recommended: Install pi-total-recall to get the complete context stack — persistent memory, session history search, and local knowledge search in one package:
pi install pi-total-recall
Or install pi-knowledge-search standalone:
pi install git:github.com/samfoy/pi-knowledge-search
Or try without installing:
pi -e git:github.com/samfoy/pi-knowledge-search
Setup
Run the interactive setup command inside pi:
/knowledge-search-setup
This walks you through:
- Directories to index (comma-separated paths)
- File extensions to include (default: .md, .txt)
- Directories to exclude (default: node_modules, .git, .obsidian, .trash)
- Embedding provider — OpenAI, OpenAI-compatible (local/self-hosted), AWS Bedrock, or Ollama
Config is saved to ~/.pi/knowledge-search.json. Run /reload to activate.
Config file
You can also edit the config file directly:
{
"dirs": ["~/notes", "~/docs"],
"fileExtensions": [".md", ".txt"],
"excludeDirs": ["node_modules", ".git", ".obsidian", ".trash"],
"provider": {
"type": "openai",
"model": "text-embedding-3-small"
}
}
The API key for OpenAI can be set in the config file ("apiKey": "sk-...") or via the OPENAI_API_KEY environment variable.
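A minimal sketch of how those two key sources could be merged. The function name and the config-over-environment precedence are assumptions for illustration; the README only states that both sources work.

```typescript
// Illustrative only: prefer an explicit config-file key, fall back to
// the OPENAI_API_KEY environment variable. Actual precedence inside
// pi-knowledge-search is not documented here and may differ.
function resolveApiKey(
  configKey: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  return configKey ?? env.OPENAI_API_KEY;
}
```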
{
"dirs": ["~/vault"],
"provider": {
"type": "bedrock",
"profile": "my-aws-profile",
"region": "us-west-2",
"model": "amazon.titan-embed-text-v2:0"
}
}
Requires the AWS SDK and valid credentials for the specified profile.
{
"dirs": ["~/notes"],
"provider": {
"type": "ollama",
"url": "http://localhost:11434",
"model": "nomic-embed-text"
}
}
Requires Ollama running locally:
ollama serve
ollama pull nomic-embed-text
Any server that exposes an OpenAI-compatible /v1/embeddings endpoint works: llama.cpp, vLLM, litellm, Ollama's OpenAI-compatibility mode, etc.
{
"dirs": ["~/notes"],
"provider": {
"type": "openai-compatible",
"baseUrl": "http://127.0.0.1:8080",
"apiKey": "your-local-key",
"model": "qwen3-embeddings"
}
}
The baseUrl should be your server root without a trailing /v1 path — the embedder appends /v1/embeddings automatically.
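To make the baseUrl rule concrete, here is a sketch of the join. The exact normalization the embedder performs is an assumption; the point is only that you supply the server root and /v1/embeddings is appended.

```typescript
// Illustrative: append the /v1/embeddings path to a server root,
// tolerating a trailing slash on the configured baseUrl.
function embeddingsEndpoint(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/v1/embeddings";
}
```

So a config value of `http://127.0.0.1:8080` (with or without a trailing slash) resolves to `http://127.0.0.1:8080/v1/embeddings`, which is why you should not include `/v1` yourself.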
For example with llama-cpp-python:
python -m llama_cpp.server --model ./models/qwen3-embedding.gguf --port 8080
Then configure knowledge-search to point at http://127.0.0.1:8080 as shown above.
The apiKey field is optional; omit it if your runner doesn't require authentication.
Bedrock Knowledge Bases
You can add Amazon Bedrock Knowledge Bases as additional search sources. These are managed RAG services — Amazon handles chunking, embedding, and vector storage. pi-knowledge-search queries them at search time and merges results with local file results.
Add via command:
/knowledge-add-kb
Or add directly to the config file:
{
"dirs": ["~/notes"],
"provider": { "type": "openai" },
"knowledgeBases": [
{
"id": "XXXXXXXXXX",
"region": "us-east-1",
"profile": "default",
"label": "Team docs"
}
]
}
You can use Knowledge Bases alongside local file indexing, or on their own (omit dirs and provider for KB-only mode).
KB-only config:
{
"knowledgeBases": [
{
"id": "XXXXXXXXXX",
"region": "us-east-1",
"profile": "my-work-profile",
"label": "Engineering wiki"
}
]
}
Requires the AWS SDK and valid credentials with bedrock:Retrieve permissions.
Environment variable overrides
Every config field can be overridden via environment variables. This is useful for CI or when you want different settings per shell session. See env-vars.md for the full list.
How it works
- On session start, loads the index from disk and incrementally syncs — only re-embeds new or modified files
- Starts a file watcher for real-time updates (debounced, 2s)
- Registers a knowledge_search tool the LLM calls with natural language queries
- Returns ranked results with file paths, relevance scores, and content excerpts
The index is stored at ~/.pi/knowledge-search/index.json.
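The incremental sync above boils down to a staleness check per file. This sketch compares file metadata against what the index recorded; the `IndexEntry` shape and the mtime/size heuristic are assumptions — the package may use content hashes or other metadata internally.

```typescript
type IndexEntry = { mtimeMs: number; size: number };

// A file needs re-embedding if it is new (no index entry) or its
// recorded metadata no longer matches what's on disk.
function needsReembed(onDisk: IndexEntry, indexed: IndexEntry | undefined): boolean {
  if (!indexed) return true; // new file, never embedded
  return onDisk.mtimeMs !== indexed.mtimeMs || onDisk.size !== indexed.size;
}
```

Because unchanged files short-circuit this check, a no-op sync only pays for directory scans and metadata reads, which is how the ~12ms "no changes" number below is plausible even for hundreds of files.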
Commands
| Command | Description |
|---|---|
| /knowledge-search-setup | Interactive setup wizard |
| /knowledge-add-kb | Add a Bedrock Knowledge Base as a search source |
| /knowledge-reindex | Force a full re-index |
Performance
Typical numbers for ~500 markdown files (~20MB):
| Operation | Time |
|---|---|
| Full index build | ~7s |
| Incremental sync (no changes) | ~12ms |
| File re-embed (watcher) | ~200ms |
| Search query | ~250ms |
| Index file size | ~5MB |
Project-local storage
By default, config lives at ~/.pi/knowledge-search.json and the index at ~/.pi/knowledge-search/. To relocate per-project, add one of the following to {project}/.pi/settings.json:
{
"pi-knowledge-search": {
"localPath": ".pi/knowledge-search" // config.json + index/ under this path
}
}
Or via the pi-total-recall cascade:
{
"pi-total-recall": {
"localPath": ".pi/total-recall"
// pi-knowledge-search → {project}/.pi/total-recall/knowledge-search/
}
}
Resolution order (highest priority first):
1. KNOWLEDGE_SEARCH_CONFIG / KNOWLEDGE_SEARCH_INDEX_DIR env vars
2. pi-knowledge-search.localPath in {cwd}/.pi/settings.json
3. pi-total-recall.localPath cascade → {localPath}/knowledge-search/
4. Global default: ~/.pi/knowledge-search.json + ~/.pi/knowledge-search/
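That precedence can be sketched as a cascade of fallbacks. Function and parameter names here are hypothetical, purely to illustrate the ordering; they are not the package's internals.

```typescript
// Illustrative resolution of the index directory, highest priority first.
function resolveIndexDir(opts: {
  envIndexDir?: string;          // KNOWLEDGE_SEARCH_INDEX_DIR
  projectLocalPath?: string;     // pi-knowledge-search.localPath in {cwd}/.pi/settings.json
  totalRecallLocalPath?: string; // pi-total-recall.localPath cascade
  home: string;
}): string {
  if (opts.envIndexDir) return opts.envIndexDir;
  if (opts.projectLocalPath) return `${opts.projectLocalPath}/index`;
  if (opts.totalRecallLocalPath) return `${opts.totalRecallLocalPath}/knowledge-search`;
  return `${opts.home}/.pi/knowledge-search`; // global default
}
```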
Per-project indexes are particularly useful for vault- or doc-tree-scoped embeddings where you don't want cross-project bleed.
License
MIT