pi-lcm-memory
Persistent cross-session semantic memory for Pi — a hybrid (FTS5 + vector) recall layer on top of pi-lcm.
Package details
Install pi-lcm-memory from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-lcm-memory- Package
pi-lcm-memory- Version
1.0.1- Published
- Apr 30, 2026
- Downloads
- 89/mo · 15/wk
- Author
- sharkone
- License
- MIT
- Types
- extension
- Size
- 188.2 KB
- Dependencies
- 3 dependencies · 5 peers
Pi manifest JSON
{
"extensions": [
"./index.ts"
]
}Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
🧠 pi-lcm-memory
Persistent, cross-session semantic memory for Pi.
Never lose context. Every session remembered, every thought retrievable —
by meaning, not just keywords. Fully local. No external APIs.
Built as an additive layer on top of pi-lcm.
✨ What it does
When you open Pi in a project you've worked in before, pi-lcm-memory:
- 📋 Briefs you with a session-start primer of recent work
- 🔍 Recalls past messages and summaries via hybrid semantic + lexical search
- ⚡ Auto-injects relevant context when you say things like "remember earlier…"
- 🔄 Indexes silently in the background — no latency on your turns
All embeddings live in the same SQLite file pi-lcm already manages. No duplication, no sync, no external services.
🏗️ Architecture
┌──────────────────────────── Pi Session ────────────────────────────┐
│ │
│ ┌─────────────┐ message_end ┌──────────────────────────────┐ │
│ │ pi-lcm │ ──────────────► │ pi-lcm-memory │ │
│ │ │ │ │ │
│ │ messages │ ◄── read-only ── │ Indexer (hook + sweep) │ │
│ │ summaries │ │ │ │ │
│ │ FTS5 index │ │ ▼ │ │
│ └─────────────┘ │ Worker thread │ │
│ │ │ (ONNX / Transformers.js) │ │
│ │ shared SQLite │ │ │ │
│ ▼ │ ▼ │ │
│ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ ~/.pi/agent/lcm/<hash>.db │ │ │
│ │ │ │ │
│ │ messages ──────────────────── memory_index (join) │ │ │
│ │ summaries ─────────────────── memory_vec (sqlite-vec) │ │ │
│ │ memory_meta (kv + events) │ │ │
│ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ session_start ───────────────► Primer + auto-recall │ │
│ user turn ────────────────────► Heuristic recall injection │ │
│ lcm_recall / lcm_similar ──────► Retriever (FTS5 + vec → RRF) │ │
│ │ │
└────────────────────────────────────────────────────────────────────┘
Both extensions are independent Pi peers — pi-lcm-memory never patches pi-lcm. It only adds three tables (memory_vec, memory_index, memory_meta) to the existing per-project SQLite.
🚀 Quick start
pi install npm:pi-lcm # if not already installed
pi install npm:pi-lcm-memory
pi # open Pi as normal
First session in a project with existing pi-lcm history:
- ⬇️ Downloads the embedding model (
Xenova/bge-small-en-v1.5, ~33 MB, once per machine) - ⚙️ Backfills embeddings for all existing messages + summaries in batches of 32
- 📋 Renders a session-start primer with recent topics
- 🔄 From now on, every new message is embedded in the background
🆚 What it adds on top of pi-lcm
| pi-lcm | pi-lcm-memory | |
|---|---|---|
| Per-message storage | ✅ SQLite | shared (no duplication) |
| FTS5 lexical search | ✅ lcm_grep |
reused |
| DAG summaries (D0/D1/D2…) | ✅ | reused |
| Cross-session recall within project | ✅ | reused |
| Dense vector index | ❌ | ✅ sqlite-vec virtual table |
| Hybrid semantic + lexical retrieval | ❌ | ✅ lcm_recall |
| "More like this" navigation | ❌ | ✅ lcm_similar |
| Session-start memory primer | ❌ | ✅ |
| Heuristic auto-recall | ❌ | ✅ |
| Settings panel | ✅ | ✅ (mirrors pi-lcm UX) |
🛠️ Agent tools
lcm_recall
Hybrid (FTS5 + vector) search across all sessions in this project.
lcm_recall(query, k?, mode?, sessionFilter?, after?, before?)
| param | default | description |
|---|---|---|
query |
— | Natural-language or keyword query |
k |
10 |
Number of results |
mode |
hybrid |
hybrid · lexical · semantic |
sessionFilter |
— | Restrict to a single conversation UUID |
after / before |
— | ISO 8601 date bounds |
lcm_similar
Find messages semantically close to a known one — great for "show me more like this".
lcm_similar(messageId, k?)
💡 Use
lcm_grepfor exact strings,lcm_recallfor concepts and paraphrases,lcm_expand(summary_id)to drill into any summary returned by recall.
💬 Slash commands
/memory stats counts, model, dimensions, DB size
/memory status sweep cycles, busy flag, last error, current interval
/memory search <query> ad-hoc recall (same as lcm_recall)
/memory reindex wipe all embeddings and re-embed everything
/memory settings open interactive settings panel
Embedding model and hyperparameters (rrfK, lexMult, semMult) are changed via /memory settings.
⚙️ Settings
Stored under the lcm-memory key in pi-lcm's settings files.
Resolution order: env vars → project → global → defaults.
| Key | Default | Description |
|---|---|---|
enabled |
true |
Master switch. Auto-disables if pi-lcm is disabled. |
embeddingModel |
Xenova/bge-small-en-v1.5 |
Any Transformers.js feature-extraction model. |
embeddingQuantize |
q8 |
auto / fp32 / fp16 / q8 / int8 / q4 |
indexMessages |
true |
Embed user/assistant turns. |
indexSummaries |
true |
Embed pi-lcm DAG summaries. |
skipToolIO |
true |
Skip tool call/result content (FTS5 still covers these). |
primer |
true |
Show session-start briefing. |
primerTopK |
5 |
Number of recent topics in the primer. |
autoRecall |
heuristic |
off / heuristic / always |
autoRecallTopK |
5 |
Hits injected on auto-recall. |
autoRecallTokenBudget |
600 |
Hard token cap on injected recall block. |
recallDefaultTopK |
10 |
Default k for lcm_recall. |
rrfK |
20 |
Reciprocal Rank Fusion constant (sweep-tuned). |
lexMult |
4 |
FTS5 candidate breadth multiplier (sweep-tuned). |
semMult |
16 |
Vector candidate breadth multiplier (sweep-tuned). |
sweepIntervalMs |
30000 |
Base sweep period (backs off ×2 up to 5 min on idle). |
modelCacheDir |
null |
Override model weight cache directory. |
debugMode |
false |
Verbose notifications. |
Env overrides: PI_LCM_MEMORY_ENABLED, PI_LCM_MEMORY_DB_DIR, PI_LCM_MEMORY_MODEL,
PI_LCM_MEMORY_QUANTIZE, PI_LCM_MEMORY_SWEEP_MS, PI_LCM_MEMORY_DEBUG
⚡ Performance
Measured on Apple Silicon (M-class), default model Xenova/bge-small-en-v1.5 q8, 8 ORT threads:
| Metric | Value |
|---|---|
| Backfill throughput | ~1 500–2 000 messages/sec |
| Hook latency (p50) | ~3.4 ms |
| Sweep throughput | ~262 rows/sec |
| Recall latency | ~12 ms |
| Model download (once) | ~33 MB |
| DB growth per message | ~2 KB at 384 dims |
| 100k messages | ≈ 80 MB index |
All embedding work runs in a dedicated worker thread — the Pi TUI is never blocked. The main thread is idle between turns.
🔬 How it works
Ingestion — two concurrent paths keep the index fresh:
- Hook path:
message_end→ embed in worker →INSERT OR IGNORE - Sweep path: every 30 s (adaptive backoff), scan for un-indexed pi-lcm rows, process in batches of 32
- Hook path:
Retrieval —
lcm_recall(query):- Run FTS5 BM25 over
messages+summaries→ ranked list - Run sqlite-vec kNN over
memory_vec→ ranked list - Merge with Reciprocal Rank Fusion (RRF, k=60)
- Run FTS5 BM25 over
Primer — at session start, render up to 5 recent D≥1 summaries into a
## Project memoryblock (≤300 tokens). Shows a one-line notification to the user ([memory] N prior sessions; last on DATE) and injects the full block into Claude's context on the first turnAuto-recall — a regex listener on each user turn (
/remember|earlier|previously|like last time|.../i) injects a## Recallblock into the current turn's system contextWorker thread —
src/embeddings/worker.mjsowns the Transformers.js pipeline. ORT is configured withintraOpNumThreads = cpus()-1(max 8), zero-copyArrayBuffertransfers back to main thread
🐛 Debugging
Set PI_LCM_MEMORY_TRACE=1 before launching Pi to write a side-channel trace log:
PI_LCM_MEMORY_TRACE=1 pi
# → /tmp/pi-lcm-memory.<pid>.trace.log
PI_LCM_MEMORY_TRACE=/path/to/log pi # explicit path
Both the main thread and the embedder worker write to the same file with pid/src markers. The log is written with fs.writeSync so it survives main-thread freezes — it's the right tool when the TUI hangs and the in-DB diagnostics ring can't be written.
🧑💻 Local dev
git clone git@github.com:sharkone/pi-lcm-memory.git
cd pi-lcm-memory
npm install
npm test # 91 vitest tests, ~500 ms
npm run typecheck # tsc --noEmit
npm run bench # perf + quality benchmarks (needs a live pi-lcm DB)
pi -e ./index.ts # load local extension into Pi
⚠️
test/worker.live.test.tsdownloads ~33 MB of model weights. It is skipped by default — enable withPI_LCM_MEMORY_LIVE_TEST=1.
📄 License
MIT © sharkone