pi-lcm-memory

Persistent cross-session semantic memory for Pi — a hybrid (FTS5 + vector) recall layer on top of pi-lcm.

Packages

Package details

extension

Install pi-lcm-memory from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-lcm-memory
Package
pi-lcm-memory
Version
1.0.1
Published
Apr 30, 2026
Downloads
89/mo · 15/wk
Author
sharkone
License
MIT
Types
extension
Size
188.2 KB
Dependencies
3 dependencies · 5 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

🧠 pi-lcm-memory

Persistent, cross-session semantic memory for Pi.
Never lose context. Every session remembered, every thought retrievable —
by meaning, not just keywords. Fully local. No external APIs.

Built as an additive layer on top of pi-lcm.


✨ What it does

When you open Pi in a project you've worked in before, pi-lcm-memory:

  • 📋 Briefs you with a session-start primer of recent work
  • 🔍 Recalls past messages and summaries via hybrid semantic + lexical search
  • Auto-injects relevant context when you say things like "remember earlier…"
  • 🔄 Indexes silently in the background — no latency on your turns

All embeddings live in the same SQLite file pi-lcm already manages. No duplication, no sync, no external services.


🏗️ Architecture

┌──────────────────────────── Pi Session ────────────────────────────┐
│                                                                     │
│  ┌─────────────┐   message_end    ┌──────────────────────────────┐  │
│  │   pi-lcm    │ ──────────────►  │      pi-lcm-memory           │  │
│  │             │                  │                              │  │
│  │  messages   │ ◄── read-only ── │  Indexer (hook + sweep)      │  │
│  │  summaries  │                  │     │                        │  │
│  │  FTS5 index │                  │     ▼                        │  │
│  └─────────────┘                  │  Worker thread               │  │
│        │                          │  (ONNX / Transformers.js)    │  │
│        │  shared SQLite           │     │                        │  │
│        ▼                          │     ▼                        │  │
│  ┌─────────────────────────────────────────────────────────────┐  │  │
│  │  ~/.pi/agent/lcm/<hash>.db                                  │  │  │
│  │                                                             │  │  │
│  │  messages ──────────────────── memory_index (join)         │  │  │
│  │  summaries ─────────────────── memory_vec   (sqlite-vec)   │  │  │
│  │                                memory_meta  (kv + events)  │  │  │
│  └─────────────────────────────────────────────────────────────┘  │  │
│                                   │                               │  │
│  session_start ───────────────►   Primer + auto-recall            │  │
│  user turn ────────────────────►  Heuristic recall injection      │  │
│  lcm_recall / lcm_similar ──────► Retriever (FTS5 + vec → RRF)   │  │
│                                                                    │  │
└────────────────────────────────────────────────────────────────────┘

Both extensions are independent Pi peers — pi-lcm-memory never patches pi-lcm. It only adds three tables (memory_vec, memory_index, memory_meta) to the existing per-project SQLite.


🚀 Quick start

pi install npm:pi-lcm           # if not already installed
pi install npm:pi-lcm-memory

pi                              # open Pi as normal

First session in a project with existing pi-lcm history:

  1. ⬇️ Downloads the embedding model (Xenova/bge-small-en-v1.5, ~33 MB, once per machine)
  2. ⚙️ Backfills embeddings for all existing messages + summaries in batches of 32
  3. 📋 Renders a session-start primer with recent topics
  4. 🔄 From now on, every new message is embedded in the background

🆚 What it adds on top of pi-lcm

pi-lcm pi-lcm-memory
Per-message storage ✅ SQLite shared (no duplication)
FTS5 lexical search lcm_grep reused
DAG summaries (D0/D1/D2…) reused
Cross-session recall within project reused
Dense vector index sqlite-vec virtual table
Hybrid semantic + lexical retrieval lcm_recall
"More like this" navigation lcm_similar
Session-start memory primer
Heuristic auto-recall
Settings panel ✅ (mirrors pi-lcm UX)

🛠️ Agent tools

lcm_recall

Hybrid (FTS5 + vector) search across all sessions in this project.

lcm_recall(query, k?, mode?, sessionFilter?, after?, before?)
param default description
query Natural-language or keyword query
k 10 Number of results
mode hybrid hybrid · lexical · semantic
sessionFilter Restrict to a single conversation UUID
after / before ISO 8601 date bounds

lcm_similar

Find messages semantically close to a known one — great for "show me more like this".

lcm_similar(messageId, k?)

💡 Use lcm_grep for exact strings, lcm_recall for concepts and paraphrases, lcm_expand(summary_id) to drill into any summary returned by recall.


💬 Slash commands

/memory stats               counts, model, dimensions, DB size
/memory status              sweep cycles, busy flag, last error, current interval
/memory search <query>      ad-hoc recall (same as lcm_recall)
/memory reindex             wipe all embeddings and re-embed everything
/memory settings            open interactive settings panel

Embedding model and hyperparameters (rrfK, lexMult, semMult) are changed via /memory settings.


⚙️ Settings

Stored under the lcm-memory key in pi-lcm's settings files.
Resolution order: env vars → project → global → defaults.

Key Default Description
enabled true Master switch. Auto-disables if pi-lcm is disabled.
embeddingModel Xenova/bge-small-en-v1.5 Any Transformers.js feature-extraction model.
embeddingQuantize q8 auto / fp32 / fp16 / q8 / int8 / q4
indexMessages true Embed user/assistant turns.
indexSummaries true Embed pi-lcm DAG summaries.
skipToolIO true Skip tool call/result content (FTS5 still covers these).
primer true Show session-start briefing.
primerTopK 5 Number of recent topics in the primer.
autoRecall heuristic off / heuristic / always
autoRecallTopK 5 Hits injected on auto-recall.
autoRecallTokenBudget 600 Hard token cap on injected recall block.
recallDefaultTopK 10 Default k for lcm_recall.
rrfK 20 Reciprocal Rank Fusion constant (sweep-tuned).
lexMult 4 FTS5 candidate breadth multiplier (sweep-tuned).
semMult 16 Vector candidate breadth multiplier (sweep-tuned).
sweepIntervalMs 30000 Base sweep period (backs off ×2 up to 5 min on idle).
modelCacheDir null Override model weight cache directory.
debugMode false Verbose notifications.

Env overrides: PI_LCM_MEMORY_ENABLED, PI_LCM_MEMORY_DB_DIR, PI_LCM_MEMORY_MODEL, PI_LCM_MEMORY_QUANTIZE, PI_LCM_MEMORY_SWEEP_MS, PI_LCM_MEMORY_DEBUG


⚡ Performance

Measured on Apple Silicon (M-class), default model Xenova/bge-small-en-v1.5 q8, 8 ORT threads:

Metric Value
Backfill throughput ~1 500–2 000 messages/sec
Hook latency (p50) ~3.4 ms
Sweep throughput ~262 rows/sec
Recall latency ~12 ms
Model download (once) ~33 MB
DB growth per message ~2 KB at 384 dims
100k messages ≈ 80 MB index

All embedding work runs in a dedicated worker thread — the Pi TUI is never blocked. The main thread is idle between turns.


🔬 How it works

  1. Ingestion — two concurrent paths keep the index fresh:

    • Hook path: message_end → embed in worker → INSERT OR IGNORE
    • Sweep path: every 30 s (adaptive backoff), scan for un-indexed pi-lcm rows, process in batches of 32
  2. Retrievallcm_recall(query):

    • Run FTS5 BM25 over messages + summaries → ranked list
    • Run sqlite-vec kNN over memory_vec → ranked list
    • Merge with Reciprocal Rank Fusion (RRF, k=60)
  3. Primer — at session start, render up to 5 recent D≥1 summaries into a ## Project memory block (≤300 tokens). Shows a one-line notification to the user ([memory] N prior sessions; last on DATE) and injects the full block into Claude's context on the first turn

  4. Auto-recall — a regex listener on each user turn (/remember|earlier|previously|like last time|.../i) injects a ## Recall block into the current turn's system context

  5. Worker threadsrc/embeddings/worker.mjs owns the Transformers.js pipeline. ORT is configured with intraOpNumThreads = cpus()-1 (max 8), zero-copy ArrayBuffer transfers back to main thread


🐛 Debugging

Set PI_LCM_MEMORY_TRACE=1 before launching Pi to write a side-channel trace log:

PI_LCM_MEMORY_TRACE=1 pi
# → /tmp/pi-lcm-memory.<pid>.trace.log

PI_LCM_MEMORY_TRACE=/path/to/log pi   # explicit path

Both the main thread and the embedder worker write to the same file with pid/src markers. The log is written with fs.writeSync so it survives main-thread freezes — it's the right tool when the TUI hangs and the in-DB diagnostics ring can't be written.


🧑‍💻 Local dev

git clone git@github.com:sharkone/pi-lcm-memory.git
cd pi-lcm-memory
npm install

npm test              # 91 vitest tests, ~500 ms
npm run typecheck     # tsc --noEmit
npm run bench         # perf + quality benchmarks (needs a live pi-lcm DB)

pi -e ./index.ts      # load local extension into Pi

⚠️ test/worker.live.test.ts downloads ~33 MB of model weights. It is skipped by default — enable with PI_LCM_MEMORY_LIVE_TEST=1.


📄 License

MIT © sharkone