pi-lcm-memory

Persistent cross-session semantic memory for Pi: a hybrid (FTS5 + vector) recall layer on top of pi-lcm.

Package details

extension

Install pi-lcm-memory from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-lcm-memory
Package
pi-lcm-memory
Version
1.0.1
Published
Apr 30, 2026
Downloads
not available
Author
sharkone
License
MIT
Types
extension
Size
188.2 KB
Dependencies
3 dependencies · 5 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

🧠 pi-lcm-memory

Persistent, cross-session semantic memory for Pi.
Never lose context. Every session remembered, every thought retrievable:
by meaning, not just keywords. Fully local. No external APIs.

Built as an additive layer on top of pi-lcm.


✨ What it does

When you open Pi in a project you've worked in before, pi-lcm-memory:

  • πŸ“‹ Briefs you with a session-start primer of recent work
  • πŸ” Recalls past messages and summaries via hybrid semantic + lexical search
  • ⚑ Auto-injects relevant context when you say things like "remember earlier…"
  • πŸ”„ Indexes silently in the background β€” no latency on your turns

All embeddings live in the same SQLite file pi-lcm already manages. No duplication, no sync, no external services.


πŸ—οΈ Architecture

┌────────────────────────────── Pi Session ───────────────────────────┐
│                                                                     │
│  ┌─────────────┐   message_end    ┌──────────────────────────────┐  │
│  │   pi-lcm    │ ──────────────►  │      pi-lcm-memory           │  │
│  │             │                  │                              │  │
│  │  messages   │ ◄── read-only ── │  Indexer (hook + sweep)      │  │
│  │  summaries  │                  │     │                        │  │
│  │  FTS5 index │                  │     ▼                        │  │
│  └─────────────┘                  │  Worker thread               │  │
│        │                          │  (ONNX / Transformers.js)    │  │
│        │  shared SQLite           │     │                        │  │
│        ▼                          │     ▼                        │  │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  ~/.pi/agent/lcm/<hash>.db                                  │   │
│  │                                                             │   │
│  │  messages ─────────────────────── memory_index (join)       │   │
│  │  summaries ────────────────────── memory_vec   (sqlite-vec) │   │
│  │                                   memory_meta  (kv + events)│   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                  │                                  │
│  session_start ───────────────►  Primer + auto-recall               │
│  user turn ────────────────────► Heuristic recall injection         │
│  lcm_recall / lcm_similar ─────► Retriever (FTS5 + vec → RRF)       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Both extensions are independent Pi peers; pi-lcm-memory never patches pi-lcm. It only adds three tables (memory_vec, memory_index, memory_meta) to the existing per-project SQLite database.
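
The added schema is roughly of this shape. This is an illustrative sketch only: the column names and the 384-dim size are assumptions, though the vector table does use sqlite-vec's vec0 virtual-table syntax.

const MEMORY_SCHEMA_SKETCH = `
  -- join table tying pi-lcm rows to their embeddings (hypothetical columns)
  CREATE TABLE IF NOT EXISTS memory_index (
    source_id   TEXT PRIMARY KEY,    -- message or summary id from pi-lcm
    source_kind TEXT NOT NULL,       -- 'message' or 'summary'
    vec_rowid   INTEGER NOT NULL     -- rowid of the vector in memory_vec
  );

  -- dense vectors stored through the sqlite-vec extension
  CREATE VIRTUAL TABLE IF NOT EXISTS memory_vec USING vec0(
    embedding FLOAT[384]
  );

  -- key/value state and indexing events (model, counters, last error)
  CREATE TABLE IF NOT EXISTS memory_meta (
    key   TEXT PRIMARY KEY,
    value TEXT
  );
`;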


🚀 Quick start

pi install npm:pi-lcm           # if not already installed
pi install npm:pi-lcm-memory

pi                              # open Pi as normal

First session in a project with existing pi-lcm history:

  1. ⬇️ Downloads the embedding model (Xenova/bge-small-en-v1.5, ~33 MB, once per machine)
  2. ⚙️ Backfills embeddings for all existing messages + summaries in batches of 32 (see the sketch below)
  3. 📋 Renders a session-start primer with recent topics
  4. 🔄 From then on, every new message is embedded in the background
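
For orientation, computing embeddings with Transformers.js looks roughly like the sketch below. It is not the extension's own pipeline code; it just uses the @xenova/transformers package with the default model and quantization named above.

import { pipeline } from "@xenova/transformers";

// Load the feature-extraction pipeline once; weights are cached after the first download.
const embed = await pipeline("feature-extraction", "Xenova/bge-small-en-v1.5", {
  quantized: true, // q8 weights, matching the default embeddingQuantize setting
});

// Embed a batch of texts: mean pooling + normalization yields one 384-dim vector per text.
const texts = ["fix the race in the sweep loop", "how are backfill batches scheduled?"];
const output = await embed(texts, { pooling: "mean", normalize: true });
console.log(output.dims); // e.g. [2, 384]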

🆚 What it adds on top of pi-lcm

| | pi-lcm | pi-lcm-memory |
| --- | --- | --- |
| Per-message storage | ✅ | SQLite shared (no duplication) |
| FTS5 lexical search | ✅ | lcm_grep reused |
| DAG summaries (D0/D1/D2…) | ✅ | reused |
| Cross-session recall within project | ✅ | reused |
| Dense vector index | ❌ | ✅ sqlite-vec virtual table |
| Hybrid semantic + lexical retrieval | ❌ | ✅ lcm_recall |
| "More like this" navigation | ❌ | ✅ lcm_similar |
| Session-start memory primer | ❌ | ✅ |
| Heuristic auto-recall | ❌ | ✅ |
| Settings panel | ✅ | ✅ (mirrors pi-lcm UX) |

πŸ› οΈ Agent tools

lcm_recall

Hybrid (FTS5 + vector) search across all sessions in this project.

lcm_recall(query, k?, mode?, sessionFilter?, after?, before?)
| param | default | description |
| --- | --- | --- |
| query | – | Natural-language or keyword query |
| k | 10 | Number of results |
| mode | hybrid | hybrid · lexical · semantic |
| sessionFilter | – | Restrict to a single conversation UUID |
| after / before | – | ISO 8601 date bounds |
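
For a sense of what calls look like (the query strings, UUID, and dates below are invented; arguments follow the signature above):

// Concept search across the whole project, top 5 hits
lcm_recall("why retry logic moved into the sweep loop", 5, "hybrid")

// Keyword-leaning search restricted to one conversation and a date window
lcm_recall("sqlite busy timeout", 10, "lexical", "<conversation-uuid>", "2026-03-01", "2026-04-30")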

lcm_similar

Find messages semantically close to a known one; great for "show me more like this".

lcm_similar(messageId, k?)

💡 Use lcm_grep for exact strings, lcm_recall for concepts and paraphrases, lcm_expand(summary_id) to drill into any summary returned by recall.


💬 Slash commands

/memory stats               counts, model, dimensions, DB size
/memory status              sweep cycles, busy flag, last error, current interval
/memory search <query>      ad-hoc recall (same as lcm_recall)
/memory reindex             wipe all embeddings and re-embed everything
/memory settings            open interactive settings panel

Embedding model and hyperparameters (rrfK, lexMult, semMult) are changed via /memory settings.


βš™οΈ Settings

Stored under the lcm-memory key in pi-lcm's settings files.
Resolution order: env vars → project → global → defaults.
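
A minimal sketch of that precedence, with hypothetical names and values (not the extension's API): the first defined source wins.

function resolveSetting<T>(env: T | undefined, project: T | undefined, globalFile: T | undefined, fallback: T): T {
  return env ?? project ?? globalFile ?? fallback;
}

// e.g. sweep interval: env var beats the project file, which beats the global file, then the built-in default.
const raw = process.env.PI_LCM_MEMORY_SWEEP_MS;
const sweepIntervalMs = resolveSetting(raw ? Number(raw) : undefined, undefined, 60000, 30000);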

| Key | Default | Description |
| --- | --- | --- |
| enabled | true | Master switch. Auto-disables if pi-lcm is disabled. |
| embeddingModel | Xenova/bge-small-en-v1.5 | Any Transformers.js feature-extraction model. |
| embeddingQuantize | q8 | auto / fp32 / fp16 / q8 / int8 / q4 |
| indexMessages | true | Embed user/assistant turns. |
| indexSummaries | true | Embed pi-lcm DAG summaries. |
| skipToolIO | true | Skip tool call/result content (FTS5 still covers these). |
| primer | true | Show session-start briefing. |
| primerTopK | 5 | Number of recent topics in the primer. |
| autoRecall | heuristic | off / heuristic / always |
| autoRecallTopK | 5 | Hits injected on auto-recall. |
| autoRecallTokenBudget | 600 | Hard token cap on the injected recall block. |
| recallDefaultTopK | 10 | Default k for lcm_recall. |
| rrfK | 20 | Reciprocal Rank Fusion constant (sweep-tuned). |
| lexMult | 4 | FTS5 candidate breadth multiplier (sweep-tuned). |
| semMult | 16 | Vector candidate breadth multiplier (sweep-tuned). |
| sweepIntervalMs | 30000 | Base sweep period (backs off ×2 up to 5 min when idle). |
| modelCacheDir | null | Override the model weight cache directory. |
| debugMode | false | Verbose notifications. |

Env overrides: PI_LCM_MEMORY_ENABLED, PI_LCM_MEMORY_DB_DIR, PI_LCM_MEMORY_MODEL, PI_LCM_MEMORY_QUANTIZE, PI_LCM_MEMORY_SWEEP_MS, PI_LCM_MEMORY_DEBUG
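
For example, to try a slower sweep and verbose logging for a single run (the values are arbitrary):

PI_LCM_MEMORY_SWEEP_MS=60000 PI_LCM_MEMORY_DEBUG=1 pi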


⚡ Performance

Measured on Apple Silicon (M-class), default model Xenova/bge-small-en-v1.5 q8, 8 ORT threads:

| Metric | Value |
| --- | --- |
| Backfill throughput | ~1,500–2,000 messages/sec |
| Hook latency (p50) | ~3.4 ms |
| Sweep throughput | ~262 rows/sec |
| Recall latency | ~12 ms |
| Model download (once) | ~33 MB |
| DB growth per message | ~2 KB at 384 dims |
| 100k messages | ≈80 MB index |

All embedding work runs in a dedicated worker thread; the Pi TUI is never blocked. The main thread is idle between turns.
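
As a rough sketch of that pattern (not the extension's actual worker; the file name and message shape are invented), a Node worker can hand Float32Array results back without copying by listing the underlying ArrayBuffer as a transferable:

// embed-worker-sketch.ts (hypothetical)
import { parentPort } from "node:worker_threads";

parentPort?.on("message", ({ id, texts }: { id: number; texts: string[] }) => {
  const vectors = new Float32Array(texts.length * 384);
  // ... fill `vectors` from the feature-extraction pipeline ...
  // Transferring the buffer moves it to the main thread with zero copying.
  parentPort?.postMessage({ id, vectors }, [vectors.buffer]);
});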


🔬 How it works

  1. Ingestion - two concurrent paths keep the index fresh:

    • Hook path: message_end → embed in worker → INSERT OR IGNORE
    • Sweep path: every 30 s (adaptive backoff), scan for un-indexed pi-lcm rows, process in batches of 32
  2. Retrieval - lcm_recall(query):

    • Run FTS5 BM25 over messages + summaries → ranked list
    • Run sqlite-vec kNN over memory_vec → ranked list
    • Merge with Reciprocal Rank Fusion (RRF, k=60) - see the sketch after this list
  3. Primer - at session start, render up to 5 recent D≥1 summaries into a ## Project memory block (≤300 tokens). Shows a one-line notification to the user ([memory] N prior sessions; last on DATE) and injects the full block into Claude's context on the first turn.

  4. Auto-recall - a regex listener on each user turn (/remember|earlier|previously|like last time|.../i) injects a ## Recall block into the current turn's system context.

  5. Worker thread - src/embeddings/worker.mjs owns the Transformers.js pipeline. ORT is configured with intraOpNumThreads = cpus()-1 (max 8), and embeddings come back to the main thread as zero-copy ArrayBuffer transfers.
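
The fusion step in item 2 can be pictured with the following sketch of Reciprocal Rank Fusion (a generic illustration, not the extension's implementation; the IDs are placeholders):

// RRF: score(id) = sum over lists of 1 / (k + rank of id in that list)
function rrf(lists: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1)); // ranks are 1-based
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical ranked ID lists from FTS5 BM25 and sqlite-vec kNN:
const fused = rrf([["m42", "m7", "m13"], ["m7", "m99", "m42"]]);
// m7 and m42 come out on top because both retrievers rank them highly.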


πŸ› Debugging

Set PI_LCM_MEMORY_TRACE=1 before launching Pi to write a side-channel trace log:

PI_LCM_MEMORY_TRACE=1 pi
# → /tmp/pi-lcm-memory.<pid>.trace.log

PI_LCM_MEMORY_TRACE=/path/to/log pi   # explicit path

Both the main thread and the embedder worker write to the same file with pid/src markers. The log is written with fs.writeSync so it survives main-thread freezes; it's the right tool when the TUI hangs and the in-DB diagnostics ring can't be written.
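
The mechanism is roughly the following (a sketch only; the real trace format and helper are internal to the package):

import fs from "node:fs";

// Synchronous, append-only writes land on disk even if the event loop later freezes.
const fd = fs.openSync(`/tmp/pi-lcm-memory.${process.pid}.trace.log`, "a");

function trace(src: string, msg: string): void {
  fs.writeSync(fd, `${new Date().toISOString()} [${process.pid}:${src}] ${msg}\n`);
}

trace("main", "sweep started");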


πŸ§‘β€πŸ’» Local dev

git clone git@github.com:sharkone/pi-lcm-memory.git
cd pi-lcm-memory
npm install

npm test              # 91 vitest tests, ~500 ms
npm run typecheck     # tsc --noEmit
npm run bench         # perf + quality benchmarks (needs a live pi-lcm DB)

pi -e ./index.ts      # load local extension into Pi

⚠️ test/worker.live.test.ts downloads ~33 MB of model weights. It is skipped by default; enable it with PI_LCM_MEMORY_LIVE_TEST=1.


📄 License

MIT © sharkone