@rohaquinlop/pi-deepseek-cache

DeepSeek prefix cache optimization for pi — date/CWD freeze, hit-rate telemetry, cache-friendly compaction, and TUI overlays

Install @rohaquinlop/pi-deepseek-cache from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@rohaquinlop/pi-deepseek-cache

Package: @rohaquinlop/pi-deepseek-cache
Version: 1.3.1
Published: Jun 21, 2026
Downloads: not available
Author: rohaquinlop
License: MIT
Types: extension
Size: 39.3 KB
Dependencies: 0 dependencies · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions/index.ts"
  ],
  "appliesToModels": [
    "deepseek-*",
    "deepseek"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-deepseek-cache

Reduce DeepSeek API input costs by 95%+ through multi-layered prefix cache optimization. Zero configuration: the extension auto-detects DeepSeek models and applies best practices transparently.

The Problem

DeepSeek's API uses prefix caching: identical prompt prefixes are served from disk cache at roughly 50–120× lower cost than fresh computation. But the cache only works when every byte from position 0 is identical across requests.

Pi's default system prompt embeds Current date: YYYY-MM-DD and Current working directory: <cwd> — dynamic values that change daily and per session, silently busting the entire prefix cache.
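To make the effect concrete, the sketch below compares two prompts that differ only in the date line and measures how much of the prefix they share. The `buildPrompt` helper and its prompt text are hypothetical stand-ins; Pi's real system prompt is longer and laid out differently.

```typescript
// Hypothetical prompt builder, for illustration only.
function buildPrompt(date: string, cwd: string): string {
  return [
    "You are a coding agent.",
    `Current date: ${date}`,
    `Current working directory: ${cwd}`,
    "Tool instructions and project context follow...",
  ].join("\n");
}

// Number of leading characters (UTF-16 code units) the two strings share.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

const dayOne = buildPrompt("2026-06-21", "/home/me/app");
const dayTwo = buildPrompt("2026-06-22", "/home/me/app");
const shared = commonPrefixLength(dayOne, dayTwo);

// Everything after the first differing character can no longer match the cached prefix.
console.log(`${shared} of ${dayOne.length} characters are shared`);
```

Because the date sits near the top of the prompt, a one-character change leaves only a short shared prefix, and everything after it has to be recomputed.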

What This Extension Does

| Layer | Feature | Impact |
|-------|---------|--------|
| P0 | Date & CWD freeze | Root-cause fix: locks the session date and directory, preventing daily and per-session cache busts |
| P1 | Hit-rate telemetry | Per-session hit rate shown as a dimmed footer status line; /cache-stats and /cache-graph for detail |
| P2 | Prefix guard | SHA-256 hash diagnostics that track prefix breaks (viewable in /cache-stats) |
| P3 | Cache-friendly compaction | Deterministic summarization via deepseek-v4-flash at temperature 0, SHA-256 cached for stable replays |
| P4 | TUI overlays | /cache-stats popup with hit rate, tokens, and cost savings; /cache-graph ASCII trend chart |

Cost Impact

| Model | Without Extension | With Extension |
|-------|-------------------|----------------|
| deepseek-v4-flash input | $0.14/M tokens | $0.003/M tokens (98% less) |
| deepseek-v4-pro input | $3.00/M tokens | $0.025/M tokens (99% less) |
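The prices above are per-token rates for cached versus uncached input, so the saving on a real session depends on its hit rate. A minimal sketch of that arithmetic, assuming a simple linear blend of the two rates and using the 96.2% figure from the footer example purely as an illustration:

```typescript
// Blended input price per million tokens for a given cache hit rate (0 to 1).
function blendedInputPrice(missPrice: number, hitPrice: number, hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError("hitRate must be in [0, 1]");
  return hitRate * hitPrice + (1 - hitRate) * missPrice;
}

// Fraction of input cost saved relative to paying the uncached price for everything.
function savings(missPrice: number, hitPrice: number, hitRate: number): number {
  return 1 - blendedInputPrice(missPrice, hitPrice, hitRate) / missPrice;
}

const exampleHitRate = 0.962; // illustrative value, not a measurement
const flashSavings = savings(0.14, 0.003, exampleHitRate);
const proSavings = savings(3.0, 0.025, exampleHitRate);

console.log(`flash: ${(flashSavings * 100).toFixed(1)}% saved`);
console.log(`pro:   ${(proSavings * 100).toFixed(1)}% saved`);
```

The per-row percentages in the table correspond to a hit rate of 1; any uncached tokens pull the blended saving below those figures.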

Installation

pi install npm:@rohaquinlop/pi-deepseek-cache

Or via git:

pi install git:github.com/rohaquinlop/pi-deepseek-cache

The extension activates automatically. No configuration needed. The per-session cache hit rate appears as a dimmed status line (Cache 96.2%) in Pi's footer. Pi's native CH:XX.X% shows the per-turn rate in the stats line. Detailed stats are available via /cache-stats and /cache-graph commands.

Each pi session writes its own stats-{sessionId}.json and history-{sessionId}.json files, so concurrent sessions never race on the same file. Session files older than 30 days are cleaned up automatically.
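The cleanup rule can be pictured as a pure selection step over a directory listing. The sketch below is an assumption about how such a rule might look, not the extension's code: it assumes age is judged by file modification time, and the `FileInfo` shape and `staleSessionFiles` name are hypothetical.

```typescript
// Hypothetical directory entry: file name plus last-modified time in ms since epoch.
interface FileInfo {
  name: string;
  mtimeMs: number;
}

const MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000; // 30 days
const SESSION_FILE = /^(stats|history)-.+\.json$/;

// Returns the names of per-session files older than 30 days.
// Files that do not match the stats-*/history-* naming are never selected.
function staleSessionFiles(files: FileInfo[], nowMs: number): string[] {
  return files
    .filter((f) => SESSION_FILE.test(f.name) && nowMs - f.mtimeMs > MAX_AGE_MS)
    .map((f) => f.name);
}

const now = Date.UTC(2026, 5, 21);
const day = 24 * 60 * 60 * 1000;
const stale = staleSessionFiles(
  [
    { name: "stats-abc.json", mtimeMs: now - 45 * day },
    { name: "history-abc.json", mtimeMs: now - 2 * day },
    { name: "notes.txt", mtimeMs: now - 90 * day },
  ],
  now,
);
console.log(stale);
```

Keeping the selection separate from the actual deletion makes the rule easy to test without touching the filesystem.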

Provider Support

Works with any provider serving DeepSeek models:

  • Any provider with deepseek-* model IDs (NaN Builders, OpenRouter, custom proxies, etc.)
  • DeepSeek API (deepseek provider) — direct API users

Non-DeepSeek models pass through unchanged.
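As a rough illustration of the matching rule, the sketch below treats the two patterns from the manifest as simple globs. Which identifier Pi actually tests them against (model ID, provider name, or both) and how it interprets `*` are assumptions here; `appliesTo` and `globToRegExp` are hypothetical helpers.

```typescript
// Patterns copied from the package manifest.
const APPLIES_TO_MODELS = ["deepseek-*", "deepseek"];

// Converts a glob with `*` wildcards into an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  // Escape every regex metacharacter except `*`, then expand `*` to `.*`.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

// True when the identifier matches at least one pattern.
function appliesTo(id: string, patterns: string[] = APPLIES_TO_MODELS): boolean {
  return patterns.some((p) => globToRegExp(p).test(id));
}

console.log(appliesTo("deepseek-v4-flash"));
console.log(appliesTo("deepseek"));
console.log(appliesTo("other-model"));
```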

Subagent Compatibility

This extension automatically applies to subagent processes that use DeepSeek models. It declares appliesToModels: ["deepseek-*", "deepseek"] in its package.json; the pi-subagents extension detects that declaration and loads this extension into matching child processes, with no configuration needed.

For the best cache performance, ensure both extensions are installed:

pi install npm:@rohaquinlop/pi-subagents
pi install npm:@rohaquinlop/pi-deepseek-cache

Commands

/cache-stats

Overlay popup showing two sections: this session's stats and an aggregate across all sessions (N sessions). Each section shows hit rate, cache read/write/input tokens, turns, and estimated cost savings.

/cache-graph

ASCII trend chart of hit rate over turns — helps spot regressions.
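For a sense of what such a chart conveys, here is a minimal ASCII renderer for a series of per-turn hit rates. It is an illustrative sketch only; the layout, the `#` fill character, and the `asciiChart` name are assumptions and do not reflect the extension's actual output.

```typescript
// Renders hit rates (each 0 to 1) as a column chart, one column per turn.
// Each row is labelled with the threshold a column must reach to be filled.
function asciiChart(rates: number[], height = 5): string {
  const rows: string[] = [];
  for (let level = height; level >= 1; level--) {
    const threshold = level / height;
    const cells = rates.map((r) => (r >= threshold ? "#" : " ")).join("");
    const label = String(Math.round(threshold * 100)).padStart(3);
    rows.push(`${label}% |${cells}`);
  }
  return rows.join("\n");
}

// A cold first turn, a warm-up, then a dip that would stand out as a regression.
console.log(asciiChart([0, 0.6, 0.9, 0.95, 0.4, 0.9]));
```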

/cache-reset

Clears all cached statistics, history, and summaries: deletes every per-session stats-*.json and history-*.json file, plus the summary cache. Useful after major prompt changes.

How It Works

P0 (Date/CWD freeze): On before_agent_start, replaces the dynamic Current date and Current working directory lines with values frozen at session start. The system prompt prefix stays byte-identical across the entire session.
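The freeze step can be sketched as a pair of line-level replacements. This is a minimal illustration under the assumption that each dynamic value sits alone on a line starting with the labels quoted above; `freezePrompt` and the `Frozen` type are hypothetical names, and the real hook signature is not shown.

```typescript
// Values captured once at session start.
interface Frozen {
  date: string;
  cwd: string;
}

// Rewrites the two dynamic lines so the prompt is byte-identical on every turn.
// Replacer functions are used so `$` sequences in a path are inserted literally.
function freezePrompt(prompt: string, frozen: Frozen): string {
  return prompt
    .replace(/^Current date: .*$/m, () => `Current date: ${frozen.date}`)
    .replace(/^Current working directory: .*$/m, () => `Current working directory: ${frozen.cwd}`);
}

const frozenAtStart: Frozen = { date: "2026-06-21", cwd: "/home/me/app" };

// A later turn where the date has rolled over and the directory has changed.
const laterPrompt = "Intro\nCurrent date: 2026-06-22\nCurrent working directory: /tmp/other\nRest";
console.log(freezePrompt(laterPrompt, frozenAtStart));
```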

P1 (Telemetry): Accumulates cacheRead, input, cacheWrite, and turns from every assistant message's usage data. Each session stores its stats in stats-{sessionId}.json so concurrent sessions never race. /cache-stats shows both this session's stats and an aggregate across all sessions.
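A minimal sketch of the accumulation follows. The field names come from the description above, but two things are assumptions: that `input` counts only uncached input tokens, and that the hit rate is computed as cacheRead / (cacheRead + input). The extension may define either differently.

```typescript
// Token counts reported with one assistant message.
interface Usage {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

// Running totals for a session.
interface Stats extends Usage {
  turns: number;
}

const emptyStats: Stats = { input: 0, cacheRead: 0, cacheWrite: 0, turns: 0 };

// Returns new totals with one more turn folded in (does not mutate the input).
function accumulate(stats: Stats, usage: Usage): Stats {
  return {
    input: stats.input + usage.input,
    cacheRead: stats.cacheRead + usage.cacheRead,
    cacheWrite: stats.cacheWrite + usage.cacheWrite,
    turns: stats.turns + 1,
  };
}

// Assumed definition: share of input tokens that were served from cache.
function hitRate(stats: Stats): number {
  const total = stats.cacheRead + stats.input;
  return total === 0 ? 0 : stats.cacheRead / total;
}

let sessionStats = emptyStats;
sessionStats = accumulate(sessionStats, { input: 1000, cacheRead: 0, cacheWrite: 1000 }); // cold first turn
sessionStats = accumulate(sessionStats, { input: 50, cacheRead: 1950, cacheWrite: 50 }); // warm second turn
console.log(`Cache ${(hitRate(sessionStats) * 100).toFixed(1)}%`);
```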

P2 (Prefix guard): On before_provider_request, SHA-256 hashes all messages except the last to fingerprint the prefix. Tracks when the hash changes — the break count is visible in /cache-stats.
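The fingerprint itself can be sketched with Node's built-in `crypto` module. The `Message` shape and the use of `JSON.stringify` as the serialization are assumptions made for illustration; the source does not say how the extension serializes messages or how it distinguishes a genuine break from normal conversation growth.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape; real provider payloads carry more fields.
interface Message {
  role: string;
  content: string;
}

// SHA-256 over every message except the last, as a hex string.
function prefixFingerprint(messages: Message[]): string {
  const prefix = messages.slice(0, -1);
  return createHash("sha256").update(JSON.stringify(prefix)).digest("hex");
}

const system: Message = { role: "system", content: "You are a coding agent." };
const first: Message = { role: "user", content: "List the files." };

// Changing only the last message leaves the fingerprint untouched...
const fpA = prefixFingerprint([system, first, { role: "user", content: "Now run the tests." }]);
const fpB = prefixFingerprint([system, first, { role: "user", content: "Now open README." }]);

// ...while editing an earlier message changes it.
const edited: Message = { role: "system", content: "You are a coding agent. Today is Monday." };
const fpC = prefixFingerprint([edited, first, { role: "user", content: "Now run the tests." }]);

console.log(fpA === fpB, fpA === fpC);
```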

P3 (Compaction): On session_before_compact, summarizes conversation history with deepseek-v4-flash at temperature 0. Summaries are SHA-256 hashed and cached — identical histories produce byte-identical summaries, keeping compaction cache-stable.
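One way to picture the caching step is a wrapper that keys summaries by the SHA-256 of the history text. This sketch assumes the cache key is the hash of the history (the source does not spell out exactly what is hashed) and uses an in-memory `Map` plus a fake summarizer in place of the model call and on-disk cache; `cachedSummarizer` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Any function that turns a conversation history into a summary.
type Summarize = (history: string) => Promise<string>;

// Wraps a summarizer so identical histories return the stored summary
// instead of calling the model again.
function cachedSummarizer(summarize: Summarize, cache: Map<string, string> = new Map()): Summarize {
  return async (history: string): Promise<string> => {
    const key = createHash("sha256").update(history).digest("hex");
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const summary = await summarize(history);
    cache.set(key, summary);
    return summary;
  };
}

// Stand-in for the real model call; counts invocations for demonstration.
let modelCalls = 0;
const fakeModel: Summarize = async (history) => {
  modelCalls++;
  return `summary of ${history.length} characters`;
};

const summarizeOnce = cachedSummarizer(fakeModel);
summarizeOnce("user: hi\nassistant: hello")
  .then(() => summarizeOnce("user: hi\nassistant: hello"))
  .then((second) => console.log(second, `(model calls: ${modelCalls})`));
```

Even at temperature 0 a model is not guaranteed to return identical bytes on every call, which is why storing the first result matters for keeping replays stable.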

P4 (Overlays): /cache-stats and /cache-graph render as TUI overlay popups (Esc to dismiss) with formatted hit-rate data and ASCII trend charts.

License

MIT