pi-reasonix
DeepSeek-native optimizations for Pi: cache-first prefix stabilization, tool-call repair, and cost control.
Package details
Install pi-reasonix from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-reasonix- Package
pi-reasonix- Version
1.0.0- Published
- Jun 1, 2026
- Downloads
- not available
- Author
- rmaefs
- License
- MIT
- Types
- extension
- Size
- 91.6 KB
- Dependencies
- 0 dependencies · 2 peers
Pi manifest JSON
{
"extensions": [
"./dist/extensions"
]
}Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-reasonix
DeepSeek-native optimizations, adapted as a Pi extension.
Automatic prefix stabilization, tool-call repair, and cost control for DeepSeek models in Pi — an AI coding agent TUI.
Activated whenever your Pi session uses a DeepSeek provider (deepseek-v4-*, deepseek-chat, deepseek-reasoner, and any model ID containing deepseek-). Non-DeepSeek providers pass through with zero overhead.
Table of Contents
- Why this exists
- Theory: DeepSeek Prefix Caching
- The Three Pillars
- How it's wired into Pi
- Installation
- Usage
- Verification
- Architecture
- Building & Testing
- Publishing
- License
- Acknowledgements
Why this exists
DeepSeek's API offers automatic disk-level prefix caching — any byte-stable prefix repeated across requests is served from an SSD cache, reducing latency and cost.
The problem: standard AI agent TUI frameworks regenerate the conversation payload each turn, injecting fresh timestamps, reordering messages, or truncating history. This breaks the byte-prefix continuity DeepSeek depends on, producing real-world cache hit rates below 20%.
This extension solves that by intercepting Pi's provider requests and ensuring the message payload stays byte-stable across turns — yielding observed cache hit rates of 94%+.
Theory: DeepSeek Prefix Caching
DeepSeek's context caching works on a best-effort disk cache at the token-prefix level. Here's what matters:
Prefix matching is byte-exact. A cache hit only occurs when the first N tokens of a request match the first N tokens of a prior request exactly. Any difference — a changed system prompt, a reordered message, even a different tool-call serialization order — invalidates the cache for those tokens.
Cache units are persisted at request boundaries. Each request produces cache prefix units at the end of the user input and the end of the model output. Subsequent requests that fully match these units get a cache hit.
Common prefixes are detected across requests. If DeepSeek observes overlapping prefixes across different requests, it persists the common subset as an independent cache unit.
Cache persistence is measured in hours. Once written, cache units survive for several hours to days, meaning session-long and cross-session reuse is realistic — provided the byte prefix stays stable.
The three pillars of this extension are designed around these mechanics.
The Three Pillars
Pillar 1 — Cache-First Loop
The insight: DeepSeek's cache only cares about the first N bytes of the request. The system prompt and tool definitions dominate the prefix. Conversation history appends after them.
What the extension does:
- Reorders messages so the system prompt is always first (ensuring byte 0 is stable)
- Tracks a prefix hash from the system prompt content + tool-call signatures
- Verifies append-only ordering — if Pi truncates conversation history (context window compaction), the prefix hash is unaffected because it only considers the system prompt and tool definitions
- Reports stability status via
/reasonix-statusso you can confirm the prefix is stable before expecting cache hits
Observed effect: Cache hit ratio climbs from near-zero to ~94% after 2–3 turns with a stable prefix. On OpenCode Go (which proxies DeepSeek), one measured run showed input_tokens: 168,112 with cached_tokens: 164,736 — a 97.99% hit rate.
Pillar 2 — Tool-Call Repair
DeepSeek's chat-completion API has known edge cases in tool-call generation that agent frameworks must handle:
| Failure Mode | How Reasonix Repairs It |
|---|---|
Tool calls emitted inside <think> reasoning blocks instead of as structured tool_calls |
Scavenged via regex parsing of the reasoning content, then injected as proper tool_calls in the next request |
| Deeply nested or wide JSON schemas (>10 parameters) causing truncation | Flattened to dot-notation keys to reduce depth and width |
| Truncated JSON mid-structure (missing closing braces/brackets) | Auto-closed via a JSON repair parser |
| Identical tool-call + argument combinations repeated back-to-back (call-storm) | Detected via content hashing; duplicated calls are suppressed |
The repair pipeline runs silently and its counters are visible in /reasonix-status.
Pillar 3 — Cost Control
| Mechanism | What It Does |
|---|---|
| Tool-result compaction | Tool outputs >3000 tokens are summarized/compacted before being sent as tool_result messages |
| Context-pressure tracking | Total estimated token count is tracked per-turn and surfaced in the status display |
| Flash-first routing | (Reserved for future use — prioritize cheaper models for preliminary passes) |
How it's wired into Pi
This is a standard Pi extension using Pi's event system. No modifications to Pi itself are required.
| Pi Event | Extension Hook | What It Does |
|---|---|---|
model_select |
Detects when user switches to/from a DeepSeek model | Toggles isDeepSeekSession flag |
before_provider_request |
Prefix stabilization — reorders messages, computes prefix hash, compacts tool results | Returns modified payload |
after_provider_response |
Header-based cache metric extraction (OpenRouter-style) | Reads x-cache-hit-tokens headers |
message_end |
Body-based cache metric extraction — reads usage.cacheRead from AgentMessage |
Handles both OpenCode (cacheRead) and DeepSeek (prompt_cache_hit_tokens) formats |
turn_end |
Reserved for future per-turn cost logging | (no-op currently) |
session_start |
Resets prefix state for new conversations | Keeps model detection across sessions |
/reasonix-status (TUI command) |
Displays live cache and repair statistics | Registered via pi.registerCommand() |
Model detection priority
- Init-time — reads Pi's
defaultModelfrom settings.json - User model switch — catches
/modelcommands viamodel_selectevent - First API call — fallback detection from
before_provider_requestpayload
This three-layer detection ensures the extension activates before any API call, even on first startup.
Cache metric extraction
The extension is tolerant of both metric sources:
- OpenCode Go/Zen — wraps usage data in AgentMessage metadata with
usage.cacheRead,usage.cacheWrite,usage.inputfields - DeepSeek direct — returns
usage.prompt_cache_hit_tokens,usage.prompt_cache_miss_tokensin the response body - OpenRouter / header-based — falls back to
after_provider_responseheaders (x-cache-hit-tokens,x-cache-miss-tokens)
Installation
# From npm (once published)
pi install @thetrebor/pi-reasonix
# Or from local checkout
pi install /path/to/pi-reasonix
# Try without installing
pi -e /path/to/pi-reasonix/extensions/index.ts
System requirements
- Pi (any version with extension support —
@earendil-works/Pi-coding-agent) - DeepSeek provider configured in Pi (
deepseek-v4-*,deepseek-chat, etc.) - Node.js 18+ (for extension runtime)
Usage
Once installed, the extension activates automatically when you use a DeepSeek model. Run the TUI command to see live stats:
/reasonix-status
Example output after a few turns with a stable prefix:
╔══════════════════════════════════════════════╗
║ pi-reasonix Status ║
╚══════════════════════════════════════════════╝
Active: ✅ Yes (deepseek-v4-flash)
Prefix hash: 1cinq0v
Prefix stable: ✅
Calls: 3 since last reset
Truncations: 0
📊 Cache
Hit tokens: 14,872
Miss tokens: 94,507
Write tokens: 0
Hit ratio: 13.6%
🔧 Repairs
Args repaired: 0
Calls scavenged: 0
Storms suppressed: 0
💰 Cost Control
Results compacted: 8
🔄 Turns: 3
📦 Tokens: ~158.5K total
Reading the status
| Field | What It Tells You |
|---|---|
Prefix stable |
✅ after 2+ calls with same system prompt + tools |
Hit tokens |
Cumulative tokens served from DeepSeek's disk cache |
Hit ratio |
Hit / (Hit + Miss) — target is 85–97% in a long session |
Write tokens |
Tokens written to cache for future reuse (first turn is highest) |
Truncations |
How many times Pi compacted context (doesn't affect stability) |
Verification
On load, the extension logs to Pi's output:
[pi-reasonix] Loaded. Active for DeepSeek providers.
[pi-reasonix] Pillars: Cache-First Loop | Tool-Call Repair | Cost Control
Run /reasonix-status inside Pi to confirm activation and see live statistics.
Architecture
pi-reasonix/
├── extensions/
│ └── index.ts # Pi extension entry — event wiring and state
├── src/
│ ├── cache-first.ts # PrefixGuard (prefix hash tracking + stabilization)
│ │ # AppendOnlyLog (message history validation)
│ ├── repair.ts # 4-pass tool-call repair pipeline
│ │ # (scavenge, truncation repair, flatten, storm detection)
│ ├── cost-control.ts # Tool-result compaction, context estimation
│ └── types.ts # Shared interfaces and type definitions
├── test/
│ ├── core.test.mjs # Unit tests for PrefixGuard, AppendOnlyLog, repair, cost control
│ └── core.integration.test.mjs # Integration tests for extension wiring
├── package.json
├── tsconfig.json
└── README.md
Key design decisions
- Standalone modules in
src/— the core algorithms (PrefixGuard, repair pipeline, cost control) are framework-agnostic and could power an OpenCode plugin or custom script - Extension wiring in
extensions/index.ts— Pi-specific event registration, state management, and the/reasonix-statuscommand - Async factory — the extension factory is async to allow reading Pi's settings at init time for early model detection
- No runtime dependencies — the extension only imports Pi's type definitions for TypeScript safety; runtime relies on Pi's built-in event system
Building & Testing
# Install dependencies
npm install
# Compile TypeScript → dist/
npm run build
# Run test suite (27 tests: 14 unit + 13 integration)
npm test
Tests use a mock Pi API to verify extension wiring and real algorithmic tests for PrefixGuard, AppendOnlyLog, tool-call repair, and cost control. No live API keys required.
Publishing
npm login
npm publish
The prepublishOnly hook compiles TypeScript and runs the test suite before publishing.
License
MIT — see LICENSE.
Acknowledgements
This package is an AI-created adaptation of innovations from the Reasonix project.
Source
All three pillars — Cache-First Loop, Tool-Call Repair, and Cost Control — are harvested from Reasonix (MIT, by the esengine community).
Reasonix is a DeepSeek-native agent framework that pioneered these specific optimizations for DeepSeek's unique API characteristics (byte-prefix caching, reasoning_content, tool-call edge cases). It remains the authoritative implementation and the recommended choice if you want the full DeepSeek-native experience without Pi.
Translation process
pi-reasonix is a structural translation of Reasonix's core algorithms into Pi's extension architecture:
- The
PrefixGuardandAppendOnlyLogclasses insrc/cache-first.tsmirror Reasonix's immutable prefix + append-only log with adaptations for Pi's message ordering constraints - The tool-call repair pipeline in
src/repair.tsfollows Reasonix's 4-pass approach (scavenge, truncation repair, flatten, storm detection) with adjustments for Pi's streaming context - The cost-control logic in
src/cost-control.tsadapts Reasonix's compaction thresholds to Pi's tool-result streaming - Pi-specific event wiring (
extensions/index.ts) replaces Reasonix's internal provider hooks
All 27 tests in the test suite validate that the translated algorithms preserve Reasonix's original behavior and correctness.
Why not just use Reasonix directly?
Reasonix is a standalone agent framework. If you're already invested in Pi's TUI, extension ecosystem, and provider system, pi-reasonix brings Reasonix's optimizations into your existing workflow without changing tools. If you don't use Pi, you should use Reasonix directly — it's the canonical implementation.
Credit
All architectural credit goes to the Reasonix contributors for engineering DeepSeek-specific solutions that generic agent frameworks overlook. This adaptation stands on their work.