pi-reasonix

DeepSeek-native optimizations for Pi: cache-first prefix stabilization, tool-call repair, and cost control.

Install pi-reasonix from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-reasonix

Package: pi-reasonix
Version: 1.0.0
Published: Jun 1, 2026
Downloads: not available
Author: rmaefs
License: MIT
Types: extension
Size: 91.6 KB
Dependencies: 0 dependencies · 2 peers

Pi manifest JSON

{
  "extensions": [
    "./dist/extensions"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-reasonix

DeepSeek-native optimizations, adapted as a Pi extension.

Automatic prefix stabilization, tool-call repair, and cost control for DeepSeek models in Pi — an AI coding agent TUI.

The extension activates whenever your Pi session uses a DeepSeek provider (deepseek-v4-*, deepseek-chat, deepseek-reasoner, or any other model ID containing deepseek-). Requests to non-DeepSeek providers pass through unmodified.

Why this exists
Theory: DeepSeek Prefix Caching
The Three Pillars
How it's wired into Pi
Installation
Usage
Verification
Architecture
Building & Testing
Publishing
License
Acknowledgements

Why this exists

DeepSeek's API offers automatic disk-level prefix caching — any byte-stable prefix repeated across requests is served from an SSD cache, reducing latency and cost.

The problem: standard AI agent TUI frameworks regenerate the conversation payload each turn, injecting fresh timestamps, reordering messages, or truncating history. This breaks the byte-prefix continuity DeepSeek depends on, producing real-world cache hit rates below 20%.

This extension solves that by intercepting Pi's provider requests and ensuring the message payload stays byte-stable across turns — yielding observed cache hit rates of 94%+.

Theory: DeepSeek Prefix Caching

DeepSeek's context caching is a best-effort disk cache that operates on token prefixes. The relevant mechanics:

Prefix matching is exact. A cache hit occurs only when the first N tokens of a request match the first N tokens of a prior request. Because tokens are derived from the serialized bytes, any difference (a changed system prompt, a reordered message, even a different tool-call serialization order) invalidates the cache from the first differing token onward.
Cache units are persisted at request boundaries. Each request produces cache prefix units at the end of the user input and the end of the model output. Subsequent requests that fully match these units get a cache hit.
Common prefixes are detected across requests. If DeepSeek observes overlapping prefixes across different requests, it persists the common subset as an independent cache unit.
Cache persistence is measured in hours to days. Once written, cache units survive for several hours to days, so session-long and cross-session reuse is realistic, provided the byte prefix stays stable.

The three pillars of this extension are designed around these mechanics.
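
The exact-prefix rule can be illustrated with a small sketch. It compares two serialized request payloads character by character and reports how much of the later one a prefix cache could reuse. Characters stand in for tokens here, and the `Message` type and both helper functions are assumptions made for illustration, not DeepSeek's or Pi's actual types.

```typescript
// Illustrative only: characters stand in for tokens, and these types are hypothetical.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Length of the longest shared prefix of two strings.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

// Fraction of `next` that a cache populated by `prev` could serve.
function reusableFraction(prev: Message[], next: Message[]): number {
  const a = JSON.stringify(prev);
  const b = JSON.stringify(next);
  return commonPrefixLength(a, b) / b.length;
}

const system: Message = { role: "system", content: "You are a coding agent." };
const firstUser: Message = { role: "user", content: "List the files." };
const reply: Message = { role: "assistant", content: "README.md, src/" };
const followUp: Message = { role: "user", content: "Open README.md." };

const turn1: Message[] = [system, firstUser];

// Append-only history: turn 1 is (almost entirely) a prefix of turn 2.
const stableTurn2: Message[] = turn1.concat([reply, followUp]);

// A timestamp injected into the system prompt changes the bytes near the start,
// so the shared prefix ends before the system prompt text even begins.
const stamped: Message = { role: "system", content: "[12:01] You are a coding agent." };
const unstableTurn2: Message[] = [stamped].concat(stableTurn2.slice(1));

const stableReuse = reusableFraction(turn1, stableTurn2);
const unstableReuse = reusableFraction(turn1, unstableTurn2);
```

In this sketch `stableReuse` is much larger than `unstableReuse`, even though the two second-turn payloads differ by only a few characters.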

The Three Pillars

Pillar 1 — Cache-First Loop

The insight: DeepSeek's cache only cares about the first N bytes of the request. The system prompt and tool definitions dominate the prefix. Conversation history appends after them.

What the extension does:

Reorders messages so the system prompt is always first (ensuring byte 0 is stable)
Tracks a prefix hash from the system prompt content + tool-call signatures
Verifies append-only ordering — if Pi truncates conversation history (context window compaction), the prefix hash is unaffected because it only considers the system prompt and tool definitions
Reports stability status via /reasonix-status so you can confirm the prefix is stable before expecting cache hits

Observed effect: Cache hit ratio climbs from near-zero to ~94% after 2–3 turns with a stable prefix. On OpenCode Go (which proxies DeepSeek), one measured run showed input_tokens: 168,112 with cached_tokens: 164,736 — a 97.99% hit rate.
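
A minimal sketch of the prefix-hash idea, assuming the hash covers the system prompt plus a sorted list of tool names. The `PrefixGuardSketch` class, the djb2 hash, and the base-36 encoding are illustrative choices; the actual PrefixGuard in src/cache-first.ts may work differently.

```typescript
// Hypothetical sketch; not the actual PrefixGuard from src/cache-first.ts.
interface ChatMessage {
  role: string;
  content: string;
}

// djb2 string hash, a stand-in for whatever hash the extension really uses.
function djb2(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h << 5) + h + text.charCodeAt(i)) | 0; // keep within 32 bits
  }
  return (h >>> 0).toString(36);
}

class PrefixGuardSketch {
  private lastHash: string | null = null;
  private sameHashCalls = 0;

  // Move system messages to the front without disturbing the order of the rest.
  stabilize(messages: ChatMessage[]): ChatMessage[] {
    const system = messages.filter((m) => m.role === "system");
    const rest = messages.filter((m) => m.role !== "system");
    return system.concat(rest);
  }

  // Hash only the system prompt and tool names, so truncating history cannot change it.
  observe(messages: ChatMessage[], toolNames: string[]): string {
    const systemText = messages
      .filter((m) => m.role === "system")
      .map((m) => m.content)
      .join("\n");
    const hash = djb2(systemText + "\u0000" + toolNames.slice().sort().join(","));
    this.sameHashCalls = hash === this.lastHash ? this.sameHashCalls + 1 : 1;
    this.lastHash = hash;
    return hash;
  }

  // Stable once the same hash has been seen on at least two consecutive calls.
  isStable(): boolean {
    return this.sameHashCalls >= 2;
  }
}
```

Because the hash ignores conversation history, a compaction that drops old turns leaves it unchanged, which matches the behavior described in the list above.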

Pillar 2 — Tool-Call Repair

DeepSeek's chat-completion API has known edge cases in tool-call generation that agent frameworks must handle:

Failure Mode	How Reasonix Repairs It
Tool calls emitted inside `<think>` reasoning blocks instead of as structured tool_calls	Scavenged via regex parsing of the reasoning content, then injected as proper tool_calls in the next request
Deeply nested or wide JSON schemas (>10 parameters) causing truncation	Flattened to dot-notation keys to reduce depth and width
Truncated JSON mid-structure (missing closing braces/brackets)	Auto-closed via a JSON repair parser
Identical tool-call + argument combinations repeated back-to-back (call-storm)	Detected via content hashing; duplicated calls are suppressed
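
The truncation-repair row can be sketched as follows. This is a simplified illustration, assuming the truncation happens inside a string or after a complete value; `closeTruncatedJson` and `parseToolArguments` are hypothetical names, and the real pass in src/repair.ts may handle more cases.

```typescript
// Hypothetical sketch of truncation repair; the real pass in src/repair.ts may differ.
function closeTruncatedJson(text: string): string {
  const closers: string[] = []; // closing characters still owed, innermost last
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  let repaired = text;
  if (escaped) repaired = repaired.slice(0, -1); // drop a dangling backslash
  if (inString) repaired += '"'; // close an unterminated string
  repaired = repaired.replace(/,\s*$/, ""); // drop a trailing comma
  return repaired + closers.reverse().join("");
}

// Parse tool-call arguments, attempting a repair only when the raw text fails to parse.
function parseToolArguments(raw: string): { value: unknown; repaired: boolean } | null {
  try {
    return { value: JSON.parse(raw), repaired: false };
  } catch (firstError) {
    try {
      return { value: JSON.parse(closeTruncatedJson(raw)), repaired: true };
    } catch (secondError) {
      return null; // e.g. cut off after a key or colon; not recoverable in this sketch
    }
  }
}
```

Returning `null` for unrecoverable input keeps the caller in control: it can fall back to asking the model to retry rather than executing a tool with guessed arguments.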

The repair pipeline runs silently and its counters are visible in /reasonix-status.
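
Call-storm suppression, together with a counter of the kind the status display reports, might look like the sketch below. It compares a canonical string signature instead of a hash of it, which is equivalent for this purpose; the `ToolCall` shape and the `StormDetector` class are assumptions, not the extension's actual types.

```typescript
// Hypothetical sketch of call-storm suppression; names are illustrative.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Serialize with sorted keys so that key order does not affect the signature.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonicalize(obj[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

class StormDetector {
  suppressed = 0; // counter a status display could surface
  private lastSignature: string | null = null;

  // Returns the calls that should run, dropping back-to-back repeats.
  filter(calls: ToolCall[]): ToolCall[] {
    const kept: ToolCall[] = [];
    for (const call of calls) {
      const signature = call.name + ":" + canonicalize(call.arguments);
      if (signature === this.lastSignature) {
        this.suppressed++;
        continue;
      }
      this.lastSignature = signature;
      kept.push(call);
    }
    return kept;
  }
}
```

Only consecutive repeats are dropped, so a tool that is legitimately called again later with the same arguments (for example, re-reading a file after an edit) still runs.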

Pillar 3 — Cost Control

Mechanism	What It Does
Tool-result compaction	Tool outputs >3000 tokens are summarized/compacted before being sent as `tool_result` messages
Context-pressure tracking	Total estimated token count is tracked per-turn and surfaced in the status display
Flash-first routing	(Reserved for future use — prioritize cheaper models for preliminary passes)
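
One way to implement tool-result compaction is to keep the head and tail of an oversized result and mark the elided middle. The sketch below assumes roughly four characters per token and a 70/30 head-to-tail split; both figures and the function names are illustrative guesses, and only the 3000-token threshold comes from the table above.

```typescript
// Hypothetical sketch; the characters-per-token estimate and the split are assumptions.
const CHARS_PER_TOKEN = 4;
const MAX_RESULT_TOKENS = 3000; // threshold named in the table above

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep the head and tail of an oversized tool result and mark the elided middle.
function compactToolResult(text: string, maxTokens: number = MAX_RESULT_TOKENS): string {
  if (estimateTokens(text) <= maxTokens) return text;
  const budgetChars = maxTokens * CHARS_PER_TOKEN;
  const headChars = Math.floor(budgetChars * 0.7);
  const tailChars = budgetChars - headChars;
  const omitted = text.length - budgetChars;
  return (
    text.slice(0, headChars) +
    "\n[... " + omitted + " characters omitted ...]\n" +
    text.slice(text.length - tailChars)
  );
}
```

The function is deterministic on purpose: the same tool output always compacts to the same bytes, so a compacted result that becomes part of the history does not disturb the cached prefix on later turns.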

How it's wired into Pi

This is a standard Pi extension using Pi's event system. No modifications to Pi itself are required.

Pi Event	Extension Hook	What It Does
`model_select`	Detects when user switches to/from a DeepSeek model	Toggles `isDeepSeekSession` flag
`before_provider_request`	Prefix stabilization — reorders messages, computes prefix hash, compacts tool results	Returns modified payload
`after_provider_response`	Header-based cache metric extraction (OpenRouter-style)	Reads `x-cache-hit-tokens` headers
`message_end`	Body-based cache metric extraction — reads `usage.cacheRead` from AgentMessage	Handles both OpenCode (`cacheRead`) and DeepSeek (`prompt_cache_hit_tokens`) formats
`turn_end`	Reserved for future per-turn cost logging	(no-op currently)
`session_start`	Resets prefix state for new conversations	Keeps model detection across sessions
`/reasonix-status` (TUI command)	Displays live cache and repair statistics	Registered via `pi.registerCommand()`
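
The wiring in the table can be sketched against a simplified stand-in for Pi's extension API. Only the event names and `pi.registerCommand()` come from the table; the `PiLike` interface, the handler signatures, the event payload fields, and the command name format are assumptions, so consult Pi's extension documentation for the real shapes.

```typescript
// Hypothetical, simplified stand-in for Pi's extension API. Every signature here is assumed.
type Handler = (event: any) => any;

interface PiLike {
  on(eventName: string, handler: Handler): void;
  registerCommand(name: string, handler: () => string): void;
}

function registerReasonixSketch(pi: PiLike): void {
  let isDeepSeekSession = false;
  let calls = 0;

  // Toggle the session flag whenever the user switches models.
  pi.on("model_select", (event) => {
    isDeepSeekSession = String(event.modelId).indexOf("deepseek-") !== -1;
  });

  // Stabilize the prefix for DeepSeek sessions; pass everything else through untouched.
  pi.on("before_provider_request", (event) => {
    if (!isDeepSeekSession) return event.payload;
    calls++;
    const messages: any[] = event.payload.messages;
    const system = messages.filter((m) => m.role === "system");
    const rest = messages.filter((m) => m.role !== "system");
    return { ...event.payload, messages: system.concat(rest) };
  });

  // Reset per-conversation state but keep the detected model.
  pi.on("session_start", () => {
    calls = 0;
  });

  pi.registerCommand("reasonix-status", () => {
    return "Active: " + (isDeepSeekSession ? "yes" : "no") + ", calls: " + calls;
  });
}
```

Keeping all state in the closure means the extension needs nothing from Pi beyond event registration, which is consistent with the claim that no modifications to Pi are required.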

Model detection priority

Init-time — reads Pi's defaultModel from settings.json
User model switch — catches /model commands via model_select event
First API call — fallback detection from before_provider_request payload

This three-layer detection ensures the extension activates before any API call, even on first startup.
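
A sketch of the detection logic, under one plausible reading of the order above: an explicit model switch overrides the configured default, and the request payload is consulted only when neither is known. The function names and the `DetectionSources` shape are hypothetical.

```typescript
// Hypothetical sketch; function and field names are illustrative.
function isDeepSeekModel(modelId: string | undefined): boolean {
  return typeof modelId === "string" && modelId.indexOf("deepseek-") !== -1;
}

interface DetectionSources {
  defaultModel?: string; // layer 1: read from settings.json at init time
  selectedModel?: string; // layer 2: most recent model_select event
  payloadModel?: string; // layer 3: model field of the outgoing request (fallback)
}

// Assumed precedence: explicit switch, then configured default, then payload fallback.
function resolveActiveModel(sources: DetectionSources): string | undefined {
  return sources.selectedModel || sources.defaultModel || sources.payloadModel;
}

function shouldActivate(sources: DetectionSources): boolean {
  return isDeepSeekModel(resolveActiveModel(sources));
}
```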

Cache metric extraction

The extension accepts cache metrics from any of three sources:

OpenCode Go/Zen — wraps usage data in AgentMessage metadata with usage.cacheRead, usage.cacheWrite, usage.input fields
DeepSeek direct — returns usage.prompt_cache_hit_tokens, usage.prompt_cache_miss_tokens in the response body
OpenRouter / header-based — falls back to after_provider_response headers (x-cache-hit-tokens, x-cache-miss-tokens)
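
The three sources can be normalized into one shape, as in the sketch below. The field and header names are taken from the list above; the surrounding object shapes, the `CacheMetrics` type, and the choice to treat `usage.input` as the uncached input count are assumptions.

```typescript
// Hypothetical sketch; only the field and header names come from the list above.
interface CacheMetrics {
  hitTokens: number;
  missTokens: number;
  source: "agent-message" | "deepseek-body" | "headers" | "none";
}

// Accept numbers or numeric strings (header values arrive as strings).
function toCount(value: unknown): number | null {
  const n = typeof value === "string" ? parseInt(value, 10) : value;
  return typeof n === "number" && isFinite(n) ? n : null;
}

function extractCacheMetrics(usage: any, headers: { [name: string]: string } = {}): CacheMetrics {
  const u = usage || {};

  // OpenCode Go/Zen style: usage.cacheRead plus usage.input (assumed to be uncached input).
  const read = toCount(u.cacheRead);
  if (read !== null) {
    return { hitTokens: read, missTokens: toCount(u.input) || 0, source: "agent-message" };
  }

  // DeepSeek direct: hit and miss counts in the response body.
  const hit = toCount(u.prompt_cache_hit_tokens);
  if (hit !== null) {
    return { hitTokens: hit, missTokens: toCount(u.prompt_cache_miss_tokens) || 0, source: "deepseek-body" };
  }

  // Header-based fallback.
  const headerHit = toCount(headers["x-cache-hit-tokens"]);
  if (headerHit !== null) {
    return { hitTokens: headerHit, missTokens: toCount(headers["x-cache-miss-tokens"]) || 0, source: "headers" };
  }

  return { hitTokens: 0, missTokens: 0, source: "none" };
}
```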

Installation

# From npm
pi install npm:pi-reasonix

# Or from local checkout
pi install /path/to/pi-reasonix

# Try without installing
pi -e /path/to/pi-reasonix/extensions/index.ts

System requirements

Pi with extension support (@earendil-works/pi-coding-agent)
DeepSeek provider configured in Pi (deepseek-v4-*, deepseek-chat, etc.)
Node.js 18+ (for extension runtime)

Usage

Once installed, the extension activates automatically when you use a DeepSeek model. Run the TUI command to see live stats:

/reasonix-status

Example output after a few turns with a stable prefix:

╔══════════════════════════════════════════════╗
║            pi-reasonix Status                ║
╚══════════════════════════════════════════════╝

  Active:        ✅ Yes (deepseek-v4-flash)
  Prefix hash:   1cinq0v
  Prefix stable: ✅
  Calls:         3 since last reset
  Truncations:   0

  📊 Cache
    Hit tokens:   14,872
    Miss tokens:  94,507
    Write tokens: 0
    Hit ratio:    13.6%

  🔧 Repairs
    Args repaired:     0
    Calls scavenged:   0
    Storms suppressed: 0

  💰 Cost Control
    Results compacted: 8

  🔄 Turns:  3
  📦 Tokens: ~158.5K total

Reading the status

Field	What It Tells You
`Prefix stable`	✅ after 2+ calls with same system prompt + tools
`Hit tokens`	Cumulative tokens served from DeepSeek's disk cache
`Hit ratio`	Hit / (Hit + Miss) — target is 85–97% in a long session
`Write tokens`	Tokens written to cache for future reuse (first turn is highest)
`Truncations`	How many times Pi compacted context (doesn't affect stability)
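
The hit-ratio formula can be checked against the two sets of figures quoted in this README. The sketch assumes that the `input_tokens` figure from the measured run includes the cached tokens, so the miss count is the difference; `hitRatioPercent` is an illustrative helper, not part of the extension.

```typescript
// Hit ratio as defined in the table: hit / (hit + miss), expressed as a percentage.
function hitRatioPercent(hitTokens: number, missTokens: number): number {
  const total = hitTokens + missTokens;
  return total === 0 ? 0 : (hitTokens / total) * 100;
}

// Figures from the example status output: 14,872 hit and 94,507 miss tokens.
const earlySession = hitRatioPercent(14872, 94507).toFixed(1); // "13.6"

// Figures from the measured run: 164,736 cached out of 168,112 input tokens.
const longSession = hitRatioPercent(164736, 168112 - 164736).toFixed(2); // "97.99"
```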

Verification

On load, the extension logs to Pi's output:

[pi-reasonix] Loaded. Active for DeepSeek providers.
[pi-reasonix] Pillars: Cache-First Loop | Tool-Call Repair | Cost Control

Run /reasonix-status inside Pi to confirm activation and see live statistics.

Architecture

pi-reasonix/
├── extensions/
│   └── index.ts          # Pi extension entry — event wiring and state
├── src/
│   ├── cache-first.ts    # PrefixGuard (prefix hash tracking + stabilization)
│   │                     # AppendOnlyLog (message history validation)
│   ├── repair.ts         # 4-pass tool-call repair pipeline
│   │                     #   (scavenge, truncation repair, flatten, storm detection)
│   ├── cost-control.ts   # Tool-result compaction, context estimation
│   └── types.ts          # Shared interfaces and type definitions
├── test/
│   ├── core.test.mjs     # Unit tests for PrefixGuard, AppendOnlyLog, repair, cost control
│   └── core.integration.test.mjs  # Integration tests for extension wiring
├── package.json
├── tsconfig.json
└── README.md

Key design decisions

Standalone modules in src/ — the core algorithms (PrefixGuard, repair pipeline, cost control) are framework-agnostic and could power an OpenCode plugin or custom script
Extension wiring in extensions/index.ts — Pi-specific event registration, state management, and the /reasonix-status command
Async factory — the extension factory is async to allow reading Pi's settings at init time for early model detection
No runtime dependencies — the extension only imports Pi's type definitions for TypeScript safety; runtime relies on Pi's built-in event system

Building & Testing

# Install dependencies
npm install

# Compile TypeScript → dist/
npm run build

# Run test suite (27 tests: 14 unit + 13 integration)
npm test

The integration tests use a mock Pi API to verify extension wiring; the unit tests exercise the real PrefixGuard, AppendOnlyLog, tool-call repair, and cost-control algorithms. No live API keys are required.

Publishing

npm login
npm publish

The prepublishOnly hook compiles TypeScript and runs the test suite before publishing.

License

MIT — see LICENSE.

Acknowledgements

This package is an AI-created adaptation of innovations from the Reasonix project.

Source

All three pillars — Cache-First Loop, Tool-Call Repair, and Cost Control — are harvested from Reasonix (MIT, by the esengine community).

Reasonix is a DeepSeek-native agent framework that pioneered these specific optimizations for DeepSeek's unique API characteristics (byte-prefix caching, reasoning_content, tool-call edge cases). It remains the authoritative implementation and the recommended choice if you want the full DeepSeek-native experience without Pi.

Translation process

pi-reasonix is a structural translation of Reasonix's core algorithms into Pi's extension architecture:

The PrefixGuard and AppendOnlyLog classes in src/cache-first.ts mirror Reasonix's immutable prefix + append-only log with adaptations for Pi's message ordering constraints
The tool-call repair pipeline in src/repair.ts follows Reasonix's 4-pass approach (scavenge, truncation repair, flatten, storm detection) with adjustments for Pi's streaming context
The cost-control logic in src/cost-control.ts adapts Reasonix's compaction thresholds to Pi's tool-result streaming
Pi-specific event wiring (extensions/index.ts) replaces Reasonix's internal provider hooks

All 27 tests in the test suite validate that the translated algorithms preserve Reasonix's original behavior and correctness.

Why not just use Reasonix directly?

Reasonix is a standalone agent framework. If you're already invested in Pi's TUI, extension ecosystem, and provider system, pi-reasonix brings Reasonix's optimizations into your existing workflow without changing tools. If you don't use Pi, you should use Reasonix directly — it's the canonical implementation.

Credit

All architectural credit goes to the Reasonix contributors for engineering DeepSeek-specific solutions that generic agent frameworks overlook. This adaptation stands on their work.
