pi-opencode-go-cache

Brings OpenCode CLI–equivalent prompt caching to Pi for the OpenCode Go provider (kimi, deepseek, glm, mimo, qwen, minimax). Sets prompt_cache_key, 24h retention, and Anthropic-style cache_control markers on every request.

Packages

Package details

extension

Install pi-opencode-go-cache from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-opencode-go-cache
Package
pi-opencode-go-cache
Version
0.2.1
Published
Jun 19, 2026
Downloads
478/mo · 478/wk
Author
nnocte
License
MIT
Types
extension
Size
25.5 KB
Dependencies
0 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions/opencode-go-cache.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-opencode-go-cache

Brings OpenCode CLI–equivalent prompt caching to Pi for the OpenCode Go provider.

Source / issues: github.com/nnocte/pi-opencode-go-cache

How it works

The extension hooks Pi's before_provider_request event, which fires after the provider has built its API payload but before the HTTP request is sent. It mutates the payload in place:

   ┌──────────────────────────────────────────────────────────┐
   │  Pi built this payload (e.g. openai-completions body)    │
   │                                                          │
   │  { model, messages: [...], tools: [...], stream, ... }   │
   └──────────────────────────────────────────────────────────┘
                                │
                                ▼
   ┌──────────────────────────────────────────────────────────┐
   │  before_provider_request handler runs                    │
   │                                                          │
   │  1. Skip if provider !== "opencode-go"                   │
   │  2. Strip any stale cache_control from previous turns    │
   │  3. payload.prompt_cache_key       = session id          │
   │  4. payload.prompt_cache_retention  = "24h"              │
   │  5. Stamp cache_control on system + last 2 user/assistant│
   │     messages + last tool (matches OpenCode CLI's 2+2+1)  │
   │  6. Show "cache: <api>" in the TUI footer                │
   └──────────────────────────────────────────────────────────┘
                                │
                                ▼
   ┌──────────────────────────────────────────────────────────┐
   │  Outgoing HTTP request to https://opencode.ai/zen/go/v1  │
   │                                                          │
   │  • System prompt and stable prefix → cache hit           │
   │  • Cache reads are 5–120× cheaper than input             │
   └──────────────────────────────────────────────────────────┘

The cache_control markers create multiple breakpoints in the conversation, so the cache stays useful as the conversation grows:

   Turn 1                    Turn 2                    Turn 3
   ──────                    ──────                    ──────
   ┌─ system ─✦  ◀── new    ┌─ system ─✦   ◀── hit  ┌─ system ─✦     ◀── hit
   │  user              │    │  user         ◀── hit  │  user          ◀── hit
   └────────────────────┘    │  assistant ✦  ◀── new │  assistant ✦   ◀── hit
                             │  user              │   │  user           ◀── hit
                             └────────────────────┘   │  assistant ✦   ◀── new
                                                      │  user              │
                                                      └────────────────────┘

   ✦ = cache_control breakpoint
   ◀── hit = prefix matched and read from cache (cheap)
   ◀── new = freshly written (one-time, usually free on opencode-go)

Each marker tells the gateway "cache everything up to and including this point". On a long session, the system prompt and earlier turns stay cached and pay 5–120× less per token than the input price.

Problem

The OpenCode Go gateway (opencode.ai/zen/go) caches the request prefix automatically, but only with:

  • a short ~5 min TTL
  • no per-session cache key
  • no explicit cache_control breakpoints

Pi's built-in openai-completions provider never sets prompt_cache_key or prompt_cache_retention for opencode-go, and adds at most one cache_control marker — so you pay full input price on every call and lose the cache between long pauses.

What it does

Hooks before_provider_request and, for any opencode-go/* model, sets:

  • prompt_cache_key = clamped session id (so the cache is scoped per-Pi-session)
  • prompt_cache_retention: "24h" (default is ~5 min)
  • cache_control: {type:"ephemeral", ttl:"1h"} markers on the system prompt, last tool, and last 2 user/assistant messages — matching what OpenCode CLI does

Stale markers from previous turns are stripped before re-stamping, so breakpoints stay correct across the conversation. A compact cache: <api> indicator shows up in the TUI footer so you can confirm it's active.

Known limitations

The OpenCode Go gateway is expected to strip Anthropic-style cache_control markers and the prompt_cache_retention field for downstream APIs that don't speak Anthropic, but it does not currently do so for GLM (Zhipu) models. Stamping them causes those models to reject the request with Extra inputs are not permitted, field: ...cache_control.

To avoid breaking those models, the extension detects GLM model ids (substring match on glm / zhipu) and skips all cache stamping for them — the request goes out unchanged and the TUI shows cache: skipped (opencode-go/<model>) so it's obvious why caching is off. Add other affected models to UNSUPPORTED_CACHE_MODEL_PATTERNS in extensions/opencode-go-cache.ts as they're reported.

Install

Recommended (npm)

pi install npm:pi-opencode-go-cache

From GitHub

pi install git:github.com/nnocte/pi-opencode-go-cache

One-off run (no install)

pi -e npm:pi-opencode-go-cache
pi -e git:github.com/nnocte/pi-opencode-go-cache

What you save

Every model on the OpenCode Go subscription benefits. Cache-read prices are 5–120× cheaper than input:

Model API Cache ratio cacheWrite cost
deepseek-v4-pro openai-completions 120× free
deepseek-v4-flash openai-completions 50× free
mimo-v2.5-pro openai-completions 120× free
mimo-v2.5 openai-completions 50× free
qwen3.6-plus openai-completions 10× $0.625/M
qwen3.7-plus anthropic-messages 10× $0.50/M
qwen3.7-max anthropic-messages $3.125/M
kimi-k2.7-code openai-completions free
kimi-k2.6 openai-completions 5.9× free
minimax-m3 anthropic-messages free
minimax-m2.7 openai-completions free
glm-5.1 openai-completions 5.4× free
glm-5 openai-completions free

On a long coding session, the deepseek and mimo models see roughly 80–95 % off the input bill after the first call.

Why this and not PI_CACHE_RETENTION=long?

Setting PI_CACHE_RETENTION=long only does two of the three things OpenCode CLI does to get cheap cache hits on opencode-go:

PI_CACHE_RETENTION=long OpenCode CLI opencode-go-cache
prompt_cache_retention: "24h" (vs. ~5 min default)
prompt_cache_key (per-session, not opportunistic)
cache_control markers on system + last 2 messages + last tool
Works for anthropic-messages models too (qwen, minimax)
Single source of truth (no env vars, no models.json overrides)

Pi's openai-completions provider already drops cache_control markers for openai-completions when cacheControlFormat: "anthropic" is set in ~/.pi/agent/models.json — but only on the system prompt + last user message (1 breakpoint) and only for that one API. This extension does the full OpenCode CLI recipe for every opencode-go model in one place, so there's nothing else to configure.

Verification

Tested live against the real gateway. Every one of the 13 opencode-go models registered in Pi gets prompt_cache_key=set | retention=24h | cache_control markers=2–3 in the payload right before the request is sent.

Uninstall

pi remove npm:pi-opencode-go-cache
# or
pi remove git:github.com/nnocte/pi-opencode-go-cache