pi-opencode-go-cache

Brings OpenCode CLI–equivalent prompt caching to Pi for the OpenCode Go provider (kimi, deepseek, glm, mimo, qwen, minimax). Sets prompt_cache_key, 24h retention, and Anthropic-style cache_control markers on every request.

Install pi-opencode-go-cache from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-opencode-go-cache

Package: pi-opencode-go-cache
Version: 0.2.1
Published: Jun 19, 2026
Downloads: 478/mo · 478/wk
Author: nnocte
License: MIT
Types: extension
Size: 25.5 KB
Dependencies: 0 dependencies · 2 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions/opencode-go-cache.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

Source / issues: github.com/nnocte/pi-opencode-go-cache

How it works

The extension hooks Pi's before_provider_request event, which fires after the provider has built its API payload but before the HTTP request is sent. It mutates the payload in place:

   ┌──────────────────────────────────────────────────────────┐
   │  Pi built this payload (e.g. openai-completions body)    │
   │                                                          │
   │  { model, messages: [...], tools: [...], stream, ... }   │
   └──────────────────────────────────────────────────────────┘
                                │
                                ▼
   ┌──────────────────────────────────────────────────────────┐
   │  before_provider_request handler runs                    │
   │                                                          │
   │  1. Skip if provider !== "opencode-go"                   │
   │  2. Strip any stale cache_control from previous turns    │
   │  3. payload.prompt_cache_key       = session id          │
   │  4. payload.prompt_cache_retention  = "24h"              │
   │  5. Stamp cache_control on system + last 2 user/assistant│
   │     messages + last tool (matches OpenCode CLI's 2+2+1)  │
   │  6. Show "cache: <api>" in the TUI footer                │
   └──────────────────────────────────────────────────────────┘
                                │
                                ▼
   ┌──────────────────────────────────────────────────────────┐
   │  Outgoing HTTP request to https://opencode.ai/zen/go/v1  │
   │                                                          │
   │  • System prompt and stable prefix → cache hit           │
   │  • Cache reads are 5–120× cheaper than input             │
   └──────────────────────────────────────────────────────────┘
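
The numbered steps in the handler box can be sketched as one pure function. This is a minimal illustration, not the extension's actual source: the Payload, Message and Tool shapes and the applyCache name are hypothetical, Pi's real before_provider_request signature is not shown in this README, and reading "2+2+1" as up to two system messages, two recent messages and one tool is an assumption. Markers are placed at message level for brevity, and the footer update (step 6) is left out.

```typescript
// Hypothetical payload shapes; Pi's real types are not documented here.
type CacheControl = { type: "ephemeral"; ttl?: string };
interface Message { role: string; content: unknown; cache_control?: CacheControl }
interface Tool { type?: string; cache_control?: CacheControl; [key: string]: unknown }
interface Payload {
  model: string;
  messages: Message[];
  tools?: Tool[];
  prompt_cache_key?: string;
  prompt_cache_retention?: string;
}

const MARKER: CacheControl = { type: "ephemeral", ttl: "1h" };

// Returns true when the payload was modified, false when it was skipped.
function applyCache(provider: string, sessionId: string, payload: Payload): boolean {
  // 1. Only touch opencode-go requests.
  if (provider !== "opencode-go") return false;

  const tools = payload.tools ?? [];

  // 2. Strip markers left over from earlier turns.
  for (const message of payload.messages) delete message.cache_control;
  for (const tool of tools) delete tool.cache_control;

  // 3 + 4. Session-scoped key and long retention.
  payload.prompt_cache_key = sessionId;
  payload.prompt_cache_retention = "24h";

  // 5. Stamp up to two system messages, the last two user/assistant
  //    messages, and the last tool (assumed reading of "2+2+1").
  const system = payload.messages.filter((m) => m.role === "system").slice(0, 2);
  const recent = payload.messages
    .filter((m) => m.role === "user" || m.role === "assistant")
    .slice(-2);
  for (const message of system.concat(recent)) message.cache_control = { ...MARKER };

  const lastTool = tools[tools.length - 1];
  if (lastTool) lastTool.cache_control = { ...MARKER };
  return true;
}
```

Stripping before stamping keeps the function idempotent: running it again on the next turn moves the breakpoints forward instead of accumulating old ones.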

The cache_control markers create multiple breakpoints in the conversation, so the cache stays useful as the conversation grows:

   Turn 1                    Turn 2                    Turn 3
   ──────                    ──────                    ──────
   ┌─ system ─✦  ◀── new    ┌─ system ─✦   ◀── hit  ┌─ system ─✦     ◀── hit
   │  user              │    │  user         ◀── hit  │  user          ◀── hit
   └────────────────────┘    │  assistant ✦  ◀── new │  assistant ✦   ◀── hit
                             │  user              │   │  user           ◀── hit
                             └────────────────────┘   │  assistant ✦   ◀── new
                                                      │  user              │
                                                      └────────────────────┘

   ✦ = cache_control breakpoint
   ◀── hit = prefix matched and read from cache (cheap)
   ◀── new = freshly written (one-time, usually free on opencode-go)

Each marker tells the gateway "cache everything up to and including this point". On a long session, the system prompt and earlier turns stay cached and pay 5–120× less per token than the input price.
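
The hit/new labels in the diagram follow from plain prefix matching. The sketch below is a simplified model, not the gateway's real algorithm: it treats a breakpoint as a hit whenever every message up to and including it is unchanged from the previous request. The Msg shape and both function names are hypothetical.

```typescript
// Hypothetical, simplified message shape for the illustration.
interface Msg { role: string; text: string }

function sameMessage(a: Msg | undefined, b: Msg | undefined): boolean {
  return a !== undefined && b !== undefined && a.role === b.role && a.text === b.text;
}

// Number of leading messages that are identical in both requests.
function sharedPrefixLength(previous: Msg[], current: Msg[]): number {
  let i = 0;
  while (i < previous.length && i < current.length && sameMessage(previous[i], current[i])) {
    i++;
  }
  return i;
}

// breakpoints holds 0-based message indexes that carry a cache_control marker.
function classifyBreakpoints(
  previous: Msg[],
  current: Msg[],
  breakpoints: number[],
): Array<"hit" | "new"> {
  const shared = sharedPrefixLength(previous, current);
  return breakpoints.map((index) => (index < shared ? "hit" : "new"));
}
```

Under this model, Turn 3 in the diagram (breakpoints on the system prompt and both assistant replies) classifies as hit, hit, new. It also shows why editing an early message is expensive: every breakpoint after the edit turns into "new".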

Problem

The OpenCode Go gateway (opencode.ai/zen/go) caches the request prefix automatically, but only with:

a short ~5 min TTL
no per-session cache key
no explicit cache_control breakpoints

Pi's built-in openai-completions provider does not set prompt_cache_key or prompt_cache_retention for opencode-go by default, and adds at most one cache_control marker. As a result, cache hits are unreliable: you often pay full input price, and the cache is gone after any pause longer than the ~5 min TTL.

What it does

Hooks before_provider_request and, for any opencode-go/* model, sets:

prompt_cache_key = clamped session id (so the cache is scoped per-Pi-session)
prompt_cache_retention: "24h" (default is ~5 min)
cache_control: {type:"ephemeral", ttl:"1h"} markers on the system prompt, last tool, and last 2 user/assistant messages — matching what OpenCode CLI does

Stale markers from previous turns are stripped before re-stamping, so breakpoints stay correct across the conversation. A compact cache: <api> indicator shows up in the TUI footer so you can confirm it's active.
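
"Clamped" is not defined further in this README. One plausible reading is that the session id is trimmed and truncated to a maximum length before it is used as the key. The sketch below assumes that reading; the 64-character limit and the clampCacheKey name are placeholders, not documented gateway constraints.

```typescript
// Assumption: 64 is an illustrative limit, not a documented one.
const MAX_KEY_LENGTH = 64;

function clampCacheKey(sessionId: string, maxLength: number = MAX_KEY_LENGTH): string {
  const cleaned = sessionId.trim();
  // Truncate from the end so ids that share a long prefix still differ less often.
  return cleaned.length <= maxLength ? cleaned : cleaned.slice(0, maxLength);
}
```

The same session always maps to the same key, which is what scopes the cache to one Pi session.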

Known limitations

The OpenCode Go gateway is expected to strip Anthropic-style cache_control markers and the prompt_cache_retention field for downstream APIs that don't speak Anthropic, but it does not currently do so for GLM (Zhipu) models. Stamping them causes those models to reject the request with Extra inputs are not permitted, field: ...cache_control.

To avoid breaking those models, the extension detects GLM model ids (substring match on glm / zhipu) and skips all cache stamping for them — the request goes out unchanged and the TUI shows cache: skipped (opencode-go/<model>) so it's obvious why caching is off. Add other affected models to UNSUPPORTED_CACHE_MODEL_PATTERNS in extensions/opencode-go-cache.ts as they're reported.
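
The skip check described above amounts to a case-insensitive substring match. This is a sketch under assumptions: the real UNSUPPORTED_CACHE_MODEL_PATTERNS may be declared differently (for example as regular expressions), and cacheStatus is a hypothetical helper that only mirrors the two footer strings quoted in this README.

```typescript
// Assumed shape; check extensions/opencode-go-cache.ts for the real declaration.
const UNSUPPORTED_CACHE_MODEL_PATTERNS: string[] = ["glm", "zhipu"];

function isCacheUnsupported(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return UNSUPPORTED_CACHE_MODEL_PATTERNS.some((pattern) => id.indexOf(pattern) !== -1);
}

// Footer text: the API name when caching is active, a skip notice otherwise.
function cacheStatus(modelId: string, api: string): string {
  return isCacheUnsupported(modelId)
    ? `cache: skipped (opencode-go/${modelId})`
    : `cache: ${api}`;
}
```

With this shape, supporting another affected model means appending one lowercase substring to the list.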

Install

Recommended (npm)

pi install npm:pi-opencode-go-cache

From GitHub

pi install git:github.com/nnocte/pi-opencode-go-cache

One-off run (no install)

pi -e npm:pi-opencode-go-cache
pi -e git:github.com/nnocte/pi-opencode-go-cache

What you save

Every model on the OpenCode Go subscription benefits. Cache-read prices are 5–120× cheaper than input:

Model	API	Cache ratio	cacheWrite cost
deepseek-v4-pro	openai-completions	120×	free
deepseek-v4-flash	openai-completions	50×	free
mimo-v2.5-pro	openai-completions	120×	free
mimo-v2.5	openai-completions	50×	free
qwen3.6-plus	openai-completions	10×	$0.625/M
qwen3.7-plus	anthropic-messages	10×	$0.50/M
qwen3.7-max	anthropic-messages	5×	$3.125/M
kimi-k2.7-code	openai-completions	5×	free
kimi-k2.6	openai-completions	5.9×	free
minimax-m3	anthropic-messages	5×	free
minimax-m2.7	openai-completions	5×	free
glm-5.1	openai-completions	5.4×	free
glm-5	openai-completions	5×	free

On a long coding session, the deepseek and mimo models see roughly 80–95% off the input bill after the first call.
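
The savings figure can be checked with a little arithmetic. If a fraction h of the input tokens is read from cache at a ratio of r, the input bill relative to no caching is (1 - h) + h / r. The helper below is a hypothetical illustration that ignores cacheWrite cost, which the table lists as free for deepseek and mimo.

```typescript
// cachedFraction: share of input tokens served from cache (0..1).
// cacheRatio: how many times cheaper a cache read is than normal input.
function inputSavings(cachedFraction: number, cacheRatio: number): number {
  if (cachedFraction < 0 || cachedFraction > 1) {
    throw new RangeError("cachedFraction must be between 0 and 1");
  }
  if (cacheRatio <= 0) {
    throw new RangeError("cacheRatio must be positive");
  }
  const relativeCost = (1 - cachedFraction) + cachedFraction / cacheRatio;
  return 1 - relativeCost;
}
```

For a 120× model with 90% of the prompt cached, the relative cost is 0.1 + 0.9 / 120 = 0.1075, a saving of about 89%. At a 5× ratio the same hit rate saves 72%, so the benefit depends heavily on the model.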

Why this and not `PI_CACHE_RETENTION=long`?

Setting PI_CACHE_RETENTION=long only does two of the three things OpenCode CLI does to get cheap cache hits on opencode-go:

Feature	`PI_CACHE_RETENTION=long`	OpenCode CLI	opencode-go-cache
`prompt_cache_retention: "24h"` (vs. ~5 min default)	✅	✅	✅
`prompt_cache_key` (per-session, not opportunistic)	✅	✅	✅
`cache_control` markers on system + last 2 messages + last tool	❌	✅	✅
Works for `anthropic-messages` models too (qwen, minimax)	❌	✅	✅
Single source of truth (no env vars, no `models.json` overrides)	❌	✅	✅

Pi's openai-completions provider already adds cache_control markers when cacheControlFormat: "anthropic" is set in ~/.pi/agent/models.json, but only on the system prompt + last user message (1 breakpoint) and only for that one API. This extension does the full OpenCode CLI recipe for every opencode-go model in one place, so there's nothing else to configure.

Verification

Tested live against the real gateway. Every one of the 13 opencode-go models registered in Pi gets prompt_cache_key=set | retention=24h | cache_control markers=2–3 in the payload right before the request is sent.
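
A summary line in the format quoted above can be produced by walking the outgoing payload and counting cache_control keys. This is a hypothetical checker for your own debugging, not the script used for the live test; summarizePayload and countMarkers are illustrative names.

```typescript
// Recursively counts every "cache_control" key in a JSON-like value.
function countMarkers(value: unknown): number {
  if (value === null || typeof value !== "object") return 0;
  let total = 0;
  if (Array.isArray(value)) {
    for (const item of value) total += countMarkers(item);
    return total;
  }
  const record = value as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    // A marker counts once; its own contents are not searched further.
    total += key === "cache_control" ? 1 : countMarkers(record[key]);
  }
  return total;
}

function summarizePayload(payload: Record<string, unknown>): string {
  const key = typeof payload["prompt_cache_key"] === "string" ? "set" : "missing";
  const rawRetention = payload["prompt_cache_retention"];
  const retention = typeof rawRetention === "string" ? rawRetention : "missing";
  return `prompt_cache_key=${key} | retention=${retention} | cache_control markers=${countMarkers(payload)}`;
}
```

Because the count is recursive, it works whether the markers sit on messages, on content parts, or on tool definitions.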

Uninstall

pi remove npm:pi-opencode-go-cache
# or
pi remove git:github.com/nnocte/pi-opencode-go-cache
