@mcowger/pi-better-messages-cache

Pi extension: dual cache-breakpoint strategy for Anthropic models — marks both the last assistant tool_use block and the last user message block with cache_control, dramatically improving cache hit rates on MiniMax, Kimi, and other Anthropic-compatible providers.

Package details

extension

Install @mcowger/pi-better-messages-cache from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@mcowger/pi-better-messages-cache
Package
@mcowger/pi-better-messages-cache
Version
1.2.0
Published
Apr 23, 2026
Downloads
392/mo · 278/wk
Author
mcowger
License
MIT
Types
extension
Size
62.8 KB
Dependencies
1 dependency · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-better-messages-cache

A pi extension that implements the dual cache-breakpoint strategy for Anthropic models, dramatically improving prompt-cache hit rates on MiniMax, Kimi, and other Anthropic-compatible providers.

This implements the optimization proposed in badlogic/pi-mono#1737, which the upstream maintainer declined to merge into core.


The problem

The built-in Anthropic provider marks only the last user message block with cache_control. On some providers — notably MiniMax and Kimi — the preceding assistant tool_use and thinking blocks fall outside the cached window, so the prefix cache misses and the prompt is re-processed from scratch on almost every turn:

turn N
  [assistant]  thinking …          ← NOT cached ✗
               tool_use foo        ← NOT cached ✗
  [user]       tool_result foo     ← cache_control ✓  (only marker)

turn N+1
  The cache window starts at tool_result, missing the assistant blocks above.

The fix

Mark two locations per turn:

Location                        Who marks it
Last assistant tool_use block   This extension (new)
Last user message block         Built-in provider (preserved)

Both markers together ensure the full assistant turn (thinking + tool_use + tool_result) sits inside the growing cached prefix on every subsequent call:

turn N
  [assistant]  thinking …
               tool_use foo  ← cache_control ✓  (marker 1 — NEW)
  [user]       tool_result foo  ← cache_control ✓  (marker 2 — existing)

turn N+1
  The cache window now covers the entire assistant turn above.

This dual-marking pattern aligns with the cache strategies used by OpenCode, Kilo Code, and Roo Code.
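The marking step can be sketched as follows. This is illustrative TypeScript, not the extension's actual source: the message and block shapes are simplified stand-ins for the Anthropic Messages payload, and `addAssistantBreakpoint` is a hypothetical helper name.

```typescript
// Simplified stand-ins for Anthropic Messages content blocks.
type ContentBlock = {
  type: string;
  cache_control?: { type: "ephemeral" };
  [key: string]: unknown;
};
type Message = { role: "user" | "assistant"; content: ContentBlock[] };

// Marker 1 (the new one): attach cache_control to the last tool_use block of
// the last assistant message, so the whole assistant turn — thinking included —
// lands inside the cached prefix on the next call.
function addAssistantBreakpoint(messages: Message[]): void {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role !== "assistant") continue;
    const blocks = messages[i].content;
    for (let j = blocks.length - 1; j >= 0; j--) {
      if (blocks[j].type === "tool_use") {
        blocks[j].cache_control = { type: "ephemeral" };
        return; // only one new marker per request
      }
    }
    return; // last assistant turn had no tool_use block; nothing to mark
  }
}
```

Marker 2 (the last user message block) is already set by the built-in provider, so the extension only needs to add marker 1.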

Anthropic cache breakpoint limit

Anthropic-compatible APIs allow a maximum of 4 total blocks with cache_control in a single request.

That limit applies across the entire payload, including:

  • system prompt blocks
  • assistant tool_use blocks
  • user / tool_result blocks

In longer multi-turn conversations, a naive dual-marking strategy can accidentally exceed that limit and trigger errors like:

A maximum of 4 blocks with cache_control may be provided. Found 5.

To prevent this, this extension now enforces the limit before sending the request:

  • keep system prompt cache markers intact
  • keep the newest message-level cache breakpoints
  • remove older message-level cache breakpoints first

This preserves the most useful recent cache anchors while ensuring requests never exceed Anthropic's hard cap.
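The pruning rule above can be sketched like this (illustrative only; `enforceBreakpointLimit` is a hypothetical helper name, and the real extension's block shapes differ):

```typescript
type Block = { type: string; cache_control?: { type: "ephemeral" } };
type Msg = { role: string; content: Block[] };

const MAX_BREAKPOINTS = 4; // Anthropic's hard cap per request

// Drops the OLDEST message-level breakpoints first. System-prompt markers are
// never touched; they are passed in as a count via `systemMarkers`.
function enforceBreakpointLimit(messages: Msg[], systemMarkers: number): void {
  // Collect message-level cache_control blocks oldest-first.
  const marked: Block[] = [];
  for (const m of messages)
    for (const b of m.content) if (b.cache_control) marked.push(b);

  let excess = systemMarkers + marked.length - MAX_BREAKPOINTS;
  for (const b of marked) {
    if (excess <= 0) break;
    delete b.cache_control; // oldest marker goes first
    excess--;
  }
}
```

Because removal starts from the oldest markers, the newest breakpoints — the ones anchoring the most recent turns — always survive.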

Empirical impact (from PR #1737 field data)

Provider            Before                  After
MiniMax / Kimi      near-zero cache hits    80%+ cache hit rate
Anthropic native    baseline                small positive improvement

Built-in pi caching — "cache hit wall" (MiniMax)

Note: observe the "cache hit wall" at ~4.2K cache hits — the orange cache-hit line flatlines while the cache-miss line continues climbing.

With pi-better-messages-cache extension — drastically improved cache hits

Note: cache hits continue climbing throughout the session — the orange line no longer flatlines, which is exactly the dual cache-breakpoint strategy's intended behavior.


How it works

pi.registerProvider("anthropic", { api: "anthropic-messages", streamSimple }) replaces the global api-registry entry for the "anthropic-messages" API type. This transparently intercepts every model that uses that API — all native Anthropic models — without touching any model definitions, pricing, OAuth config, or other settings.
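A toy model of that registry replacement shows why one call intercepts every model on the API type. This mock is not the real pi API — only the registration call shape mirrors the extension — but it captures the mechanism: the registry is keyed by API type, not by model.

```typescript
type StreamFn = (request: unknown) => string;

// Mock of pi's api-registry: one handler per API type.
const apiRegistry = new Map<string, StreamFn>();

function registerProvider(
  name: string,
  opts: { api: string; streamSimple: StreamFn },
): void {
  // Keyed by API type, so every model declaring api: "anthropic-messages"
  // resolves to this single entry.
  apiRegistry.set(opts.api, opts.streamSimple);
}

// Built-in handler (single user-message breakpoint)...
registerProvider("anthropic", {
  api: "anthropic-messages",
  streamSimple: () => "builtin",
});
// ...replaced wholesale by the extension's dual-breakpoint handler.
registerProvider("anthropic", {
  api: "anthropic-messages",
  streamSimple: () => "dual-breakpoint",
});
```

Model definitions, pricing, and OAuth config live elsewhere, so swapping the stream handler leaves all of them untouched.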


Installation

# Global install (all projects)
pi install npm:@mcowger/pi-better-messages-cache

# Project-local install
pi install -l npm:@mcowger/pi-better-messages-cache

Try without installing

pi -e npm:@mcowger/pi-better-messages-cache

From git (latest unreleased)

pi install git:github.com/mcowger/pi-better-messages-cache

Requirements

  • pi (any recent version)
  • @mariozechner/pi-coding-agent and @mariozechner/pi-ai (bundled with pi, listed as peerDependencies)

Uninstalling

pi remove npm:@mcowger/pi-better-messages-cache

This restores the built-in Anthropic stream handler automatically.


License

MIT © mcowger