@monotykamary/pi-cost-backoff
Cost-aware request throttling for pi — exponential backoff when $/M or $/min exceeds your cap. Companion to pi-tps.
Package details
Install @monotykamary/pi-cost-backoff from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@monotykamary/pi-cost-backoff
- Package: @monotykamary/pi-cost-backoff
- Version: 0.1.0
- Published: Jun 19, 2026
- Downloads: not available
- Author: monotykamary
- License: MIT
- Types: extension
- Size: 77.7 KB
- Dependencies: 0 dependencies · 2 peers
Pi manifest JSON
{
"extensions": [
"./extensions"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
⏳ pi-cost-backoff
Cost-aware request throttling for pi
Exponential backoff when $/Mtok or $/min exceeds your cap. Companion to pi-tps.
pi-tps is a passive sensor: it measures TPS/cost per turn and emits a tps:telemetry event on pi's shared bus. This extension is the actuator — it consumes those signals and intentionally delays the next provider request when cost metrics exceed your thresholds, using exponential backoff with jitter.
Keeping sensor and actuator separate means pi-tps's TPS measurements stay honest — the throttle never perturbs the thing it measures, and backoff state never leaks into pi-tps's persisted telemetry.
Quick start
pi install npm:@monotykamary/pi-tps # the sensor (provides the cost signal)
pi install npm:@monotykamary/pi-cost-backoff # the actuator (this extension)
Then set a cap and run pi from that directory:
COST_CAP_USD_PER_MIN=0.50 pi
Install order: load `pi-tps` (the sensor) before `pi-cost-backoff` so the cost signal is available from the first turn. See Cost signal.
What's included
| | |
|---|---|
| Actuator | Delays the next provider request via `before_provider_request` when a cost cap trips |
| Triggers | $/Mtok spike, $/min burn-rate, and reactive 429 (honors `retry-after`) |
| Command | `/cost-backoff`: inspect live config + backoff state |
Features
- Exponential backoff with jitter: `delay = min(base · 2^level, max)` with ±20% jitter, capped at 8 levels; decays one level per `decay-ms` of clean behavior
- Burn-rate cap (`$/min`): rolling spend velocity over a sliding window. This is the coherent "slow my spend down" lever (backoff directly lowers $/min)
- Spike cap (`$/Mtok`): per-turn unit-price circuit-breaker against runaway cost (cache miss, model swap, provider issue)
- Reactive 429: honors `retry-after` (delta-seconds or HTTP-date) and escalates the backoff level across turn boundaries; composes with pi's built-in transport retry
- Honest sensor/actuator split: pi-tps measures, pi-cost-backoff throttles, so TPS numbers stay clean and telemetry stays unpolluted
- Abortable waits: backoff sleeps respect `ctx.signal`, so Esc interrupts an in-progress backoff
- Config via flags or env: every option has a `--flag` and a `COST_*` env var (flags win); `--cost-backoff-disable` is a master kill-switch
- Live status: footer line shows active backoff; `/cost-backoff` dumps full state
Install
pi install npm:@monotykamary/pi-cost-backoff
Or install from GitHub:
pi install https://github.com/monotykamary/pi-cost-backoff
Or copy the extension directory manually from a checkout of the repository:
cp -r extensions/pi-cost-backoff ~/.pi/agent/extensions/
Then /reload in pi.
The triggers
Any of three conditions fires the same exponential backoff on the next provider request:
| Trigger | Signal | Honest framing |
|---|---|---|
| $/Mtok spike | `rateUsdPerMTokens` from the prior turn exceeds `--cost-cap-usd-per-m` | A per-turn unit-price anomaly. Throttling cannot lower that turn's price; it caps the velocity of subsequent expensive turns. A velocity lever applied to a price signal, not magic. |
| $/min burn | Rolling spend velocity over a sliding window exceeds `--cost-cap-usd-per-min` | The coherent "cap spend via backoff" lever: slow the request stream, lower $/min. |
| 429 reactive | Provider returns 429 in `after_provider_response` (honors `retry-after`) | Composes with pi's built-in transport retry by stashing `retry-after` so the next request (across a turn boundary if needed) honors it and escalates the level. |
$/Mtok vs $/min — which do I want?
- `--cost-cap-usd-per-min` is the natural fit for "slow my spend down." Backoff directly lowers $/min.
- `--cost-cap-usd-per-m` trips on per-turn unit-price spikes. It does not make individual tokens cheaper; backoff only limits how quickly you can rack up expensive turns. Use it as a circuit-breaker against runaway unit cost, not a price reducer.
Both can be set simultaneously; the spike trigger is evaluated first.
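The evaluation order can be sketched as below. This is a minimal illustration under stated assumptions, not the extension's source: `CostCaps`, `CostSample`, and `evaluateCostTriggers` are hypothetical names, "exceeds" is read as strictly greater than, and the 429 trigger is left out because its ordering relative to the two caps is not described here.

```typescript
// Hypothetical shapes for illustration; not the extension's actual types.
interface CostCaps {
  usdPerMTok?: number; // --cost-cap-usd-per-m (undefined = disabled)
  usdPerMin?: number; // --cost-cap-usd-per-min (undefined = disabled)
}

interface CostSample {
  lastRateUsdPerMTok?: number; // unit price of the prior turn, if known
  burnUsdPerMin: number; // rolling spend velocity
}

type TrippedBy = "spike" | "burn" | null;

// Checks the spike cap first, then the burn-rate cap.
function evaluateCostTriggers(caps: CostCaps, sample: CostSample): TrippedBy {
  if (
    caps.usdPerMTok !== undefined &&
    sample.lastRateUsdPerMTok !== undefined &&
    sample.lastRateUsdPerMTok > caps.usdPerMTok
  ) {
    return "spike";
  }
  if (caps.usdPerMin !== undefined && sample.burnUsdPerMin > caps.usdPerMin) {
    return "burn";
  }
  return null; // no cap armed, or nothing exceeded
}
```

Either result leads to the same exponential backoff; the label only matters for reporting which cap tripped.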
Configuration
All options are available as CLI flags (registered by the extension) or environment variables. Flags win over env.
| Flag | Env | Default | Description |
|---|---|---|---|
| `--cost-cap-usd-per-m` | `COST_CAP_USD_PER_M` | disabled | Per-turn $/Mtok spike threshold |
| `--cost-cap-usd-per-min` | `COST_CAP_USD_PER_MIN` | disabled | Rolling $/min burn-rate threshold |
| `--cost-backoff-base-ms` | `COST_BACKOFF_BASE_MS` | `1000` | Base backoff delay (ms), doubled each level |
| `--cost-backoff-max-ms` | `COST_BACKOFF_MAX_MS` | `30000` | Maximum backoff delay (ms) |
| `--cost-backoff-window-ms` | `COST_BACKOFF_WINDOW_MS` | `60000` | Sliding-window length for $/min (ms) |
| `--cost-backoff-decay-ms` | `COST_BACKOFF_DECAY_MS` | `30000` | ms of clean behavior to decay one level |
| `--cost-backoff-disable` | (none) | `false` | Kill-switch: disables all triggers |
Examples:
# Cap burn at $0.50/min, env var (no flag prefix)
COST_CAP_USD_PER_MIN=0.50 pi
# Cap unit-price spikes at $5.00/Mtok via flag
pi --cost-cap-usd-per-m 5.00
# Both, with faster decay (10s) and a 60s ceiling
pi --cost-cap-usd-per-m 5.00 --cost-cap-usd-per-min 0.50 \
--cost-backoff-decay-ms 10000 --cost-backoff-max-ms 60000
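The "flags win over env" precedence can be sketched as a small resolver. This is a hedged illustration only: `resolveNumberOption` is a hypothetical helper, and skipping empty or non-numeric values is an assumption about validation that the README does not specify.

```typescript
// Hypothetical helper: resolves one numeric option, preferring the flag
// value, then the environment value, then the default.
function resolveNumberOption(
  flagValue: string | undefined,
  envValue: string | undefined,
  fallback: number | undefined,
): number | undefined {
  for (const raw of [flagValue, envValue]) {
    if (raw === undefined || raw.trim() === "") continue;
    const parsed = Number(raw);
    // Assumption: unusable values are skipped rather than treated as errors.
    if (Number.isFinite(parsed) && parsed >= 0) return parsed;
  }
  return fallback; // undefined here stands for "disabled"
}

// Flag and env both set: the flag wins.
const resolvedMaxMs = resolveNumberOption("45000", "60000", 30000);
// Only the env var set: it is used.
const resolvedCapPerMin = resolveNumberOption(undefined, "0.50", undefined);
```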
Backoff strategy
- Exponential with jitter: `delay = min(base · 2^level, max)`, ±20% jitter.
- Escalation: consecutive trips bump the level (capped at 8); delay doubles each level.
- Decay: every `decay-ms` of clean behavior reduces the level by one (residual clean time is preserved across a decay).
- 429: honors `retry-after` (delta-seconds or HTTP-date), never below the exponential floor; falls back to 5s when `retry-after` is missing.
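The 429 rule can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than documented behavior: an unparseable header is treated the same as a missing one, and the 5s fallback is also subject to the exponential floor.

```typescript
const FALLBACK_429_MS = 5_000; // used when retry-after is missing

// Parses a Retry-After value given as delta-seconds or as an HTTP-date.
// Returns milliseconds to wait, or undefined when the value is unusable.
function parseRetryAfterMs(header: string | undefined, nowMs: number): number | undefined {
  if (header === undefined) return undefined;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000; // delta-seconds
  const dateMs = Date.parse(value); // HTTP-date
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs); // dates in the past mean "no wait"
}

// Honors retry-after but never waits less than the exponential delay
// for the current backoff level.
function delayFor429Ms(header: string | undefined, nowMs: number, exponentialFloorMs: number): number {
  const requested = parseRetryAfterMs(header, nowMs) ?? FALLBACK_429_MS;
  return Math.max(requested, exponentialFloorMs);
}
```

For example, a `retry-after` of 1 second at a level whose exponential delay is 4s still waits 4s, while a missing header at a 2s floor waits the 5s fallback.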
Example progression (defaults: base 1s, max 30s, jitter zeroed for clarity):
trip 1 → level 1 → 2.0s
trip 2 → level 2 → 4.0s
trip 3 → level 3 → 8.0s
trip 4 → level 4 → 16.0s
trip 5 → level 5 → 30.0s (clamped)
...clean for 30s... → level 4
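The progression above follows from the formula; a minimal sketch that reproduces it is shown here. The names (`BackoffConfig`, `backoffDelayMs`, `escalate`, `decay`) are hypothetical, and applying jitter after the clamp is an assumption, since the README does not say which comes first.

```typescript
const MAX_LEVEL = 8;

interface BackoffConfig {
  baseMs: number; // --cost-backoff-base-ms
  maxMs: number; // --cost-backoff-max-ms
  decayMs: number; // --cost-backoff-decay-ms
}

// delay = min(base * 2^level, max), then ±20% jitter.
// `random` is injectable so the jitter can be zeroed for tests.
function backoffDelayMs(level: number, cfg: BackoffConfig, random: () => number = Math.random): number {
  const clamped = Math.min(cfg.baseMs * 2 ** level, cfg.maxMs);
  const jitter = 1 + (random() * 2 - 1) * 0.2; // factor in [0.8, 1.2)
  return Math.round(clamped * jitter);
}

// A trip bumps the level by one, capped at MAX_LEVEL.
function escalate(level: number): number {
  return Math.min(level + 1, MAX_LEVEL);
}

// Drops one level per full decayMs of clean time and keeps the remainder,
// so leftover clean time counts toward the next decay.
function decay(level: number, cleanMs: number, cfg: BackoffConfig): { level: number; residualMs: number } {
  const steps = Math.min(level, Math.floor(cleanMs / cfg.decayMs));
  return { level: level - steps, residualMs: cleanMs - steps * cfg.decayMs };
}
```

With the defaults and `random` fixed at 0.5 (zero jitter), levels 1 through 5 yield 2000, 4000, 8000, 16000, and 30000 ms, and 45s of clean time at level 5 leaves level 4 with 15s carried over.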
How it works
The throttle point is pi's before_provider_request hook. pi awaits this hook before sending the HTTP request, so an await sleep(N) here genuinely delays the request (verified in pi's sdk.js onPayload).
turn N turn-end cost captured → lastRateUsdPerM / burn window
└─ tps:telemetry ─────────────► pi-cost-backoff state
turn N+1
└─ before_provider_request ──► evaluate triggers → sleep(delay) if tripped → request fires
└─ after_provider_response ──► if 429: stash retry-after for the next request
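The `sleep(delay)` step needs to be interruptible for Esc to cancel a wait. One way to write such a sleep is sketched below; `abortableSleep` is a hypothetical helper built on the standard `AbortSignal` API, and how the extension wires `ctx.signal` into it is not shown because pi's hook signature is not documented here.

```typescript
// Resolves after `ms`, or rejects early if `signal` aborts.
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("backoff aborted"));
      return;
    }
    const onAbort = (): void => {
      clearTimeout(timer); // stop the pending wake-up
      reject(new Error("backoff aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort); // avoid leaking the listener
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

A caller would await this inside the hook and decide for itself whether an abort should cancel the request or simply end the wait.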
Cost signal
- Primary: subscribes to the `tps:telemetry` event emitted by pi-tps, capturing `rateUsdPerMTokens` (spike trigger) and `cost.total` (burn-rate window).
- Fallback: if no `tps:telemetry` has ever been seen, reads `message.usage.cost.total` directly in `turn_end`. The fallback cannot compute `rateUsdPerMTokens`, so the spike trigger is inactive until pi-tps is observed.
- Once a single `tps:telemetry` event arrives, the fallback is permanently disabled (avoids double-counting cost in the burn-rate window when both paths fire for the same turn).
Install-order note: if pi-tps is loaded after pi-cost-backoff, the very first turn may double-count its cost in the burn-rate window (after that, telemetry owns it). Load pi-tps first to avoid this — the same graceful-degradation pattern pi-tps itself uses for its Neuralwatt cost handoff.
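The primary/fallback handoff, including the first-turn double count, can be illustrated with a small tracker. `CostSignalTracker` and its method names are hypothetical; only the two fields read from the telemetry event come from the description above, and the rest of the event shape is assumed.

```typescript
// Hypothetical tracker: telemetry is primary, turn_end is a one-way fallback.
class CostSignalTracker {
  private telemetrySeen = false;
  readonly windowCosts: number[] = []; // costs feeding the burn-rate window
  lastRateUsdPerMTok: number | undefined; // stays undefined on the fallback path

  // Called for each tps:telemetry event.
  onTelemetry(event: { rateUsdPerMTokens: number; cost: { total: number } }): void {
    this.telemetrySeen = true; // permanently disables the fallback
    this.lastRateUsdPerMTok = event.rateUsdPerMTokens;
    this.windowCosts.push(event.cost.total);
  }

  // Called at turn_end with message.usage.cost.total.
  onTurnEnd(costTotal: number): void {
    if (this.telemetrySeen) return; // telemetry owns the window now
    this.windowCosts.push(costTotal);
  }
}
```

If `onTurnEnd` runs before the first `onTelemetry` for the same turn, that turn's cost lands in the window twice, which is the install-order effect described above; every later turn is counted once.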
Burn-rate computation
$/min = (sum of costs in the sliding window) / max(ms elapsed since the oldest sample in the window, 1000 ms) × 60000
The 1s floor prevents a single recent expensive turn from exploding the rate. The window is pruned as new samples arrive.
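A minimal sketch of this computation, with elapsed time in milliseconds, follows. `CostEntry` and `burnRateUsdPerMin` are hypothetical names, and pruning is done here by filtering at read time rather than as samples arrive, purely to keep the example short.

```typescript
interface CostEntry {
  atMs: number; // timestamp of the turn
  usd: number; // cost of the turn
}

// Rolling spend velocity in $/min over the last `windowMs`.
function burnRateUsdPerMin(samples: readonly CostEntry[], nowMs: number, windowMs: number): number {
  const live = samples.filter((s) => nowMs - s.atMs <= windowMs);
  if (live.length === 0) return 0;
  const totalUsd = live.reduce((sum, s) => sum + s.usd, 0);
  const oldestMs = Math.min(...live.map((s) => s.atMs));
  const elapsedMs = Math.max(nowMs - oldestMs, 1000); // 1s floor
  return (totalUsd / elapsedMs) * 60_000;
}
```

With the floor, a single $0.10 turn recorded just now reads as about $6/min rather than an unbounded rate; two $0.25 turns 30s apart read as about $1/min.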
Measurement caveats
- TTFT absorbs the backoff delay. `before_provider_request` fires after `turn_start`, so a throttled turn's TTFT (as reported by pi-tps) includes the intentional wait. This is arguably correct: TTFT should reflect an intentional delay.
- Generation TPS stays honest. pi-tps measures generation speed `message_start` → `message_end`, entirely after the request fires, so the backoff delay never inflates or deflates generation TPS.
- No persisted control state. Backoff level, window, and rate are transient runtime state, re-derived from fresh signals on session resume (no stale lockout risk). The cost window starts empty after a reload.
Inspecting state
/cost-backoff
Shows current config and live state: armed caps, backoff level, window sample count, last seen $/Mtok, and any pending 429 override. The same state is reflected in the footer status line (pi-cost-backoff) while a backoff is active; the line is cleared when the level decays to zero.
Testing
npm install
npm test # 73 tests
npm run test:coverage # index.ts at 100% line coverage
npm run typecheck
npm run lint:dead # knip
License
MIT