pi-model-staging
A pi extension that adds plan-then-implement mode with a single model ladder. Steps the model and thinking level down as the LLM grinds autonomously, snaps back to the snappy tier when control returns to the user.
Package details
Install pi-model-staging from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-model-staging
- Package: pi-model-staging
- Version: 0.3.2
- Published: May 13, 2026
- Downloads: 355/mo · 355/wk
- Author: carlhannes
- License: MIT
- Types: extension
- Size: 114.2 KB
- Dependencies: 0 dependencies · 0 peers
Pi manifest JSON
{
"extensions": [
"./.pi/extensions/plan-stepdown"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-model-staging
A pi extension that adds a plan-then-implement workflow with a single configurable model ladder. The model and reasoning level step down as the agent grinds through tool calls "by itself", and snap back to the snappy/user-facing tier whenever control returns to you.
See CHANGELOG.md for release notes.
The mental model
One ladder, one counter. The principle: stepping only happens while the LLM is working autonomously inside one agent run. Hand control back to the user → reset to the top.
const LADDER: Rung[] = [
{ modelId: "gpt-5.5", thinking: "xhigh", webSearchContextSize: "high" }, // [0] snappy / user-facing
{ modelId: "gpt-5.4", thinking: "xhigh", webSearchContextSize: "high" }, // [1] first autonomous step
{ modelId: "gpt-5.4", thinking: "high", webSearchContextSize: "medium" }, // [2]
{ modelId: "gpt-5.4", thinking: "medium", webSearchContextSize: "medium" }, // [3]
{ modelId: "gpt-5.4-mini", thinking: "xhigh", webSearchContextSize: "low" }, // [4]+ (last rung repeats)
];
| Situation | Rung used |
|---|---|
| Plan mode (every LLM call while shaping the plan) | LADDER[0] |
| Auto-injected "Please start implementation." run, turn 1 | LADDER[1] |
| Same run, turn 2, 3, ... | step down |
| agent_end during implementing → user gets control back | reset to 0 |
| User follow-up prompt, turn 1 (LLM responding to user, "user-facing") | LADDER[0] |
| Same prompt, turn 2, 3, ... (autonomous tool calls) | step down |
| Failed tool / bash / npm/pnpm/yarn/bun result during implementing | bump next call to LADDER[1], then continue at LADDER[2] |
| Re-entering /plan | LADDER[0] |
So [0] covers "user is in control or shaping the plan", [1] is "first
step into autonomous work", and [2..] are progressive degradation as the
agent keeps grinding without checking back in. Important tool results can
restart that autonomous cursor from [1] so error/test interpretation gets
stronger reasoning before stepping down again.
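The ladder rules above can be sketched as two small pure functions. This is an illustrative sketch, not the extension's source: the names chooseRung and nextStage appear in the test list further down, but their signatures here, the Mode type, and the trimmed Rung shape are assumptions.

```typescript
// Illustrative sketch; signatures and types are assumptions, not the extension's code.
type Thinking = "minimal" | "low" | "medium" | "high" | "xhigh";
type Mode = "planning" | "implementing";

interface Rung {
  modelId: string;
  thinking: Thinking;
}

const LADDER: Rung[] = [
  { modelId: "gpt-5.5", thinking: "xhigh" },      // [0] snappy / user-facing
  { modelId: "gpt-5.4", thinking: "xhigh" },      // [1] first autonomous step
  { modelId: "gpt-5.4", thinking: "high" },       // [2]
  { modelId: "gpt-5.4", thinking: "medium" },     // [3]
  { modelId: "gpt-5.4-mini", thinking: "xhigh" }, // [4]+ last rung repeats
];

// Planning always uses the top rung; implementing uses the stage cursor,
// clamped so that any stage past the end maps to the last rung.
function chooseRung(ladder: Rung[], mode: Mode, stage: number): Rung {
  if (ladder.length === 0) throw new Error("ladder must have at least one rung");
  if (mode === "planning") return ladder[0];
  const index = Math.min(Math.max(stage, 0), ladder.length - 1);
  return ladder[index];
}

// Advance one rung per completed turn; the cursor never moves past the last rung.
function nextStage(ladder: Rung[], stage: number): number {
  return Math.min(stage + 1, ladder.length - 1);
}
```

Keeping both functions free of pi imports is what lets this kind of logic be unit-tested without a running agent.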
How it actually works
The architectural problem: pi captures model and reasoning once when it
builds AgentLoopConfig (agent.ts:413)
and reuses them for every turn inside one agent run. Calling
pi.setModel() mid-loop never reaches the in-flight request.
This extension uses two mechanisms together:
- pi.setModel() once per plan→implementation cycle, at /plan. Because every rung shares the configured provider, that single binding carries the provider, baseUrl, and API key through plan mode, the auto-injected "Please start implementation." run, and any user follow-ups in implementing mode. We deliberately don't call it again: pi persists each setModel() as a default in settings.json, which would bounce around per turn.
- before_provider_request payload rewriting on every LLM call. This rewrites the wire payload's model and reasoning_effort / reasoning.effort / output_config.effort (depending on API) to whatever LADDER[stage] says. This is what enables stepping inside one agent run.
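A minimal sketch of the second mechanism, assuming simplified payload shapes. The three effort field names come from the description above; the detection heuristics, the rewritePayload name, and passing the effort value through unmapped are assumptions made for illustration only.

```typescript
// Sketch under assumptions: payload shapes are simplified and detection is heuristic.
type Effort = "minimal" | "low" | "medium" | "high" | "xhigh";

interface Rung {
  modelId: string;
  thinking: Effort;
}

type Payload = Record<string, unknown>;

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a new payload with model and effort swapped; the input is never mutated.
function rewritePayload(payload: Payload, rung: Rung): Payload {
  // Unknown shape: pass through untouched rather than guess.
  if (typeof payload.model !== "string") return payload;

  const next: Payload = { ...payload, model: rung.modelId };
  const reasoning = payload.reasoning;
  const outputConfig = payload.output_config;

  if (isObject(reasoning) && "effort" in reasoning) {
    // Responses-style payload: reasoning.effort
    next.reasoning = { ...reasoning, effort: rung.thinking };
  } else if ("reasoning_effort" in payload) {
    // Chat-Completions-style payload: reasoning_effort
    next.reasoning_effort = rung.thinking;
  } else if (isObject(outputConfig) && "effort" in outputConfig) {
    // Adaptive-thinking-style payload: output_config.effort
    next.output_config = { ...outputConfig, effort: rung.thinking };
  }
  // Otherwise only the model is swapped (for example budget-based thinking).
  return next;
}
```

Returning a fresh object instead of mutating the input keeps the rewrite safe to call from an event hook whose payload may be shared with other listeners.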
Same-provider constraint
All rungs must live on the same provider. The HTTP client is built before
before_provider_request runs (e.g.
anthropic.ts:466),
binding baseUrl + apiKey from the per-run model. Rewriting the payload
to reference a model on a different provider would still send the request
to the original endpoint with the original key — wrong destination, likely
wrong wire format too.
In practice: set the provider once (in plan-stepdown.json), then pick freely from the
models that provider exposes. If your provider is your own proxy (the
intended use case), you can use the model name to route to different
backend tiers — see models.example.json for the
proxy setup pattern.
If you need true per-turn cross-provider swaps, that requires a small
upstream patch to pi-mono (createLoopConfig → getter style). Not
included; see commit history if you want the rationale.
Requirements
- pi (any recent version supporting the extension API; tested against pi-mono main as of May 2026)
- Node.js 22+ (only for running the test suite; pi itself bundles its own runtime via jiti)
- A provider with one of the supported APIs:
  - OpenAI Responses (full support: model + reasoning effort)
  - OpenAI Completions (full support)
  - Anthropic adaptive thinking (model + effort via output_config)
  - Anthropic budget thinking (model only; budget is left untouched)
  - Google generative AI (model only; budget is left untouched)
Install
Official install — npm package (recommended):
# Global (adds to ~/.pi/agent/settings.json)
pi install npm:pi-model-staging
# Global, pinned to a specific release
pi install npm:pi-model-staging@0.3.2
# Project-local (adds to .pi/settings.json — share with your team)
pi install -l npm:pi-model-staging@0.3.2
# Try once without persisting
pi -e npm:pi-model-staging@0.3.2
pi install reads the pi.extensions field from package.json and loads the
extension automatically. Unpinned npm installs follow the latest published
release; versioned installs are pinned and skipped by pi update.
After install, configure the extension with JSON instead of editing the source code directly — see Configuration.
Configuration files are loaded from:
- ~/.pi/agent/plan-stepdown.json (global/user)
- .pi/plan-stepdown.json (project, overrides global)
If you want to hack on the extension itself, clone the repo and edit the TypeScript source. For normal per-user or per-project setup, prefer the JSON config files.
Manage install
pi list
pi update npm:pi-model-staging
pi remove npm:pi-model-staging
pi config
Switching from an older git/local install
If you previously installed plan-stepdown from a git source or local path,
remove the old source first to avoid duplicate commands:
pi list
pi remove <old-source-from-pi-list>
pi install npm:pi-model-staging@0.3.2
Verify install
Inside pi, run /help and you should see /plan, /stepdown, and
/stepdown-off. If they're missing, check pi --debug startup logs for
extension load errors.
Alternative: GitHub tag/source install
# Global (adds to ~/.pi/agent/settings.json)
pi install git:github.com/carlhannes/pi-model-staging@v0.3.2
# Project-local (adds to .pi/settings.json — share with your team)
pi install -l git:github.com/carlhannes/pi-model-staging@v0.3.2
# Try once without persisting
pi -e git:github.com/carlhannes/pi-model-staging@v0.3.2
Use this if you prefer to install from a Git tag/source instead of npm.
Alternative: from a local clone (recommended for developing the extension itself)
git clone https://github.com/carlhannes/pi-model-staging
cd pi-model-staging
# Edit the extension source in .pi/extensions/plan-stepdown/
pi # auto-discovers .pi/extensions/ when run from the project root
Alternative: symlink for global development use
git clone https://github.com/carlhannes/pi-model-staging
ln -s "$(pwd)/pi-model-staging/.pi/extensions/plan-stepdown" \
~/.pi/agent/extensions/plan-stepdown
Configuration
plan-stepdown uses its own JSON config files, merged in this order:
- built-in defaults in the extension
- ~/.pi/agent/plan-stepdown.json (global/user)
- .pi/plan-stepdown.json (project)
- --stepdown-config /path/to/file.json (one-off override)
Project config overrides global config. Arrays such as ladder, tools.plan,
and tools.implementation are replaced wholesale rather than deep-merged.
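The layering can be pictured with a small merge helper. This is a sketch under assumptions: the function names are hypothetical, and the README only states that arrays are replaced, so the key-by-key merge of nested objects shown here is an inference rather than confirmed behavior.

```typescript
// Sketch: later layers win; nested objects merge key by key; arrays are replaced whole.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function mergeConfig(base: JsonObject, override: JsonObject): JsonObject {
  const result: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = result[key];
    result[key] =
      isPlainObject(existing) && isPlainObject(value)
        ? mergeConfig(existing, value) // nested objects merge recursively (assumption)
        : value; // scalars, null, and arrays replace the lower layer
  }
  return result;
}

// Apply layers in precedence order: defaults, global, project, one-off override.
function resolveConfig(...layers: JsonObject[]): JsonObject {
  return layers.reduce((acc, layer) => mergeConfig(acc, layer), {} as JsonObject);
}
```

With this shape, a project file that sets only tools.plan shortens the plan tool list without touching tools.implementation or any other key.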
Example config
{
"provider": "openai",
"ladder": [
{ "modelId": "gpt-5.5", "thinking": "xhigh", "webSearchContextSize": "high" },
{ "modelId": "gpt-5.4", "thinking": "xhigh", "webSearchContextSize": "high" },
{ "modelId": "gpt-5.4", "thinking": "high", "webSearchContextSize": "medium" },
{ "modelId": "gpt-5.4", "thinking": "medium", "webSearchContextSize": "medium" },
{ "modelId": "gpt-5.4-mini", "thinking": "xhigh", "webSearchContextSize": "low" }
],
"tools": {
"plan": ["read", "bash", "grep", "find", "ls"],
"implementation": ["read", "bash", "edit", "write", "grep", "find", "ls"]
},
"reasoningBump": {
"bumpOnFailedBash": true,
"bumpOnFailedTool": true,
"bumpOnPackageManagerCommand": true,
"packageManagerCommands": ["npm", "pnpm", "yarn", "bun"]
},
"openaiPromptCache": {
"keyPrefix": "pi-model-staging:",
"retention": "24h"
},
"openaiWebSearch": {
"enabled": true,
"locationEnabled": true
}
}
Field reference
provider — string. Must match a provider known to pi
(pi --list-models shows them, including custom ones from
~/.pi/agent/models.json). All rungs use this provider.
ladder — array of rungs.
- modelId: string. Must match a model ID for provider.
- thinking: "minimal" | "low" | "medium" | "high" | "xhigh". Auto-clamped to model capabilities (e.g. setting xhigh on a model that only supports high will silently drop to high).
- webSearchContextSize: optional. "low" | "medium" | "high" | "off". Controls OpenAI Responses native web_search tool context size for this rung. Use "off" to disable hosted search on a specific rung.
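The clamping of thinking can be illustrated as follows. How pi actually represents model capabilities is not described here, so the maxSupported parameter and the clampThinking name are hypothetical; the sketch only shows the ordering logic.

```typescript
// Levels in ascending order of reasoning effort, as listed for the thinking field.
const LEVELS = ["minimal", "low", "medium", "high", "xhigh"] as const;
type Thinking = (typeof LEVELS)[number];

// Clamp a requested level to the highest level the model supports.
// `maxSupported` is a hypothetical stand-in for whatever capability data pi exposes.
function clampThinking(requested: Thinking, maxSupported: Thinking): Thinking {
  const index = Math.min(LEVELS.indexOf(requested), LEVELS.indexOf(maxSupported));
  return LEVELS[index];
}
```

Because the drop is silent, it is worth checking the status line or /stepdown output when a rung seems to reason less than configured.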
tools.plan / tools.implementation — arrays of tool names. These
replace the built-in defaults for each phase.
reasoningBump — controls which tool results temporarily reset the
next LLM call to the stronger autonomous rung.
openaiPromptCache
- keyPrefix: string prefix for the generated prompt-cache key.
- retention: "24h", "in_memory", or null. Use null to omit the retention field.
openaiWebSearch
- enabled: enable/disable OpenAI Responses hosted web search by default.
- locationEnabled: enable/disable approximate country/timezone metadata by default.
One-off override file
For CI, experiments, or temporary project-specific routing, you can point at another config file without changing your normal user/project config:
pi --stepdown-config ./ops/stepdown-ci.json
Model and provider names
Run pi --list-models to see what's available. Custom providers (Ollama,
vLLM, LM Studio, your own proxy) configured in ~/.pi/agent/models.json
work exactly the same — see pi's models docs
and custom provider docs.
A starter models.example.json is included for the
"openai-responses-compatible proxy" use case — one provider with several
GPT-5.x model IDs (including gpt-5.5:quick for routing to a priority
tier). Copy into ~/.pi/agent/models.json (or merge into your existing
file's providers map) and edit baseUrl / apiKey to match your setup.
System-prompt nudges
The plan/implementation prompts are still part of the extension source.
If you want to change those messages, edit PLAN_PROMPT and
IMPL_FIRST_PROMPT in .pi/extensions/plan-stepdown/index.ts.
Usage
Interactive
> /plan
plan-stepdown: Plan mode ON. Every LLM call uses [0] openai/gpt-5.5:xhigh
> How should I refactor the auth module?
[plan produced — every LLM call inside this run uses LADDER[0]]
[dialog appears]
Plan ready — what next?
> Start implementation
Refine — stay in plan mode
Cancel — leave plan mode
> Start implementation
[implementation phase begins, first LLM call uses LADDER[1], next [2],
next [3], ... last rung repeats. When done, status snaps back to LADDER[0]]
> also add tests for it
[user follow-up — first LLM call uses LADDER[0] (user-facing), then steps
down through LADDER[1], LADDER[2], ... again]
Headless / non-interactive (opt-in auto-approve)
If you want to skip the plan-approval dialog and automatically start implementation after the plan is produced:
pi -p --plan-auto-approve "Refactor the auth module"
Warning: --plan-auto-approve skips the human-in-the-loop approval step and may modify files.
The status line at the bottom shows the live cursor:
▶ impl [2] openai/gpt-5.4:high (3/5).
Commands
| Command | What it does |
|---|---|
| /plan | Enter plan mode, restrict to read-only tools, bind provider for the upcoming runs |
| /stepdown | Show the resolved provider/config ladder and current cursor, even when idle |
| /stepdown-off | Exit plan/implementation mode, restore full tools |
/stepdown shows the config loaded for the current Pi session. After editing
plan-stepdown.json, restart Pi or run /reload before checking it again.
State machine summary
In addition to the stage counter, the extension also supports one-shot reasoning bumps
inside implementing mode: when certain tool results arrive (e.g. failed tools, failing bash,
or npm/pnpm/yarn/bun output), the next LLM call temporarily uses LADDER[1] (or LADDER[0] if the ladder has
only one rung). After a bumped turn, the stage cursor continues at the rung after the bump (so a bump on LADDER[1] continues at LADDER[2]).
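One way the bump triggers could be detected is sketched below. The config keys mirror the reasoningBump block from the example config; the ToolResult shape, the shouldBump name, and the first-word command matching are assumptions, since the README does not describe how results are inspected.

```typescript
// Hypothetical, simplified view of a tool result.
interface ToolResult {
  toolName: string;   // e.g. "bash", "edit"
  isError: boolean;   // assumed flag for a failed tool call
  command?: string;   // bash command line, when toolName is "bash"
  exitCode?: number;  // bash exit status, when known
}

interface ReasoningBumpConfig {
  bumpOnFailedBash: boolean;
  bumpOnFailedTool: boolean;
  bumpOnPackageManagerCommand: boolean;
  packageManagerCommands: string[];
}

function shouldBump(result: ToolResult, config: ReasoningBumpConfig): boolean {
  const isBash = result.toolName === "bash";
  const bashFailed = isBash && (result.isError || (result.exitCode ?? 0) !== 0);
  if (config.bumpOnFailedBash && bashFailed) return true;
  if (config.bumpOnFailedTool && !isBash && result.isError) return true;
  if (config.bumpOnPackageManagerCommand && isBash && result.command) {
    // Check the first word of each segment of a compound command (&&, ||, ;, |).
    const firstWords = result.command
      .split(/&&|\|\||[;|]/)
      .map((segment) => segment.trim().split(/\s+/)[0]);
    return firstWords.some((word) => config.packageManagerCommands.includes(word));
  }
  return false;
}
```

Matching only the first word of each segment keeps a command such as echo npm from counting as package-manager output, at the cost of missing wrappers like npx or env-prefixed invocations.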
| Event | Stage transition |
|---|---|
| /plan | mode=planning, stage=0 |
| Every LLM call (planning) | uses LADDER[0] regardless of stage |
| Plan accepted | mode=implementing, stage=1 |
| turn_end during implementing | stage = min(stage+1, LADDER.length-1) |
| tool_result trigger (implementing) | queue bump for next LLM call (resets cursor) |
| Aborted turn | stage NOT advanced (so /resume picks up here) |
| agent_end during implementing | stage=0 (reset for next user prompt) |
| /plan again, or /stepdown-off | reset |
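The transitions in the table can be modeled as a pure reducer. This is a sketch, not the extension's implementation: the type names, the llm_call event (used here to apply a queued bump just before a request), and clearing the bump on agent_end are assumptions.

```typescript
type Mode = "idle" | "planning" | "implementing";

interface StagingState {
  mode: Mode;
  stage: number;       // ladder cursor
  bumpQueued: boolean; // one-shot reasoning bump for the next LLM call
}

type StagingEvent =
  | { type: "plan" }
  | { type: "plan_accepted" }
  | { type: "tool_result"; triggersBump: boolean }
  | { type: "llm_call" }
  | { type: "turn_end"; aborted: boolean }
  | { type: "agent_end" }
  | { type: "stepdown_off" };

function reduce(state: StagingState, event: StagingEvent, ladderLength: number): StagingState {
  const last = Math.max(ladderLength - 1, 0);
  const bumpStage = Math.min(1, last); // LADDER[1], or LADDER[0] for a one-rung ladder
  switch (event.type) {
    case "plan":
      return { mode: "planning", stage: 0, bumpQueued: false };
    case "stepdown_off":
      return { mode: "idle", stage: 0, bumpQueued: false };
    case "plan_accepted":
      return { mode: "implementing", stage: bumpStage, bumpQueued: false };
    case "tool_result":
      return state.mode === "implementing" && event.triggersBump
        ? { ...state, bumpQueued: true }
        : state;
    case "llm_call":
      // A queued bump resets the cursor so this call uses the stronger rung.
      return state.mode === "implementing" && state.bumpQueued
        ? { ...state, stage: bumpStage, bumpQueued: false }
        : state;
    case "turn_end":
      // Aborted turns do not advance the cursor.
      if (state.mode !== "implementing" || event.aborted) return state;
      return { ...state, stage: Math.min(state.stage + 1, last) };
    case "agent_end":
      return state.mode === "implementing"
        ? { ...state, stage: 0, bumpQueued: false }
        : state;
  }
}
```

Resolving the bump at llm_call rather than at tool_result keeps the result independent of whether turn_end fires before or after the triggering tool result is observed.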
Native web search (OpenAI Responses)
When using an OpenAI Responses-compatible provider, this extension enables the
hosted web_search tool by default.
- It injects { "type": "web_search" } into the wire payload's tools.
- search_context_size follows the current ladder rung via rung.webSearchContextSize ("high" → "medium" → "low" as the extension steps down).
- Search is optional: if tool_choice is missing, it's set to "auto" so the model decides when to search.
- Only OpenAI Responses payloads are modified; Chat Completions payloads are left unchanged.
- The legacy web_search_preview tool is not used.
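The injection rules above can be sketched as a single pure function. The injectWebSearch name, the heuristic that treats a payload with input and no messages as a Responses payload, and omitting search_context_size when a rung does not set one are all assumptions for illustration.

```typescript
type ContextSize = "low" | "medium" | "high" | "off";

interface ResponsesPayload {
  model: string;
  input: unknown; // presence of `input` is the assumed Responses marker
  tools?: Array<Record<string, unknown>>;
  tool_choice?: unknown;
  [key: string]: unknown;
}

function isResponsesPayload(payload: Record<string, unknown>): payload is ResponsesPayload {
  return typeof payload.model === "string" && "input" in payload && !("messages" in payload);
}

function injectWebSearch(
  payload: Record<string, unknown>,
  contextSize: ContextSize | undefined,
  enabled: boolean,
): Record<string, unknown> {
  // Disabled globally, disabled for this rung, or not a Responses payload: pass through.
  if (!enabled || contextSize === "off" || !isResponsesPayload(payload)) return payload;

  const tools = payload.tools ?? [];
  // Avoid adding a second web_search tool if one is already present.
  if (tools.some((tool) => tool.type === "web_search")) return payload;

  const webSearch: Record<string, unknown> = { type: "web_search" };
  if (contextSize) webSearch.search_context_size = contextSize;

  return {
    ...payload,
    tools: [...tools, webSearch],
    tool_choice: payload.tool_choice ?? "auto", // only filled in when missing
  };
}
```

Returning the original object on every pass-through path makes it cheap for a caller to tell whether anything was changed.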
Location bias is enabled by default and sends approximate country and timezone
(no city/region). Timezone comes from Node's local Intl settings unless
overridden; country is inferred from common timezones such as
Europe/Stockholm → SE, or omitted when unknown.
Defaults come from openaiWebSearch.enabled and openaiWebSearch.locationEnabled
in plan-stepdown.json.
- Disable location metadata in config with "openaiWebSearch": { "locationEnabled": false }.
- Disable hosted search in config with "openaiWebSearch": { "enabled": false }.
- Disable location metadata via env with PI_OPENAI_WEB_SEARCH_LOCATION=0.
- Override country with PI_OPENAI_WEB_SEARCH_COUNTRY=SE.
- Override timezone with PI_OPENAI_WEB_SEARCH_TIMEZONE=Europe/Stockholm.
Env vars win for one-off runs. Disable web search globally via env with
PI_OPENAI_WEB_SEARCH=0, or disable it per rung with webSearchContextSize: "off".
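A sketch of how the location metadata might be resolved from config and environment. The env var names are the ones listed above; the resolveLocation name, the one-entry lookup table, and treating the env flag as disable-only are assumptions. The env is passed in as a plain object so the function stays pure.

```typescript
type Env = Record<string, string | undefined>;

// Tiny illustrative lookup; the extension's real timezone-to-country table is not shown here.
const COUNTRY_BY_TIMEZONE: Record<string, string> = {
  "Europe/Stockholm": "SE",
};

interface ApproxLocation {
  type: "approximate";
  country?: string;
  timezone?: string;
}

function resolveLocation(env: Env, configEnabled: boolean): ApproxLocation | undefined {
  // Assumption: the env var can only switch location off, matching the documented "=0" usage.
  const enabled = configEnabled && env.PI_OPENAI_WEB_SEARCH_LOCATION !== "0";
  if (!enabled) return undefined;

  // Env override first, otherwise the runtime's local timezone.
  const timezone =
    env.PI_OPENAI_WEB_SEARCH_TIMEZONE ?? Intl.DateTimeFormat().resolvedOptions().timeZone;
  const country = env.PI_OPENAI_WEB_SEARCH_COUNTRY ?? COUNTRY_BY_TIMEZONE[timezone];

  const result: ApproxLocation = { type: "approximate", timezone };
  if (country) result.country = country; // omitted when unknown
  return result;
}
```

In real use the first argument would be process.env; only country and timezone are ever set, in line with the no city/region statement above.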
Caveat: Pi's visible cost/footer and citation rendering may not expose every hosted web-search detail. The prompt asks the model to cite important web sources explicitly in normal text.
Prompt caching (OpenAI)
OpenAI automatically caches long prompt prefixes, which can reduce latency and input token costs. Cache hits require exact prefix matches and typically only apply once prompts exceed ~1024 tokens.
This extension tries to improve cache affinity for OpenAI-compatible backends in a conservative way:
- It keeps pi/provider-provided cache fields if they already exist.
- If missing, it injects prompt_cache_key based on a stable hash of the local username + current working directory (cwd).
- It optionally requests extended retention via prompt_cache_retention: "24h".
- Respects user opt-out: if the wire payload arrives with both prompt_cache_key AND prompt_cache_retention undefined, that's pi signalling caching is disabled (e.g. you set cacheRetention: "none" in pi settings). The extension passes the payload through untouched rather than re-enabling what you turned off.
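The cache-field rules can be sketched like this. Several things here are assumptions: the augmentPromptCache name, the reading that a field is only filled in when the other one is already present, and the FNV-1a hash, which is a dependency-free stand-in because the README does not say which hash the extension uses.

```typescript
interface CacheConfig {
  keyPrefix: string;
  retention: "24h" | "in_memory" | null;
}

// FNV-1a (32-bit) over UTF-16 code units; a stand-in for whatever stable hash is really used.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function augmentPromptCache(
  payload: Record<string, unknown>,
  config: CacheConfig,
  username: string,
  cwd: string,
): Record<string, unknown> {
  const hasKey = payload.prompt_cache_key !== undefined;
  const hasRetention = payload.prompt_cache_retention !== undefined;

  // Both fields absent is treated as the user having opted out: leave the payload alone.
  if (!hasKey && !hasRetention) return payload;

  const next: Record<string, unknown> = { ...payload };
  // Existing values are always kept; only missing fields are filled in.
  if (!hasKey) next.prompt_cache_key = config.keyPrefix + fnv1a(`${username}\n${cwd}`);
  if (!hasRetention && config.retention !== null) next.prompt_cache_retention = config.retention;
  return next;
}
```

Deriving the key from username and cwd means repeated sessions in the same checkout share a key, while different projects on the same machine do not.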
Configuration
In plan-stepdown.json:
- openaiPromptCache.keyPrefix: defaults to "pi-model-staging:".
- openaiPromptCache.retention: defaults to "24h".
  - Set it to null if your proxy rejects the field or you want to omit it.
  - We intentionally do not force an explicit in-memory value because different OpenAI SDK versions historically used different spellings (in_memory vs in-memory).
Caveats
- Prompt caches are per-organization and per-model/backend. Stepping down across different model IDs (e.g. gpt-5.5 → gpt-5.4-mini) will not share KV cache.
- If you send >~15 req/min for the same prefix+key, OpenAI may overflow-route and reduce cache effectiveness.
Monitoring
Check the OpenAI usage fields (cached_tokens), or watch pi's session stats:
cacheRead tokens increase on cache hits (for providers that report it).
Tests
npm test
Runs unit tests via Node's built-in test runner with type stripping (no extra deps). Coverage:
- API detection: OpenAI Responses / OpenAI Completions / Anthropic adaptive / Anthropic budget / Google / unknown payloads
- Payload rewriting per API: model + reasoning swap, no input mutation, graceful degradation on unknown payloads
- chooseRung mode/stage dispatch including clamping
- nextStage advancement (called from both normal turn_end and post-bump paths)
- OpenAI native web-search tool injection, including per-rung search_context_size, opt-out, duplicate-tool avoidance, and Chat Completions pass-through
- OpenAI prompt-cache key/retention augmentation, including the user-opt-out path
- Reasoning bump trigger detection (failed bash, failed tool calls, package-manager output)
- End-to-end lifecycles (two scenarios):
  - Plain plan → accept → implement → reset → follow-up, asserting the exact sequence of model + effort values at every LLM call
  - Bumped path: an npm test-style trigger mid-run, asserting the bumped turn uses LADDER[1] and the next normal turn resumes at LADDER[2] (not the pre-bump cursor)
The pure logic lives in rewrite.ts (no pi imports), so tests run without pi or any LLM API keys.
Troubleshooting
plan-stepdown: model X/Y not found
The model ID in your ladder doesn't match what's in pi --list-models for
your configured provider. Fix the typo, or register a custom provider in
~/.pi/agent/models.json. The extension resets to idle on this error so it
won't keep firing.
plan-stepdown: no API key configured for X
Run pi auth login for that provider, or set the env var (see
pi-mono/packages/ai/src/env-api-keys.ts
for the var names per provider).
Pi shows the wrong model in its status display
We deliberately call setModel() only once at /plan. Pi's display reads
agent.state.model and so shows LADDER[0]'s model the whole time. The
actual model that hits the wire is whatever the per-call rewrite
substitutes — our own status widget shows the truth. We avoid calling
setModel() per turn because it persists the new model as the default in
settings.json, which would bounce around constantly.
Status widget doesn't appear
The widget needs an interactive TUI. In pi --print / --mode json /
--mode rpc, the widget is suppressed but the model swaps still happen.
Plan-ready dialog never appears
In non-interactive modes (pi -p, --mode json) there is no dialog UI. If you want the extension to automatically start implementation after planning, use --plan-auto-approve.
Limitations
- All rungs must be on the same provider (architectural — see "Same-provider constraint" above).
- Anthropic budget thinking and Google models swap model only; the thinking budget is left alone. Use adaptive Anthropic models or set explicit budgets in your proxy if you need per-turn budget control.
- The state we persist across /resume is mode + stage. One-shot bump state is intentionally in-memory only. The state machine resumes correctly but the auto-injected IMPL_FIRST_PROMPT fires only once per accept, not on resume.
- setModel() is called only at /plan (once per plan→implementation cycle), so pi's own model display lags behind the actual rung in flight. Our status widget shows the truth; see Troubleshooting.
How this extension is built
If you want to fork or learn from it:
- .pi/extensions/plan-stepdown/index.ts — event subscriptions, mode/state, command registration.
- .pi/extensions/plan-stepdown/rewrite.ts — pure functions for API detection and payload rewriting. Zero pi imports so it's testable in isolation.
- .pi/extensions/plan-stepdown/rewrite.test.ts — unit tests with realistic payload fixtures, prompt-cache coverage (including user opt-out), reasoning-bump coverage, and two end-to-end lifecycle simulations (with and without a bump).
The pi APIs used are documented at:
- pi extensions guide
- pi packages guide
- extension type definitions
- before_agent_start emit site
- before_provider_request plumbing
- createLoopConfig: explains why per-turn swaps need payload rewriting
The existing upstream plan-mode example is a good reference for the plan/implementation UX pattern this extension extends.
Contributing
Issues and PRs welcome. The change surface is small:
- Logic changes go in rewrite.ts with corresponding tests in rewrite.test.ts. Run npm test before sending a PR.
- Pi-API integration lives in index.ts. There are no integration tests for this layer; verify by running pi locally and watching the status widget plus your provider's request logs.
If you add support for a new API family, add a fixture and a detection
test to rewrite.test.ts. The fixture should mirror what the corresponding
provider in
pi-mono's packages/ai/src/providers/
actually sends on the wire.
License
MIT — see LICENSE.
This extension is independent of pi but builds on its public extension API. pi itself is also MIT-licensed (badlogic/pi-mono).