pi-context-pruning
OpenCode-style proactive tool output pruning for pi — reduce token usage by pruning stale tool outputs before each LLM call
Package details
Install pi-context-pruning from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-context-pruning

- Package: pi-context-pruning
- Version: 1.1.0
- Published: Apr 19, 2026
- Downloads: 458/mo · 18/wk
- Author: leftwinglautus
- License: MIT
- Types: extension
- Size: 23.9 KB
- Dependencies: 0 dependencies · 1 peer
Pi manifest JSON
{
"extensions": [
"./extensions"
]
}

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-context-pruning
A pi extension that proactively prunes old tool outputs from LLM context to reduce token usage.
Pruning algorithm ported from OpenCode.
The Problem
Pi sends all tool outputs (file reads, bash output, grep results, etc.) to the LLM until the context window fills up and compaction triggers. This means:
- Long sessions accumulate massive context from stale tool outputs
- Token usage grows linearly until forced compaction
- You pay for tokens the LLM doesn't need (old file contents, superseded grep results)
OpenCode solves this by proactively pruning old tool outputs after every turn, keeping context lean. This extension brings that same strategy to pi.
Install
# From local clone
pi install /path/to/pi-context-pruning
# Or from the repo directory
pi install .
After installing, run /reload or restart pi.
Enable / Disable
Enabled by default. Toggle via settings.json (global or project):
// ~/.pi/agent/settings.json (global) or .pi/settings.json (project)
{
"contextPruning": {
"enabled": false
}
}
Project settings override global. Changes take effect on /reload or next session.
How It Works
Before pruning (what pi normally sends):
┌────────┬──────┬───────┬──────┬───────┬──────┬───────┬──────┬───────┐
│ system │ user │ asst │ tool │ user │ asst │ tool │ asst │ tool │
│ prompt │ #1 │ #1 │ 50KB │ #2 │ #2 │ 30KB │ #3 │ 10KB │
└────────┴──────┴───────┴──────┴───────┴──────┴───────┴──────┴───────┘
↑ stale, expensive
After pruning (what the LLM actually sees):
┌────────┬──────┬───────┬──────────────────┬──────┬───────┬──────┬───────┐
│ system │ user │ asst │ [pruned ~12.5K │ user │ asst │ tool │ tool │
│ prompt │ #1 │ #1 │ tokens | read] │ #2 │ #2 │ 30KB │ 10KB │
└────────┴──────┴───────┴──────────────────┴──────┴───────┴──────┴───────┘
↑ tiny marker recent context preserved ↑
Algorithm (ported from OpenCode's compaction.ts)
Before each LLM call, via pi's context event:
- Walk messages backward from newest
- Skip recent turns — last 2 user turns are fully protected
- Stop at compaction boundary — already-summarized content is untouched
- Accumulate tool output tokens — first 40K tokens of older tool outputs are protected
- Beyond 40K → replace the tool output content with a short marker:
  [output pruned — ~12,500 tokens | read path="src/components/App.tsx"]
- Only prune if worthwhile — at least 20K tokens must be prunable (sketched below)
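For concreteness, here is a minimal TypeScript sketch of that pass. The message shape, type names, and the `pruneContext` signature are illustrative rather than pi's actual internals; the constants and the ordering of steps follow the list above, and `estimateTokens` stands in for the token-estimation helper the extension uses at runtime.

```typescript
// Sketch only: types and names are illustrative, not pi's real message schema.
export interface ToolMessage {
  role: "tool";
  toolName: string;            // e.g. "read", "bash", "grep"
  content: string;             // raw tool output
  isError?: boolean;           // error results are never pruned
}
export interface PlainMessage {
  role: "system" | "user" | "assistant" | "compaction";
  content: string;
}
export type Message = ToolMessage | PlainMessage;

const PRUNE_PROTECT = 40_000;   // budget of older tool-output tokens kept verbatim
const PRUNE_MINIMUM = 20_000;   // skip the pass entirely below this prunable volume
const PROTECTED_TURNS = 2;      // most recent user turns are left untouched

export function pruneContext(
  messages: Message[],
  estimateTokens: (text: string) => number,
): Message[] {
  let userTurns = 0;
  let protectedTokens = 0;
  const candidates: Array<{ index: number; tokens: number }> = [];

  // 1. Walk backward from the newest message.
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg.role === "user") userTurns++;
    if (msg.role === "compaction") break;              // 3. stop at the compaction boundary
    if (userTurns < PROTECTED_TURNS) continue;         // 2. last N user turns stay intact
    if (msg.role !== "tool" || msg.isError) continue;  // only non-error tool outputs qualify
    const tokens = estimateTokens(msg.content);
    if (protectedTokens + tokens <= PRUNE_PROTECT) {
      protectedTokens += tokens;                       // 4. still inside the protected budget
    } else {
      candidates.push({ index: i, tokens });           // 5. beyond the budget: prunable
    }
  }

  // 6. Only act when enough tokens can actually be reclaimed.
  const prunable = candidates.reduce((sum, c) => sum + c.tokens, 0);
  if (prunable < PRUNE_MINIMUM) return messages;

  const result = [...messages];
  for (const { index, tokens } of candidates) {
    const tool = result[index] as ToolMessage;
    result[index] = {
      ...tool,
      content: `[output pruned — ~${Math.round(tokens)} tokens | ${tool.toolName}]`,
    };
  }
  return result;                                       // the session file is never modified
}
```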
Key Properties
- Non-destructive: Session file keeps full history. Only the LLM sees pruned content.
- Preserves tool call metadata: The LLM still knows which tools were called and with what arguments.
- Complements compaction: Runs alongside pi's built-in compaction — pruning reduces token usage between compactions.
- Error outputs protected: Tool results with `isError: true` are never pruned (diagnostics matter).
- Re-readable: If the LLM needs old file contents, it can re-read the file. The marker tells it what was there (illustrated below).
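To make the non-destructive marker concrete, here is an illustrative shape (not pi's real message schema) of what the LLM receives for a pruned read: the tool call's name and arguments pass through unchanged, and only the result body is swapped out.

```typescript
// Illustrative only: what the LLM sees for a pruned "read" call.
// The session file on disk still holds the full original output.
const prunedForLLM = {
  toolCall: { name: "read", arguments: { path: "src/components/App.tsx" } }, // metadata kept
  toolResult: {
    isError: false, // results with isError: true are never replaced
    content: '[output pruned — ~12,500 tokens | read path="src/components/App.tsx"]',
  },
};
```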
Commands
| Command | Description |
|---|---|
| `/prune` | Force prune now — bypasses minimum threshold, runs on next LLM call |
| `/prune-toggle` | Toggle pruning on/off for the current session |
| `/prune-stats` | Show pruning statistics for the current session |
| `/prune-config` | Show current pruning configuration |
Status Bar
The footer shows live pruning status:
🔪 45.2K tool tokens scanned | pruned ~25.0K | 8 protected
Configuration
Edit extensions/context-pruning/config.ts in the installed package:
| Constant | Default | Description |
|---|---|---|
| `PRUNE_MINIMUM` | 20,000 | Minimum prunable tokens before acting |
| `PRUNE_PROTECT` | 40,000 | Token budget for protected older tool outputs |
| `PROTECTED_TURNS` | 2 | Recent user turns to never prune |
| `PROTECTED_TOOLS` | `[]` | Tool names that are never pruned |
| `PRUNABLE_TOOLS` | `["read", "bash", "grep", "find", "ls", "edit", "write"]` | Tools eligible for pruning |
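The constant names and defaults below come straight from the table; the surrounding file layout is a sketch of what `config.ts` plausibly exports, not a verbatim copy of the installed file.

```typescript
// extensions/context-pruning/config.ts (sketch; names and defaults from the table above)
export const PRUNE_MINIMUM = 20_000;          // minimum prunable tokens before acting
export const PRUNE_PROTECT = 40_000;          // token budget for protected older tool outputs
export const PROTECTED_TURNS = 2;             // recent user turns to never prune
export const PROTECTED_TOOLS: string[] = [];  // tool names that are never pruned
export const PRUNABLE_TOOLS: string[] = [     // tools eligible for pruning
  "read", "bash", "grep", "find", "ls", "edit", "write",
];
```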
Tuning Guide
- More aggressive pruning: Lower `PRUNE_PROTECT` (e.g., `20_000`) and/or `PRUNE_MINIMUM` (e.g., `10_000`)
- Less aggressive: Raise `PRUNE_PROTECT` (e.g., `80_000`) or increase `PROTECTED_TURNS`
- Protect extension tools: Add tool names to `PROTECTED_TOOLS`
- Prune everything: Set `PRUNABLE_TOOLS` to `[]` (empty = all non-protected tools are prunable)
How This Differs From Pi's Built-in Compaction
| Feature | Pi Compaction | Context Pruning |
|---|---|---|
| When | Context exceeds threshold | Every LLM call |
| What | Summarizes old messages via LLM | Replaces old tool outputs with markers |
| Cost | Requires LLM call for summary | Zero — no LLM calls |
| Persistence | Modifies session (adds CompactionEntry) | Non-destructive (session unchanged) |
| Granularity | Entire conversation turns | Individual tool outputs |
They work together: pruning keeps context lean between compactions, so compaction triggers less often (or not at all for shorter sessions).
Architecture
extensions/context-pruning/
├── index.ts # Extension entry — context hook, commands, status
├── pruner.ts # Pure pruning function (testable, no side effects)
└── config.ts # Configuration constants, types, and settings loader
No dependencies — only uses `estimateTokens` from `@mariozechner/pi-coding-agent` (available at runtime via pi).
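To show how the pieces fit together, here is a rough sketch of the entry point. The `ExtensionApi`, `onContext`, and `registerCommand` names are hypothetical stand-ins for pi's real extension interface, which may differ; only the overall flow (prune on the context event, leave the session untouched) and the use of `estimateTokens` reflect the description above.

```typescript
// extensions/context-pruning/index.ts (hypothetical API names; flow only)
import { estimateTokens } from "@mariozechner/pi-coding-agent";
import { pruneContext, type Message } from "./pruner";

// Stand-in for whatever interface pi actually hands to extensions.
interface ExtensionApi {
  onContext(hook: (messages: Message[]) => Message[]): void;
  registerCommand(name: string, run: () => void): void;
}

export default function activate(api: ExtensionApi) {
  let enabled = true; // in the real extension this comes from settings.json via config.ts

  // Rewrite the messages pi is about to send; the session file is never modified.
  api.onContext((messages) =>
    enabled ? pruneContext(messages, estimateTokens) : messages,
  );

  api.registerCommand("prune-toggle", () => {
    enabled = !enabled;
  });
}
```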
Credits
Pruning algorithm ported from OpenCode. Thanks to the OpenCode team.
See also: opencode-dynamic-context-pruning