pi-slipstream-compact
Slipstream-style validated compaction extension for Pi Coding Agent
Package details
Install pi-slipstream-compact from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-slipstream-compact
- Package: pi-slipstream-compact
- Version: 0.1.4
- Published: Jun 13, 2026
- Downloads: 369/mo · 106/wk
- Author: orestesk
- License: MIT
- Types: extension
- Size: 310.7 KB
- Dependencies: 0 dependencies · 2 peers
Pi manifest JSON
{
"extensions": [
"./src/index.ts"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-slipstream-compact
Safer compaction for long Pi Coding Agent sessions.
Long Pi sessions can lose important details when the context gets compacted: files you changed, commands that failed, decisions you made, blockers, and what should happen next. This package makes Pi check the summary before using it.
Inspired by the Slipstream research paper, adapted for real Pi coding sessions.
Status: experimental. Background preparation is enabled by default. Use the safe evaluation commands to check the package before relying on it for important sessions.
Why use this package
Install it, then keep using Pi normally.
- As the session grows, Slipstream can prepare a summary in the background.
- When you run /compact, Pi uses Slipstream's reviewed summary instead of native compaction.
- Automatic compaction uses the same review step.
- If the summary misses important state, Slipstream tries to repair it before compaction.
- Recovery artifacts are saved under .scratch/compactions in case you need to inspect what happened.
Install
pi install npm:pi-slipstream-compact
Or install from GitHub:
pi install git:github.com/OrestesK/pi-slipstream-compact@v0.1.0
Pi packages run with full local permissions. Review source before installing packages from npm, git, or another machine.
This package can start background model calls near context limits and can send session evidence, tool output, file paths, and git excerpts to your configured model provider. If you want to inspect first, use Evaluate safely or set autoTrigger: false before relying on it.
How to use it
After installing, keep using Pi normally.
| You want | Do this |
|---|---|
| Let compaction happen automatically | Nothing. Background preparation is enabled by default and starts when the session gets large. |
| Compact manually | Run /compact. |
| See Slipstream state | Watch the compact widget above the prompt while Slipstream is active, or run /slipstream status for details. |
| Find recovery artifacts | Run /slipstream artifacts. |
| Turn off background preparation while testing | Set autoTrigger: false; /compact still uses Slipstream while the package is enabled. |
| Keep another extension or native Pi owning /compact | Set replaceDefaultCompact: false; use /slipstream compact only when you explicitly want Slipstream. |
Before using it on a real repository, make sure .scratch/ is gitignored. Slipstream writes local recovery artifacts under .scratch/compactions.
Default behavior and settings
Default config is intentionally small:
{
"pi-slipstream-compact": {
"enabled": true,
"autoTrigger": true,
"artifactRoot": ".scratch/compactions"
}
}
Important defaults:
| Setting | Default | Meaning |
|---|---|---|
| enabled | true | Enables background preparation and /compact replacement. Support commands remain available. |
| autoTrigger | true | Starts preparing a checked summary in the background when the session gets large. |
| replaceDefaultCompact | true | Makes plain /compact use Slipstream by default; set false for side-by-side mode. |
| triggerContextPercent | 0.6 | Starts/latches auto compaction around 60% context usage. |
| judgeThreshold | 7 | Minimum continuation-quality score before normal acceptance. |
| repairAttempts | 3 | Tries full-summary repair after judge rejection. |
| rejectedSummaryMode | "ask" | Shows an interactive decision when possible; accepts on timeout/no UI unless explicitly rejected. |
| artifactRoot | .scratch/compactions | Local recovery artifact directory inside the current project. |
| statsFullPaths | false | Central stats redact paths by default; set true only for explicit local debugging. |
| summaryModel | active model | Uses your active Pi model unless overridden. |
| judgeModel | active model | Uses your active Pi model unless overridden. |
See Full configuration for all settings and model overrides.
Evaluate safely
You do not need these commands for normal use. They are for checking a new install before relying on it.
A safe evaluation ladder:
1. Inspect the prompt and local evidence without judging or compacting:
/slipstream compact --dry-run
Inspect candidate-prompt.md, state-evidence.json, and git artifacts for stale state, missing blockers, or sensitive data.
2. Inspect a judged summary before applying it:
/slipstream compact --prepare
This writes candidate-summary.md and judge.json.
3. Apply the prepared summary if it is still fresh:
/slipstream compact --adopt
Prepared summaries expire after pendingTtlMs (default: 5 minutes) and are rejected if the session branch advances too far. Old candidate-summary.md and judge.json files are still useful for inspection, but they are not enough for /slipstream compact --adopt; rerun --prepare if the pending summary expired.
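The expiry and freshness rules above can be sketched as a small decision function. This is an illustrative sketch only, not the package's code: the `PendingSummary` shape, the `preparedAtMs` field, and the `decideAdoption` name are assumptions, and the sketch treats any branch advance as a reason to revalidate because the exact "too far" rule is not documented here.

```typescript
// Hypothetical shape of a prepared pending summary. Only
// validatedThroughEntryId is named in this README; the rest is assumed.
interface PendingSummary {
  preparedAtMs: number;
  validatedThroughEntryId: string;
}

type AdoptDecision = "adopt" | "revalidate" | "expired";

// 5 minutes, the documented default for pendingTtlMs.
const DEFAULT_PENDING_TTL_MS = 300000;

function decideAdoption(
  pending: PendingSummary,
  nowMs: number,
  currentHeadEntryId: string,
  pendingTtlMs: number = DEFAULT_PENDING_TTL_MS,
): AdoptDecision {
  // An expired pending summary is never adopted; rerun --prepare instead.
  if (nowMs - pending.preparedAtMs > pendingTtlMs) return "expired";
  // Newer entries appeared after validation: check again before adopting.
  if (pending.validatedThroughEntryId !== currentHeadEntryId) return "revalidate";
  return "adopt";
}
```

Checking expiry first means an old candidate-summary.md on disk can never be adopted just because the branch head happens to match.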
Native compact vs Slipstream
| Area | Native /compact | pi-slipstream-compact |
|---|---|---|
| Main path | One summarization pass. | Generate, validate, repair if needed, then adopt. |
| Current-state fidelity | Can lose exact latest files, errors, or decisions. | Prepends deterministic current-state facts and validates them before adoption. |
| Stale-state protection | Can preserve obsolete next actions or miss the latest turn. | Checks latest exchange, continuation evidence, and branch freshness. |
| Recovery | Summary text is usually the main surviving artifact. | Writes local snapshots, state evidence, git evidence, prompts, judge results, and adoption metadata. |
| Best use case | Shorter or low-stakes sessions where speed/cost matter most. | Long coding sessions where losing exact state is expensive. |
Local validation so far shows the difference this package is trying to optimize for: the latest fresh-agent continuation validation scored the Slipstream path at 9.36/10 versus native /compact at 5.36/10 on 11 clean overlapping cases. See Benchmark results for data and caveats.
Tradeoffs
| Tradeoff | What it means |
|---|---|
| More model calls | A validated compaction normally uses at least a summary call and a judge call; rejected candidates can add repair calls. Background preparation can spend these calls before you manually ask to compact. |
| More latency | Manual compaction is slower than native summarization because it gathers evidence, writes artifacts, judges, and may repair. Background preparation can hide some latency by starting earlier. |
| More provider-bound context | Summary, judge, and repair prompts can include conversation text, tool output, paths, git excerpts, and artifact references. Do not use it on repositories where that provider exposure is unacceptable. |
| Local artifact footprint | Recovery artifacts are written under .scratch/compactions; they may contain sensitive paths, diffs, commands, or outputs and must stay gitignored. |
| Experimental behavior | The judge improves continuation readiness but does not prove code correctness or external task success. Start with Evaluate safely on new repositories. |
If you mostly run short sessions, care more about minimizing cost than preserving exact state, or cannot send session evidence to the configured model provider, native compaction is probably the better default; disable the package to use native compaction.
What this is based on
The core idea comes from Slipstream-style validation: write a shorter summary while the session continues, then check whether that summary still supports the next continuation. This package adapts that idea for Pi coding sessions by adding file/error/decision tracking, git/session artifacts, stale-state checks, and explicit rejected-summary policy.
It also borrows narrower ideas from active context compression, subgoal-style task state, and external evidence stores: compact before the window is exhausted, preserve completed-work/current-state structure, and keep raw recovery evidence outside the prose summary. The detailed source mapping is in Research and related ideas.
Failure modes this targets
Long coding-agent sessions fail in boring, expensive ways after compaction:
- exact file paths collapse into ambiguous basenames,
- recent test failures or tool errors disappear,
- user decisions and constraints get paraphrased into something weaker,
- the agent forgets what was modified, verified, or still blocked,
- a summary sounds plausible but cannot support the next few turns of work.
Research and related ideas
The direct inspiration is Slipstream: generate a compacted handoff, then validate it against continuation evidence before adopting it.
The other references are related ideas, not full implementations in this package.
| Source | Relationship to this package |
|---|---|
| Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents and repo | Core basis: checked compaction before adoption. This package adapts it for Pi's files, tool output, git state, errors, and user decisions. |
| Active Context Compression | Related idea: manage context before the window is exhausted instead of waiting until compaction is urgent. |
| HiAgent | Related idea: long tasks need durable task/subgoal state, not just a flat transcript summary. |
| Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory | Related idea: keep recoverable evidence outside the prose summary. This package does not implement indexed memory or retrieval learning. |
| DeepAgents | Related implementation pattern: agents can keep external files/context outside the main prompt. This package uses local artifacts instead. |
| ACON | Future direction: learn better compression policies from failed compacted-vs-full continuations. |
How it works
Pi session grows
│
▼
/compact, automatic threshold compaction, or /slipstream compact
│
├─ freeze the old state
│ files, errors, decisions, constraints, latest exchange,
│ changed paths, verification evidence, and critical literals
│
├─ collect recovery evidence
│ local artifacts, git status, and git diff evidence
│
├─ generate a compacted summary
│ focused on current state and next steps
│
├─ validate the summary
│ compare it with the task state, recent continuation, and artifacts
│
├─ repair if needed
│ ask the model to rewrite missing or weak parts
│
├─ check freshness
│ reject or revalidate if newer messages appeared after validation
│
└─ compact with the accepted summary
accepted summaries are scored; rejected summaries are explicit
The important difference from a normal summarizer: adoption is validated and scored. Accepted summaries are marked by the judge; rejected summaries are handled by an explicit rejected-summary policy instead of silently falling back to native compaction.
- In reject mode, rejected summaries cancel compaction.
- In ask mode, interactive direct/default compaction shows a dialog with score, diagnosis, missing facts, contradictions, artifact directory, and summary preview; any no-UI/false/timeout result falls back to policy acceptance instead of cancellation.
- In accept mode, rejected summaries are accepted directly with rejectedSummaryAccepted: true, score, judge diagnostics, and artifact links in compaction details.
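The three rejected-summary modes can be read as one small policy function. The sketch below is an assumption-laden illustration: the `DialogOutcome` values and the `resolveRejectedSummary` name are hypothetical, and only the mode names and the `rejectedSummaryAccepted` flag come from this README.

```typescript
type RejectedSummaryMode = "ask" | "reject" | "accept";

// Hypothetical dialog outcomes; "unavailable" stands for no-UI contexts.
type DialogOutcome = "accepted" | "rejected" | "timeout" | "dismissed" | "unavailable";

interface PolicyDecision {
  compact: boolean;                 // whether compaction proceeds
  rejectedSummaryAccepted: boolean; // recorded in compaction details
}

function resolveRejectedSummary(
  mode: RejectedSummaryMode,
  dialog: DialogOutcome = "unavailable",
): PolicyDecision {
  // reject mode: a judge-rejected summary cancels compaction.
  if (mode === "reject") return { compact: false, rejectedSummaryAccepted: false };
  // ask mode: only an explicit Reject selection cancels.
  if (mode === "ask" && dialog === "rejected") {
    return { compact: false, rejectedSummaryAccepted: false };
  }
  // accept mode, or ask mode with timeout, dismissal, or no UI:
  // the summary is accepted by policy and marked as such.
  return { compact: true, rejectedSummaryAccepted: true };
}
```

Note the asymmetry in ask mode: silence is treated as acceptance, so unattended sessions still compact instead of stalling near the context limit.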
Features
- Plain /compact and automatic threshold compaction use validated Slipstream compaction by default.
- /slipstream support commands for status, artifact inspection, dry-run, and prepare/adopt evaluation.
- Automatic trigger, enabled by default and configurable with autoTrigger.
- Slipstream-style continuation validation before adoption.
- Deterministic manifest extraction for files, errors, decisions, constraints, open loops, verification evidence, latest compacted updates, retained-tail current-state anchors, latest user/assistant exchange state, conservative stale/superseded signals, and critical literals.
- Local artifact store under .scratch/compactions, with cooperative chunked trigger snapshot writes so large raw recovery artifacts do not require one giant foreground JSON serialization.
- Central per-session performance stats under ~/.config/pi/.scratch/slipstream-stats/sessions/<session-id>.jsonl: mode, outcome, timing buckets, judge score, tokens before compaction, and redacted/relative artifact path by default.
- Full compaction-time git diff preservation as chunked artifacts when below artifact byte caps, while keeping model-visible diff text bounded.
- Explicit rejection path: rejected summaries are accepted by policy with score, judge diagnostics, artifacts, and rejectedSummaryAccepted: true; ask mode shows a scored confirmation dialog first when UI is available, and expert --prepare summaries are recoverable from pending.json if runtime state resets before --adopt.
- Adoption-time freshness guard: pending summaries store validatedThroughEntryId; expert --adopt and auto activation revalidate against the current branch if newer messages appeared after preparation, while default /compact ignores stale pending state and generates a fresh summary instead of adopting it. Consumed pending artifacts are cleared so an old pending.json cannot be replayed after compaction.
- Bounded Session Findings summary section for durable source-grounded facts that are useful later but are not the immediate next action.
- TypeScript package manifest for Pi extension loading.
Local development install
From this repository:
cd packages/pi-slipstream-compact
pi -e .
Or add the local package path to Pi settings:
{
"packages": ["/absolute/path/to/pi-slipstream-compact"]
}
You can also persist the local package with Pi's installer:
pi install /absolute/path/to/pi-slipstream-compact
Support commands
/slipstream status
/slipstream artifacts
/slipstream compact
/slipstream compact --dry-run
/slipstream compact --prepare
/slipstream compact --adopt
| Command | Effect |
|---|---|
| /slipstream status | Shows idle/running/pending/failed state and pending judge details. |
| /slipstream artifacts | Shows the latest artifact directory remembered by the current session, if any. If it shows nothing after a restart, browse .scratch/compactions/ directly. |
| /slipstream compact | Generates, reviews, and immediately queues Slipstream compaction. |
| /slipstream compact --dry-run | Writes artifacts and a candidate prompt without changing compaction state. |
| /slipstream compact --prepare | Expert mode: generates, judges, possibly repairs, and stores a validated pending summary without applying it. |
| /slipstream compact --adopt | Expert mode: calls Pi compaction only if a validated pending summary exists; revalidates first if the branch advanced since preparation. |
Unknown flags or positional arguments on compact are rejected instead of being ignored.
Local rollout:
- Plain Pi /compact and threshold compaction now use Slipstream by default through session_before_compact.
- /slipstream compact remains the explicit one-command support path.
- --dry-run writes prompts and artifacts without changing session state.
- --prepare and --adopt remain expert inspect-before-apply workflows; rerun --prepare if the pending summary expires or becomes stale.
- Rejected Slipstream summaries are policy-accepted with score and artifacts instead of falling back to native compaction; ask mode shows a scored confirmation dialog when UI is available, accepts on timeout/no response, and rejects only when the user explicitly selects Reject.
Full configuration
Configure in ~/.pi/agent/settings.json or project .pi/settings.json. The canonical settings key is "pi-slipstream-compact"; the older "slipstreamCompact" key is also accepted for compatibility.
Default-style configuration:
{
"pi-slipstream-compact": {
"enabled": true,
"autoTrigger": true,
"artifactRoot": ".scratch/compactions"
}
}
Disable background preparation:
{
"pi-slipstream-compact": {
"autoTrigger": false
}
}
This disables background preparation only. Plain /compact still uses Slipstream while the package is enabled.
Run side-by-side with native Pi or another extension owning plain /compact:
{
"pi-slipstream-compact": {
"replaceDefaultCompact": false
}
}
This also disables Slipstream auto-triggering. Explicit /slipstream compact, /slipstream compact --prepare, and /slipstream compact --adopt remain available. If you set replaceDefaultCompact: false, autoTrigger is normalized to false even when configured as true.
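The normalization rule above can be sketched as a settings merge. This is a minimal sketch under stated assumptions, not the package's implementation: the `normalizeConfig` name is hypothetical, preferring the canonical key over the legacy key is an assumed precedence, and no value validation is attempted. The defaults are the ones documented in this README.

```typescript
interface SlipstreamConfig {
  enabled: boolean;
  autoTrigger: boolean;
  replaceDefaultCompact: boolean;
  triggerContextPercent: number;
  judgeThreshold: number;
  repairAttempts: number;
  rejectedSummaryMode: "ask" | "reject" | "accept";
  pendingTtlMs: number;
  artifactRoot: string;
  statsFullPaths: boolean;
}

// Documented defaults.
const DEFAULTS: SlipstreamConfig = {
  enabled: true,
  autoTrigger: true,
  replaceDefaultCompact: true,
  triggerContextPercent: 0.6,
  judgeThreshold: 7,
  repairAttempts: 3,
  rejectedSummaryMode: "ask",
  pendingTtlMs: 300000,
  artifactRoot: ".scratch/compactions",
  statsFullPaths: false,
};

function normalizeConfig(settings: Record<string, unknown>): SlipstreamConfig {
  // Assumption: the canonical key wins when both keys are present.
  const raw = (settings["pi-slipstream-compact"] ??
    settings["slipstreamCompact"] ??
    {}) as Partial<SlipstreamConfig>;
  const merged: SlipstreamConfig = { ...DEFAULTS, ...raw };
  // Side-by-side mode forces background preparation off,
  // even when autoTrigger was explicitly configured as true.
  if (!merged.replaceDefaultCompact) merged.autoTrigger = false;
  return merged;
}
```

Normalizing once at load time keeps the rest of the lifecycle free of "is side-by-side mode on?" checks: downstream code only reads autoTrigger.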
Disable Slipstream compaction replacement entirely:
{
"pi-slipstream-compact": {
"enabled": false
}
}
Support commands remain registered, but lifecycle hooks and /compact replacement are disabled.
Tuned local configuration example:
{
"pi-slipstream-compact": {
"enabled": true,
"autoTrigger": true,
"triggerContextPercent": 0.6,
"minContinuationTurns": 1,
"maxContinuationTurns": 4,
"judgeThreshold": 7,
"repairAttempts": 3,
"rejectedSummaryMode": "ask",
"pendingTtlMs": 300000,
"artifactRoot": ".scratch/compactions",
"statsFullPaths": false
}
}
| Setting | Default | Meaning |
|---|---|---|
| enabled | true | Enables lifecycle hooks and /compact replacement. /slipstream support commands remain registered even when disabled. |
| autoTrigger | true | Starts background summary preparation near context pressure. Set false to disable background preparation. Forced off when replaceDefaultCompact is false, including when explicitly configured as true. |
| replaceDefaultCompact | true | When true, plain Pi /compact and threshold compaction use Slipstream. When false, plain/default compaction is left to Pi or another extension, while explicit /slipstream compact and --adopt still use Slipstream. |
| triggerContextPercent | 0.6 | Single context-pressure threshold for starting background preparation and latching compaction urgency. Fresh validated summaries compact when Pi reports the session is idle; stale summaries are revalidated before adoption. Legacy softContextPercent/hardContextPercent are accepted as aliases but should not be used for new config. |
| minContinuationTurns | 1 | Preferred continuation turns before turn-boundary auto validation. If Pi is already idle after the background summary resolves, auto finalization may proceed with fewer turns instead of waiting forever. |
| maxContinuationTurns | 4 | Maximum continuation turns collected for auto validation when later turns arrive. |
| judgeThreshold | 7 | Minimum accepted strict continuation-quality judge score. The judge prompt itself rejects safe-but-weak summaries for repair unless they are production-ready durable handoffs. |
| repairAttempts | 3 | Summary-model full-rewrite repair attempts after strict judge rejection. Empty/heading-only repair outputs are skipped without replacing the prior substantive candidate, and remaining attempts continue. |
| rejectedSummaryMode | "ask" | Rejected-summary handling after repairs fail: "ask" shows score/diagnostics/summary preview when UI selection is available and accepts on timeout/no response unless the user explicitly rejects, "reject" cancels compaction, and "accept" accepts immediately. |
| pendingTtlMs | 300000 | Expiry for a prepared pending summary. |
| artifactRoot | .scratch/compactions | Local artifact directory, resolved against Pi's current project cwd; paths outside the project are rejected, including existing symlinks that resolve outside the project. |
| statsFullPaths | false | Central performance stats store cwd: "." and relative/redacted artifact paths by default. Set true only when you explicitly want full local paths in ~/.config/pi/.scratch/slipstream-stats. |
| summaryModel | active model | Optional provider/model-id override for summary generation. |
| judgeModel | active model | Optional provider/model-id override for judging. |
The judge uses a strict continuation-probe rubric for current state, next actions, constraints, risk and verification awareness, artifact grounding, retrievability, knowledge continuity, stale-state suppression, and low-noise/non-contradiction. Safe-but-weak summaries are rejected for repair instead of passing through a second critic.
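How judgeThreshold and repairAttempts interact can be sketched as a judge-then-repair loop. This is a simplified, synchronous illustration with injected `judge` and `repair` callbacks; real model calls would be asynchronous. The `validateAndRepair` and `isEmptyOrHeadingOnly` names are hypothetical, and counting only substantive rewrites in `repairCount` is an assumption.

```typescript
interface JudgeVerdict {
  score: number;     // 0..10 continuation-quality score
  diagnosis: string;
}

interface LoopResult {
  summary: string;
  accepted: boolean;
  repaired: boolean;
  repairCount: number;
  score: number;
}

function validateAndRepair(
  candidate: string,
  judge: (summary: string) => JudgeVerdict,
  repair: (summary: string, verdict: JudgeVerdict) => string,
  judgeThreshold: number = 7,
  repairAttempts: number = 3,
): LoopResult {
  let summary = candidate;
  let verdict = judge(summary);
  let repairCount = 0;
  for (let attempt = 0; attempt < repairAttempts && verdict.score < judgeThreshold; attempt++) {
    const rewritten = repair(summary, verdict).trim();
    // Empty or heading-only output: keep the prior candidate and
    // spend the next attempt instead of replacing real content.
    if (isEmptyOrHeadingOnly(rewritten)) continue;
    summary = rewritten;
    repairCount++;
    verdict = judge(summary);
  }
  return {
    summary,
    accepted: verdict.score >= judgeThreshold,
    repaired: repairCount > 0,
    repairCount,
    score: verdict.score,
  };
}

function isEmptyOrHeadingOnly(text: string): boolean {
  return text.split("\n").every((line) => {
    const trimmed = line.trim();
    return trimmed === "" || trimmed.charAt(0) === "#";
  });
}
```

When the loop ends with accepted: false, the outcome is handed to the rejected-summary policy rather than to native compaction.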
Optional model override example:
{
"pi-slipstream-compact": {
"summaryModel": "openai/gpt-4.1",
"judgeModel": "openai/gpt-4.1"
}
}
Pi lifecycle integration
The extension registers one command namespace and a small set of lifecycle handlers:
| Pi surface | Package behavior |
|---|---|
| registerCommand("slipstream", ...) | Provides /slipstream status, artifacts, one-command compact, compact --dry-run, compact --prepare, and compact --adopt. |
| turn_end | At triggerContextPercent, starts candidate generation after final assistant responses, latches compaction urgency, collects continuation evidence when later turns arrive, revalidates stale pending summaries against the live branch head, and activates only fresh pending summaries with ctx.compact(). |
| session_before_compact | Uses a current Slipstream-requested pending summary if available; otherwise generates, judges, and returns a fresh Slipstream summary as the default compaction replacement. With replaceDefaultCompact: false, plain/default compaction returns to Pi or another extension unless /slipstream compact explicitly requested adoption. |
| session_start / session_shutdown | Keeps the compact widget hidden while idle, clears it on shutdown, updates lightweight status, and clears in-memory background state. |
The normal manual path is plain Pi /compact or /slipstream compact when replaceDefaultCompact is enabled, and explicit /slipstream compact only when side-by-side mode is enabled.
The automatic path uses one threshold: begin preparing at triggerContextPercent, keep collecting continuation evidence when it arrives, and compact only when the pending summary is validated through the current branch head and Pi reports an idle runtime with no queued messages. If no later turn arrives after the background summary resolves, Slipstream can still proceed through finalizing, judging, repair, pending-summary creation, and idle adoption instead of waiting forever. If the auto pending summary is stale, Slipstream revalidates it with a fresh retained-tail boundary instead of blocking on old work, adopting stale state, or keeping an old oversized tail. If timing still reaches Pi's own model-limit compaction, Slipstream generates/judges directly in session_before_compact unless replaceDefaultCompact is disabled.
The expert prepare/adopt split still exists when you want to inspect a validated pending summary before applying it.
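The automatic path described above can be summarized as a decision over a few signals. The sketch below is illustrative only: the `AutoState` fields and the `nextAutoAction` name are hypothetical, and urgency latching and continuation-turn collection are left out for brevity.

```typescript
// Hypothetical snapshot of the signals the automatic path looks at.
interface AutoState {
  contextUsedFraction: number;   // 0..1 share of the context window in use
  triggerContextPercent: number; // documented default: 0.6
  preparing: boolean;            // a background summary is in flight
  pending: { validatedThroughEntryId: string } | null;
  branchHeadEntryId: string;
  idle: boolean;
  queuedMessages: number;
}

type AutoAction = "none" | "prepare" | "wait" | "revalidate" | "compact";

function nextAutoAction(s: AutoState): AutoAction {
  if (s.pending === null) {
    if (s.preparing) return "wait";
    // Below the single threshold nothing happens; at or above it,
    // background preparation starts.
    return s.contextUsedFraction >= s.triggerContextPercent ? "prepare" : "none";
  }
  // A pending summary validated against an older head is stale.
  if (s.pending.validatedThroughEntryId !== s.branchHeadEntryId) return "revalidate";
  // Fresh summaries are adopted only when Pi is idle with nothing queued.
  return s.idle && s.queuedMessages === 0 ? "compact" : "wait";
}
```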
Integration API
Other extensions can reuse Slipstream-style validation without installing Slipstream as the default /compact owner:
import { slipstreamStyleValidateAndRepair } from "pi-slipstream-compact/integration-api";
const result = await slipstreamStyleValidateAndRepair({
candidate,
sourceEvidence: {
sourceMessageExcerpts,
filesModified,
unresolvedErrors,
userDecisions,
constraints,
},
continuation,
completeText,
config: { judgeThreshold: 7, repairAttempts: 1 },
});
The integration API is deliberately narrower than the full extension lifecycle. It judges and optionally repairs a caller-provided candidate summary using caller-provided evidence and a caller-provided completeText function. It does not call ctx.compact(), register commands, manage pending state, update widgets, write local artifacts, or write central stats. The result includes the final summary, accepted, repaired, repairCount, top-level score/diagnostics, and the underlying JudgeResult.
Use this when another extension already owns candidate generation and wants a Slipstream-style quality gate. Use the full package lifecycle when you want Slipstream to own candidate generation, evidence collection, repair, freshness checks, artifacts, and adoption.
Progress visibility
The package shows a compact Slipstream widget above the prompt only while Slipstream is actively preparing, compacting, or holding a prepared pending summary. It is hidden while idle. The widget stays short: exact current stage plus elapsed time for active work, and judge score when available. In the interactive TUI, it uses Pi theme colors; in plain/RPC contexts, it falls back to text.
Widget contents are intentionally limited to actionable stage labels:
| Internal state | Widget text example | Why this is enough |
|---|---|---|
| snapshot | Slipstream: snapshotting local state · 3s | Shows local synchronous work separately from model calls. |
| artifacts | Slipstream: writing artifacts · 1s | Shows local artifact writes. |
| state evidence | Slipstream: collecting evidence · 2s | Shows bounded read-only git/session evidence collection. |
| summary | Slipstream: summarizing · 38s | Shows the summary model call is running. |
| waiting for auto summary | Slipstream: waiting for auto summary · 12s | Auto finalization is waiting for the background summary before judging. |
| judging | Slipstream: checking summary · 21s | Shows the judge model call is running. |
| repairing | Slipstream: repairing summary · 74s · last score 4/10 | The score explains why repair is running without implying the repair output has been judged. |
| accepted/current pending | Slipstream: ready · score 9/10 | The score is current and actionable. |
| rejected | Slipstream: summary rejected · score 4/10 | The low score is the concise reason this is rejected; detailed diagnosis stays in warning/status text. |
| applying compaction | Slipstream: compacting · score 9/10 | Pi is applying an already judged prepared summary. |
| idle | hidden | No active action. |
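The widget strings in the table follow one pattern: a stage label, an optional elapsed time, and an optional score. A minimal formatter sketch, assuming a hypothetical `WidgetState` shape and `formatWidget` name (neither is part of the package's documented API):

```typescript
interface WidgetState {
  label: string | null;               // null means idle, so the widget is hidden
  elapsedSeconds?: number;            // shown for active work
  score?: number;                     // judge score out of 10, when available
  scoreIsFromPreviousJudge?: boolean; // true while repairing
}

function formatWidget(state: WidgetState): string | null {
  if (state.label === null) return null; // hidden while idle
  const parts: string[] = [`Slipstream: ${state.label}`];
  if (state.elapsedSeconds !== undefined) {
    parts.push(`${Math.floor(state.elapsedSeconds)}s`);
  }
  if (state.score !== undefined) {
    // During repair the score belongs to the rejected candidate,
    // so it is labeled "last score" instead of "score".
    const prefix = state.scoreIsFromPreviousJudge ? "last score" : "score";
    parts.push(`${prefix} ${state.score}/10`);
  }
  return parts.join(" · ");
}
```

For example, a label of "ready" with a score of 9 and no elapsed time yields the "Slipstream: ready · score 9/10" row from the table.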
If Pi is closed while a prepared pending summary exists, startup recovery checks the persisted pending.json. The widget is restored as Slipstream: ready · score N/10 only when the recovered summary still matches the same session, cwd, TTL, and current branch head; stale or expired pending summaries stay hidden.
Routine progress stays visible without chat-style progress spam: the widget shows active stage and elapsed time, while the footer/status line changes only when the progress phase or message changes so timer ticks do not repaint both UI regions every second. Active lifecycle progress owns a disposable widget controller, so shutdown and compaction teardown can cancel timers instead of leaving stale owners behind. Lifecycle progress can preempt older lifecycle progress; support-command progress does not preempt an active lifecycle owner. Interactive rejected-summary decisions use the confirmation/select UI when rejectedSummaryMode is "ask"; non-interactive accepted/rejected outcomes are recorded in compaction details or a concise warning.
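The repaint rule (widget ticks every second, footer/status changes only on a new phase or message) can be sketched with a small renderer factory. The `createProgressRenderer` name and the callback signatures are hypothetical; Pi's actual UI API is not shown in this README.

```typescript
interface Progress {
  phase: string;
  message: string;
  elapsedSeconds: number;
}

function createProgressRenderer(
  setWidget: (text: string) => void,
  setStatus: (text: string) => void,
): (progress: Progress) => void {
  let lastStatusKey: string | null = null;
  return (progress: Progress): void => {
    // The widget carries the timer, so it repaints on every tick.
    setWidget(`${progress.message} · ${progress.elapsedSeconds}s`);
    // The status line repaints only when the phase or message changes.
    const key = JSON.stringify([progress.phase, progress.message]);
    if (key !== lastStatusKey) {
      lastStatusKey = key;
      setStatus(progress.message);
    }
  };
}
```

Keeping the elapsed time out of the status key is what prevents timer ticks from repainting both UI regions.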
Current compaction mode
The package now supports one runtime strategy across manual, hook, and auto paths: continuation-first Slipstream replacement. The former fact-ledger route was removed because it did not earn its extra model calls, lifecycle state, prompts, tests, and user-facing complexity.
The former --high-accuracy mode was removed because it added broad chunk evidence without a global prompt budget and could overload the summary model.
A future slower path should be targeted refinement: start from the normal candidate, use the judge to identify missing/risky facts, then retrieve only the evidence needed for those gaps.
Artifact model
Artifacts are local recovery evidence, not decorative logs. A typical validated run directory contains:
.scratch/compactions/<session-id>-<run-id>/
run.json
index.json
trigger-snapshot.json
trigger-raw-001.json
state-evidence.json
git-status.txt
git-diff-stat.txt
git-diff-full-001.patch
git-snapshot.json
candidate-summary.md
judge.json
continuation.json
adoption.json
Dry-run directories write candidate-prompt.md instead of candidate-summary.md and do not write judge.json, continuation.json, or adoption.json.
git-snapshot.json records:
- status path,
- diff-stat path,
- full diff chunk paths,
- full diff SHA-256,
- full diff byte count,
- whether git diff collection completed,
- whether full preservation succeeded.
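A reader of git-snapshot.json might model the record as a typed object. The field names below are hypothetical (the README lists what is recorded, not the JSON keys), and `canTrustStoredDiff` is an illustrative helper, not a package function.

```typescript
// Hypothetical key names for the items listed above.
interface GitSnapshot {
  statusPath: string;
  diffStatPath: string;
  fullDiffChunkPaths: string[];
  fullDiffSha256: string;
  fullDiffBytes: number;
  diffCollectionCompleted: boolean;
  fullPreservationSucceeded: boolean;
}

// Treat the stored chunks as a complete diff only when collection finished,
// preservation succeeded, and a non-empty diff actually has chunks on disk.
function canTrustStoredDiff(snapshot: GitSnapshot): boolean {
  const hasChunksIfNeeded =
    snapshot.fullDiffBytes === 0 || snapshot.fullDiffChunkPaths.length > 0;
  return (
    snapshot.diffCollectionCompleted &&
    snapshot.fullPreservationSucceeded &&
    hasChunksIfNeeded
  );
}

// Example record; the SHA-256 value is a placeholder, not real data.
const exampleSnapshot: GitSnapshot = {
  statusPath: "git-status.txt",
  diffStatPath: "git-diff-stat.txt",
  fullDiffChunkPaths: ["git-diff-full-001.patch"],
  fullDiffSha256: "<sha256-hex>",
  fullDiffBytes: 2048,
  diffCollectionCompleted: true,
  fullPreservationSucceeded: true,
};
```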
By default, artifact chunks are 512 KiB and a single artifact payload is capped at 96 MiB. Trigger snapshot chunks are written cooperatively from JSON fragments instead of first building one full raw JSON string and buffer, so large local recovery snapshots should yield back to Pi's event loop during artifact preparation. They can still consume disk and CPU proportional to transcript size; the goal is responsiveness, not zero local cost.
If the full git diff is larger than the cap, or if git diff collection reports truncation or another error, the package writes an omission/incomplete note instead of pretending the stored bytes are a complete diff. Summary prompts are capped by reducing model-visible conversation text; if protected fixed sections alone exceed the prompt cap, Slipstream fails fast with artifacts instead of sending an oversized prompt.
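The chunk size and payload cap can be illustrated with a planning function that either lays out numbered chunks or returns an omission note. This is a sketch under assumptions: `planChunks` and the `ChunkPlan` type are hypothetical, the naming mirrors the git-diff-full-001.patch pattern from the artifact listing, and the cooperative yielding between writes is not modeled.

```typescript
const CHUNK_BYTES = 512 * 1024;              // 512 KiB default chunk size
const MAX_ARTIFACT_BYTES = 96 * 1024 * 1024; // 96 MiB default payload cap

interface ChunkRange {
  name: string;
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

type ChunkPlan =
  | { kind: "chunks"; chunks: ChunkRange[] }
  | { kind: "omitted"; reason: string };

// Zero-pad the chunk index to at least three digits (001, 002, ...).
function chunkSuffix(index: number): string {
  const digits = String(index);
  return digits.length >= 3 ? digits : ("000" + digits).slice(-3);
}

function planChunks(
  totalBytes: number,
  baseName: string,
  extension: string,
  chunkBytes: number = CHUNK_BYTES,
  maxBytes: number = MAX_ARTIFACT_BYTES,
): ChunkPlan {
  // Over the cap: record an omission note, never a silently truncated payload.
  if (totalBytes > maxBytes) {
    return {
      kind: "omitted",
      reason: `payload of ${totalBytes} bytes exceeds cap of ${maxBytes} bytes`,
    };
  }
  const chunks: ChunkRange[] = [];
  for (let start = 0, index = 1; start < totalBytes; start += chunkBytes, index++) {
    const end = Math.min(start + chunkBytes, totalBytes);
    chunks.push({ name: `${baseName}-${chunkSuffix(index)}.${extension}`, start, end });
  }
  return { kind: "chunks", chunks };
}
```

Planning byte ranges up front lets a writer emit one chunk at a time and yield between writes instead of serializing the whole payload at once.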
Evidence semantics
The package separates evidence into three levels, plus a central performance index:
| Level | Examples | Used for |
|---|---|---|
| Model-visible writer grounding | bounded git diff, manifest facts, artifact paths | helps the summary writer produce a better candidate |
| Judge-protected facts | files, errors, constraints, decisions, latest updates, critical literals, verification evidence, continuation facts | blocks adoption if missing or contradicted |
| Raw local artifacts | full trigger snapshot chunks, full git diff chunks, state evidence JSON | recovery and debugging outside the model prompt |
| Central performance index | ~/.config/pi/.scratch/slipstream-stats/sessions/<session-id>.jsonl rows with timings, judge score, and redacted/relative artifact path | cross-session local performance audits |
Raw git diff text alone is not acceptance-blocking. If a patch detail matters for safe continuation, it should appear as a distilled fact, latest update, critical literal, or continuation-used fact.
Implementation choices
| Choice | Reason |
|---|---|
| Greenfield package instead of patching pi-smart-compact | The target behavior is not just better scoring; it is a different lifecycle: prepare, validate, repair, then explicitly adopt. |
| Local default compaction replacement | Plain /compact and threshold compaction now use Slipstream directly, so there is one active compaction path. |
| Prepared-summary fast path | If /slipstream compact --prepare has already validated a pending summary, session_before_compact consumes it immediately. |
| Direct hook generation fallback | If no pending summary exists, session_before_compact generates, judges, repairs, and returns a Slipstream summary; final rejection is policy-accepted with score/artifacts. |
| Final-assistant-boundary auto trigger | Background preparation starts only after final assistant responses, avoiding orphaned tool-result boundaries. |
| Full raw artifacts + bounded prompt evidence | Large transcripts and diffs must remain recoverable without sending megabytes to the model. |
| Deterministic current-state capsule before model summary | Critical latest-state scaffolding is prepended by code instead of being optional model prose. |
| Deterministic manifest before model summary | Trust extracted file/error/decision/literal facts more than prose guesses. |
| Policy-accept instead of native fallback | A rejected Slipstream summary should not silently degrade to the compactor it is intended to replace; rejected acceptance is explicit, scored, and marked in compaction details. |
| Central session stats instead of artifact-local stats | Future weekly/all-session reviews can scan one central directory; paths are redacted/relative unless statsFullPaths is explicitly enabled. No per-turn writes or extra model calls are added. |
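The path handling for central stats in the last row can be sketched as follows. This is an illustration only: `statsArtifactPath` is a hypothetical helper, the sketch handles POSIX-style paths with a plain prefix check, and the "<redacted>" placeholder for paths outside the project is an assumption.

```typescript
function statsArtifactPath(
  absoluteArtifactPath: string,
  cwd: string,
  statsFullPaths: boolean,
): string {
  // Explicit opt-in keeps full local paths for debugging.
  if (statsFullPaths) return absoluteArtifactPath;
  const root = cwd.slice(-1) === "/" ? cwd : cwd + "/";
  // Inside the project: store the path relative to cwd.
  if (absoluteArtifactPath.indexOf(root) === 0) {
    return absoluteArtifactPath.slice(root.length);
  }
  // Anything else is hidden entirely (assumed placeholder).
  return "<redacted>";
}
```

With the default statsFullPaths: false, an artifact directory under the project is recorded relative to it, consistent with stats rows that store cwd as ".".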
Testing and evaluation
Current local verification:
(cd packages/pi-slipstream-compact && npm run check)
Latest result:
- 231 Node test-runner tests passed.
- tsc --noEmit passed.
- Post-fix 2026-05-29 fresh-agent continuation validation on 11 clean overlapping cases: Slipstream scored 9.36/10, success rate 1.0, stale-state score 10.0, and failure modes none:11; native scored 5.36/10, success 0.45; benchmark Codex prompt scored 3.00/10, success 0.18.
- Full 2026-05-29 blinded continuation-probe benchmark: Slipstream scored 9.00/10 and won 64/66 decisions against native Pi compaction and a benchmark-only Codex prompt baseline; the remaining decisions were 1 native win and 1 tie.
- Focused rerun of the only weak full-run case: Slipstream scored 9.00/10 and won 3/3 decisions.
Evaluation performed so far:
| Evaluation | Result |
|---|---|
| Unit tests | Config normalization, snapshot extraction, artifact writing, summary/judge/repair, pending state, commands, model setup, auto lifecycle, state evidence. |
| Manual live Pi flow | Prepared/adopted hook compaction; post-compaction no-tools answer preserved files, sentinels, passing command, and intentional failing command. |
| Auto live Pi flow | Auto prepared at final assistant boundary; second-turn continuation finalizes/adopts only after freshness and idle checks. |
| Idle auto lifecycle regression tests | Idle-without-next-turn finalization can proceed while idle; if the branch advanced beyond incomplete continuation evidence, idle revalidation runs before adoption. |
| Progress repaint/regression tests | Long-running progress phases keep elapsed timer updates in the widget without ticking the footer/status line every second; stale progress owners cannot overwrite newer widget phases, shutdown and compaction teardown cancel active progress timers, command progress cannot preempt lifecycle progress, and repair score labels use the current judge score. |
| Repeated compaction | Second hook compaction preserved prior and new sentinels. |
| Natural scratch repo | Preserved basename-colliding skill paths, exact test strings, retryable error behavior, and passing npm test. |
| Forced rejection | High threshold rejected candidate; /slipstream compact --adopt refused with no compaction entry saved. |
| Large output | Preserved first/last/error sentinels and Pi output-log pointers after output truncation. |
| Past-session replay | Real long sessions stayed within prompt bounds during replay checks. |
| 2026-05-29 full LLM benchmark | 22 clean overlapping cases, 3 blinded judge replicates each. Slipstream scored 9.00/10 and won 64/66 decisions. |
| 2026-05-29 focused rerun | Reran the only weak full-run Slipstream case. Slipstream scored 9.00/10 and won 3/3 decisions. |
| 2026-05-29 fresh-agent continuation validation | Ran no-tool fresh-agent responses from compacted handoffs and judged downstream continuation behavior on 11 clean overlapping cases. Slipstream scored 9.36/10, success 1.0, stale-state 10.0, failure modes none:11. |
Caveat: the benchmark corpus is a local mix of private Pi sessions and some public SWE-bench-derived cases. The benchmarks measure continuation readiness with blinded LLM judging; they are not end-to-end SWE-bench scores or external task-completion scores.
Benchmark results
Latest primary result: docs/latest-full-benchmark-2026-05-29.md.
The reported benchmark data comes from a local mix of private Pi sessions and some public SWE-bench-derived cases. Raw sessions, prompts, artifacts, and benchmark outputs are not included in this package because they can contain private repository state, paths, tool output, and provider-bound prompts. Benchmark code is also not bundled in the npm package; it can be shared for review on request.
Current post-fix fresh-agent continuation validation on 11 clean overlapping cases:
| Method | Overall avg | Success rate | Stale-state score | Failure modes |
|---|---|---|---|---|
| Slipstream | 9.36/10 | 1.0 | 10.0/10 | none:11 |
| native /compact | 5.36/10 | 0.45 | 5.18/10 | stale/latest-state issues |
| benchmark Codex prompt | 3.00/10 | 0.18 | 3.45/10 | latest-state/next-action failures |
Full 2026-05-29 blinded review benchmark:
| Method | Overall avg | Wins |
|---|---|---|
| Slipstream | 9.00/10 | 64/66 |
| native /compact | 5.36/10 | 1/66 |
| benchmark Codex prompt | 4.24/10 | 0/66 |
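As a quick consistency check on the figures reported in this README (22 cases, 3 blinded judge replicates each, and the win counts in the table), the decision totals can be recomputed. The variable names are illustrative only; the numbers are the ones stated above.

```typescript
// Figures copied from the reported results; names are illustrative.
const cases = 22;
const judgeReplicates = 3;
const decisions = cases * judgeReplicates; // 66 blinded decisions in total

const wins = { slipstream: 64, native: 1, codexPrompt: 0 };

// Decisions not won by any method are ties.
const ties = decisions - (wins.slipstream + wins.native + wins.codexPrompt);

const slipstreamWinRate = wins.slipstream / decisions;

console.log(
  `${decisions} decisions, ${ties} tie, Slipstream win rate ${(slipstreamWinRate * 100).toFixed(1)}%`,
);
// prints "66 decisions, 1 tie, Slipstream win rate 97.0%"
```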
Focused rerun of the only weak full-run Slipstream case:
| Method | Overall avg | Wins |
|---|---|---|
| Slipstream | 9.00/10 | 3/3 |
| native /compact | 3.67/10 | 0/3 |
| benchmark Codex prompt | 0.00/10 | 0/3 |
Caveat: these benchmarks measure continuation readiness with blinded LLM judging. They are not end-to-end SWE-bench scores or external task-completion scores.
Privacy and security
Artifacts may contain:
- raw conversation snippets,
- tool outputs,
- absolute paths,
- git diffs,
- error messages,
- secrets accidentally present in the session or diff.
The default artifact root is .scratch/compactions, which should be gitignored. The configured artifact root must resolve inside the current project directory; absolute paths and .. escapes outside the project are rejected. Hidden thinking content is not copied into compaction text. Do not publish artifact directories.
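One way to express the "must resolve inside the current project directory" rule is a lexical containment check. This is a sketch under assumptions: `resolveArtifactRoot` is a hypothetical helper, not the package's function, and it accepts any configured path that resolves inside the project, which may be looser or stricter than the actual validation.

```typescript
import * as path from "node:path";

// Hypothetical helper; the package's actual validation may differ.
function resolveArtifactRoot(projectDir: string, configuredRoot: string): string {
  const project = path.resolve(projectDir);
  // Relative roots resolve against the project; absolute roots resolve to themselves.
  const resolved = path.resolve(project, configuredRoot);
  const rel = path.relative(project, resolved);

  // The target is outside the project when the relative path climbs upward
  // or cannot be expressed relative to the project (for example, another drive).
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`artifact root escapes the project directory: ${configuredRoot}`);
  }

  // Note: this check is purely lexical. Symlinks are not followed here;
  // a stricter version would compare fs.realpathSync results instead.
  return resolved;
}
```

With a project at /work/repo, the default .scratch/compactions resolves inside it, while ../outside and /etc would both throw.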
Slipstream also sends compacted session evidence to the configured model provider for summary generation, judging, and repair. That evidence can include conversation text, tool output, commands, file paths, git status/diff excerpts, artifact references, and secrets that were already present in the session or diff. Do not enable this package on sensitive repositories unless you are comfortable with both local artifact storage and provider-bound model prompts containing that state.
Limitations
- Experimental package; use Evaluate safely before relying on it for important sessions.
- Background preparation is enabled by default; set autoTrigger: false to turn it off while keeping manual /compact on Slipstream.
- Automatic prepared-summary adoption uses Pi's idle signal; the package does not patch Pi private scheduler methods.
- Full diff preservation is capped by artifact byte limits.
- Quality depends on the configured summary and judge models.
- The judge validates continuation sufficiency; it does not prove the underlying code is correct.
- Historical replay checks prompt size and manifest extraction, not end-to-end human task success.
- Artifact paths are local and may become stale if scratch directories are cleaned.
- Judge-rejected summaries are policy-accepted with score and artifacts; model/tool failures or missing compaction boundaries can still fail, but the package never falls back to native compaction automatically.
Development
cd packages/pi-slipstream-compact
npm test
npm run typecheck
npm run check
CI runs npm ci, npm run check, and npm pack --dry-run --json on pushes to main, pull requests, and manual dispatches.
Release
Publishing is tag-driven. The Publish workflow runs only for tags matching v*.*.*, verifies that the tag exactly matches package.json (v${version}), runs the full check suite, verifies the npm tarball contents, then publishes with npm provenance:
npm version patch --no-git-tag-version
# review package.json/package-lock.json, commit, then tag the commit as vX.Y.Z
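The tag check described above amounts to two conditions: the tag fits the v*.*.* trigger pattern, and it equals v plus the package.json version. A minimal sketch of that logic follows; `tagMatchesPackage` and `TAG_PATTERN` are hypothetical names, the regular expression is an approximation of the workflow's glob filter, and the real workflow may implement the check in shell instead.

```typescript
// Approximation of the v*.*.* tag filter: each * matches any run of
// characters other than "/". This translation is an assumption.
const TAG_PATTERN = /^v[^/]*\.[^/]*\.[^/]*$/;

// Hypothetical helper mirroring the described check.
function tagMatchesPackage(tag: string, packageVersion: string): boolean {
  // The tag must both trigger the workflow and exactly equal v${version}.
  return TAG_PATTERN.test(tag) && tag === `v${packageVersion}`;
}

console.log(tagMatchesPackage("v0.1.4", "0.1.4")); // true
console.log(tagMatchesPackage("v0.1.5", "0.1.4")); // false: tag and package.json disagree
console.log(tagMatchesPackage("0.1.4", "0.1.4"));  // false: missing the v prefix
```

Checking for exact equality rather than just the pattern is what prevents a tag from publishing a version that package.json does not declare.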
One-time npm setup: configure npm trusted publishing for OrestesK/pi-slipstream-compact with provider GitHub Actions and workflow filename publish.yml. The workflow uses OIDC (id-token: write) and does not require an NPM_TOKEN secret.
Package layout:
src/
  index.ts           # Pi extension entrypoint
  commands.ts        # /slipstream command handling
  auto.ts            # automatic trigger/finalization lifecycle
  pipeline.ts        # dry-run and validated compaction pipeline
  snapshot.ts        # deterministic session manifest extraction
  state-evidence.ts  # read-only git/session evidence collection
  artifact-store.ts  # local artifact writing and indexing
  summary.ts         # summary prompt
  judge.ts           # judge prompt and acceptance rules
  repair.ts          # full-summary rewrite repair prompt
  session-state.ts   # pending adoption state
  model.ts           # Pi model completer integration
Roadmap
- More long natural coding-session evaluations.
- Publishable npm metadata, screenshots, and package-gallery image/video.
- Optional checkpoint_context command/tool for Focus-style semantic boundaries.
- ACON-style learning loop over failed compacted-vs-full continuations.
- Better ranking of critical literals when sessions contain more than the current cap.
- Optional retrieval command for opening exact artifact chunks from a compacted summary.
Bottom line
pi-slipstream-compact is built around one principle: do not trust a compaction summary until it proves it can support what the agent needs next.