pi-slipstream-compact

Slipstream-style validated compaction extension for Pi Coding Agent

Packages

Package details

extension

Install pi-slipstream-compact from npm and Pi will load the resources declared by the package manifest.

npm repo home report

$ pi install npm:pi-slipstream-compact

Package: pi-slipstream-compact
Version: 0.1.4
Published: Jun 13, 2026
Downloads: 369/mo · 106/wk
Author: orestesk
License: MIT
Types: extension
Size: 310.7 KB
Dependencies: 0 dependencies · 2 peers

Pi manifest JSON

{
  "extensions": [
    "./src/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-slipstream-compact

Safer compaction for long Pi Coding Agent sessions.

Long Pi sessions can lose important details when the context gets compacted: files you changed, commands that failed, decisions you made, blockers, and what should happen next. This package makes Pi check the summary before using it.

Inspired by the Slipstream research paper, adapted for real Pi coding sessions.

Status: experimental. Background preparation is enabled by default. Safe evaluation commands are available before relying on it for important sessions.

Why use this package

Install it, then keep using Pi normally.

As the session grows, Slipstream can prepare a summary in the background.
When you run /compact, Pi uses Slipstream's reviewed summary instead of native compaction.
Automatic compaction uses the same review step.
If the summary misses important state, Slipstream tries to repair it before compaction.
Recovery artifacts are saved under .scratch/compactions in case you need to inspect what happened.

Install

pi install npm:pi-slipstream-compact

Or install from GitHub:

pi install git:github.com/OrestesK/pi-slipstream-compact@v0.1.0

Pi packages run with full local permissions. Review source before installing packages from npm, git, or another machine.

This package can start background model calls near context limits and can send session evidence, tool output, file paths, and git excerpts to your configured model provider. If you want to inspect first, use Evaluate safely or set autoTrigger: false before relying on it.

How to use it

After installing, keep using Pi normally.

You want	Do this
Let compaction happen automatically	Nothing. Background preparation is enabled by default and starts when the session gets large.
Compact manually	Run `/compact`.
See Slipstream state	Watch the compact widget above the prompt while Slipstream is active, or run `/slipstream status` for details.
Find recovery artifacts	Run `/slipstream artifacts`.
Turn off background preparation while testing	Set `autoTrigger: false`; `/compact` still uses Slipstream while the package is enabled.
Keep another extension or native Pi owning `/compact`	Set `replaceDefaultCompact: false`; use `/slipstream compact` only when you explicitly want Slipstream.

Before using it on a real repository, make sure .scratch/ is gitignored. Slipstream writes local recovery artifacts under .scratch/compactions.

Default behavior and settings

Default config is intentionally small:

{
	"pi-slipstream-compact": {
		"enabled": true,
		"autoTrigger": true,
		"artifactRoot": ".scratch/compactions"
	}
}

Important defaults:

Setting	Default	Meaning
`enabled`	`true`	Enables background preparation and `/compact` replacement. Support commands remain available.
`autoTrigger`	`true`	Starts preparing a checked summary in the background when the session gets large.
`replaceDefaultCompact`	`true`	Makes plain `/compact` use Slipstream by default; set `false` for side-by-side mode.
`triggerContextPercent`	`0.6`	Starts/latches auto compaction around 60% context usage.
`judgeThreshold`	`7`	Minimum continuation-quality score before normal acceptance.
`repairAttempts`	`3`	Tries full-summary repair after judge rejection.
`rejectedSummaryMode`	`"ask"`	Shows an interactive decision when possible; accepts on timeout/no UI unless explicitly rejected.
`artifactRoot`	`.scratch/compactions`	Local recovery artifact directory inside the current project.
`statsFullPaths`	`false`	Central stats redact paths by default; set `true` only for explicit local debugging.
`summaryModel`	active model	Uses your active Pi model unless overridden.
`judgeModel`	active model	Uses your active Pi model unless overridden.

See Full configuration for all settings and model overrides.

Evaluate safely

You do not need these commands for normal use. They are for checking a new install before relying on it.

A safe evaluation ladder:

Inspect the prompt and local evidence without judging or compacting:
```
/slipstream compact --dry-run
```
Inspect candidate-prompt.md, state-evidence.json, and git artifacts for stale state, missing blockers, or sensitive data.
Inspect a judged summary before applying it:
```
/slipstream compact --prepare
```
This writes candidate-summary.md and judge.json.
Apply the prepared summary if it is still fresh:
```
/slipstream compact --adopt
```

Prepared summaries expire after pendingTtlMs (default: 5 minutes) and are rejected if the session branch advances too far. Old candidate-summary.md and judge.json files are still useful for inspection, but they are not enough for /slipstream compact --adopt; rerun --prepare if the pending summary expired.

Native compact vs Slipstream

Area	Native `/compact`	`pi-slipstream-compact`
Main path	One summarization pass.	Generate, validate, repair if needed, then adopt.
Current-state fidelity	Can lose exact latest files, errors, or decisions.	Prepends deterministic current-state facts and validates them before adoption.
Stale-state protection	Can preserve obsolete next actions or miss the latest turn.	Checks latest exchange, continuation evidence, and branch freshness.
Recovery	Summary text is usually the main surviving artifact.	Writes local snapshots, state evidence, git evidence, prompts, judge results, and adoption metadata.
Best use case	Shorter or low-stakes sessions where speed/cost matter most.	Long coding sessions where losing exact state is expensive.

Local validation so far shows the difference this package is trying to optimize for: the latest fresh-agent continuation validation scored the Slipstream path at 9.36/10 versus native /compact at 5.36/10 on 11 clean overlapping cases. See Benchmark results for data and caveats.

Tradeoffs

Tradeoff	What it means
More model calls	A validated compaction normally uses at least a summary call and a judge call; rejected candidates can add repair calls. Background preparation can spend these calls before you manually ask to compact.
More latency	Manual compaction is slower than native summarization because it gathers evidence, writes artifacts, judges, and may repair. Background preparation can hide some latency by starting earlier.
More provider-bound context	Summary, judge, and repair prompts can include conversation text, tool output, paths, git excerpts, and artifact references. Do not use it on repositories where that provider exposure is unacceptable.
Local artifact footprint	Recovery artifacts are written under `.scratch/compactions`; they may contain sensitive paths, diffs, commands, or outputs and must stay gitignored.
Experimental behavior	The judge improves continuation readiness but does not prove code correctness or external task success. Start with Evaluate safely on new repositories.

If you mostly run short sessions, care more about minimizing cost than preserving exact state, or cannot send session evidence to the configured model provider, native compaction is probably the better default; disable the package to use native compaction.

What this is based on

The core idea comes from Slipstream-style validation: write a shorter summary while the session continues, then check whether that summary still supports the next continuation. This package adapts that idea for Pi coding sessions by adding file/error/decision tracking, git/session artifacts, stale-state checks, and explicit rejected-summary policy.

It also borrows narrower ideas from active context compression, subgoal-style task state, and external evidence stores: compact before the window is exhausted, preserve completed-work/current-state structure, and keep raw recovery evidence outside the prose summary. The detailed source mapping is in Research and related ideas.

Failure modes this targets

Long coding-agent sessions fail in boring, expensive ways after compaction:

exact file paths collapse into ambiguous basenames,
recent test failures or tool errors disappear,
user decisions and constraints get paraphrased into something weaker,
the agent forgets what was modified, verified, or still blocked,
a summary sounds plausible but cannot support the next few turns of work.

Research and related ideas

The direct inspiration is Slipstream: generate a compacted handoff, then validate it against continuation evidence before adopting it.

The other references are related ideas, not full implementations in this package.

Source	Relationship to this package
Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents and repo	Core basis: checked compaction before adoption. This package adapts it for Pi's files, tool output, git state, errors, and user decisions.
Active Context Compression	Related idea: manage context before the window is exhausted instead of waiting until compaction is urgent.
HiAgent	Related idea: long tasks need durable task/subgoal state, not just a flat transcript summary.
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory	Related idea: keep recoverable evidence outside the prose summary. This package does not implement indexed memory or retrieval learning.
DeepAgents	Related implementation pattern: agents can keep external files/context outside the main prompt. This package uses local artifacts instead.
ACON	Future direction: learn better compression policies from failed compacted-vs-full continuations.

How it works

Pi session grows
      │
      ▼
/compact, automatic threshold compaction, or /slipstream compact
      │
      ├─ freeze the old state
      │    files, errors, decisions, constraints, latest exchange,
      │    changed paths, verification evidence, and critical literals
      │
      ├─ collect recovery evidence
      │    local artifacts, git status, and git diff evidence
      │
      ├─ generate a compacted summary
      │    focused on current state and next steps
      │
      ├─ validate the summary
      │    compare it with the task state, recent continuation, and artifacts
      │
      ├─ repair if needed
      │    ask the model to rewrite missing or weak parts
      │
      ├─ check freshness
      │    reject or revalidate if newer messages appeared after validation
      │
      └─ compact with the accepted summary
           accepted summaries are scored; rejected summaries are explicit

The important difference from a normal summarizer: adoption is validated and scored. Accepted summaries are marked by the judge; rejected summaries are handled by an explicit rejected-summary policy instead of silently falling back to native compaction. In reject mode, rejected summaries cancel compaction. In ask mode, interactive direct/default compaction shows a dialog with score, diagnosis, missing facts, contradictions, artifact directory, and summary preview; any no-UI/false/timeout result falls back to policy acceptance instead of cancellation. In accept mode, rejected summaries are accepted directly with rejectedSummaryAccepted: true, score, judge diagnostics, and artifact links in compaction details.

Features

Plain /compact and automatic threshold compaction use validated Slipstream compaction by default.
/slipstream support commands for status, artifact inspection, dry-run, and prepare/adopt evaluation.
Automatic trigger, enabled by default and configurable with autoTrigger.
Slipstream-style continuation validation before adoption.
Deterministic manifest extraction for files, errors, decisions, constraints, open loops, verification evidence, latest compacted updates, retained-tail current-state anchors, latest user/assistant exchange state, conservative stale/superseded signals, and critical literals.
Local artifact store under .scratch/compactions, with cooperative chunked trigger snapshot writes so large raw recovery artifacts do not require one giant foreground JSON serialization.
Central per-session performance stats under ~/.config/pi/.scratch/slipstream-stats/sessions/<session-id>.jsonl: mode, outcome, timing buckets, judge score, tokens before compaction, and redacted/relative artifact path by default.
Full compaction-time git diff preservation as chunked artifacts when below artifact byte caps, while keeping model-visible diff text bounded.
Explicit rejection path: rejected summaries are accepted by policy with score, judge diagnostics, artifacts, and rejectedSummaryAccepted: true; ask mode shows a scored confirmation dialog first when UI is available, and expert --prepare summaries are recoverable from pending.json if runtime state resets before --adopt.
Adoption-time freshness guard: pending summaries store validatedThroughEntryId; expert --adopt and auto activation revalidate against the current branch if newer messages appeared after preparation, while default /compact ignores stale pending state and generates a fresh summary instead of adopting it. Consumed pending artifacts are cleared so an old pending.json cannot be replayed after compaction.
Bounded Session Findings summary section for durable source-grounded facts that are useful later but are not the immediate next action.
TypeScript package manifest for Pi extension loading.

Local development install

From this repository:

cd packages/pi-slipstream-compact
pi -e .

Or add the local package path to Pi settings:

{
	"packages": ["/absolute/path/to/pi-slipstream-compact"]
}

You can also persist the local package with Pi's installer:

pi install /absolute/path/to/pi-slipstream-compact

Support commands

/slipstream status
/slipstream artifacts
/slipstream compact
/slipstream compact --dry-run
/slipstream compact --prepare
/slipstream compact --adopt

Command	Effect
`/slipstream status`	Shows idle/running/pending/failed state and pending judge details.
`/slipstream artifacts`	Shows the latest artifact directory remembered by the current session, if any. If it shows nothing after a restart, browse `.scratch/compactions/` directly.
`/slipstream compact`	Generates, reviews, and immediately queues Slipstream compaction.
`/slipstream compact --dry-run`	Writes artifacts and a candidate prompt without changing compaction state.
`/slipstream compact --prepare`	Expert mode: generates, judges, possibly repairs, and stores a validated pending summary without applying it.
`/slipstream compact --adopt`	Expert mode: calls Pi compaction only if a validated pending summary exists; revalidates first if the branch advanced since preparation.

Unknown flags or positional arguments on compact are rejected instead of being ignored.

Local rollout:

Plain Pi /compact and threshold compaction now use Slipstream by default through session_before_compact.
/slipstream compact remains the explicit one-command support path.
--dry-run writes prompts and artifacts without changing session state.
--prepare and --adopt remain expert inspect-before-apply workflows; rerun --prepare if the pending summary expires or becomes stale.
Rejected Slipstream summaries are policy-accepted with score and artifacts instead of falling back to native compaction; ask mode shows a scored confirmation dialog when UI is available, accepts on timeout/no response, and rejects only when the user explicitly selects Reject.

Full configuration

Configure in ~/.pi/agent/settings.json or project .pi/settings.json. The canonical settings key is "pi-slipstream-compact"; the older "slipstreamCompact" key is also accepted for compatibility.

Default-style configuration:

{
	"pi-slipstream-compact": {
		"enabled": true,
		"autoTrigger": true,
		"artifactRoot": ".scratch/compactions"
	}
}

Disable background preparation:

{
	"pi-slipstream-compact": {
		"autoTrigger": false
	}
}

This disables background preparation only. Plain /compact still uses Slipstream while the package is enabled.

Run side-by-side with native Pi or another extension owning plain /compact:

{
	"pi-slipstream-compact": {
		"replaceDefaultCompact": false
	}
}

This also disables Slipstream auto-triggering. Explicit /slipstream compact, /slipstream compact --prepare, and /slipstream compact --adopt remain available. If you set replaceDefaultCompact: false, autoTrigger is normalized to false even when configured as true.

Disable Slipstream compaction replacement entirely:

{
	"pi-slipstream-compact": {
		"enabled": false
	}
}

Support commands remain registered, but lifecycle hooks and /compact replacement are disabled.

Tuned local configuration example:

{
	"pi-slipstream-compact": {
		"enabled": true,
		"autoTrigger": true,
		"triggerContextPercent": 0.6,
		"minContinuationTurns": 1,
		"maxContinuationTurns": 4,
		"judgeThreshold": 7,
		"repairAttempts": 3,
		"rejectedSummaryMode": "ask",
		"pendingTtlMs": 300000,
		"artifactRoot": ".scratch/compactions",
		"statsFullPaths": false
	}
}

Setting	Default	Meaning
`enabled`	`true`	Enables lifecycle hooks and `/compact` replacement. `/slipstream` support commands remain registered even when disabled.
`autoTrigger`	`true`	Starts background summary preparation near context pressure. Set `false` to disable background preparation. Forced off when `replaceDefaultCompact` is `false`, including when explicitly configured as `true`.
`replaceDefaultCompact`	`true`	When `true`, plain Pi `/compact` and threshold compaction use Slipstream. When `false`, plain/default compaction is left to Pi or another extension, while explicit `/slipstream compact` and `--adopt` still use Slipstream.
`triggerContextPercent`	`0.6`	Single context-pressure threshold for starting background preparation and latching compaction urgency. Fresh validated summaries compact when Pi reports the session is idle; stale summaries are revalidated before adoption. Legacy `softContextPercent`/`hardContextPercent` are accepted as aliases but should not be used for new config.
`minContinuationTurns`	`1`	Preferred continuation turns before turn-boundary auto validation. If Pi is already idle after the background summary resolves, auto finalization may proceed with fewer turns instead of waiting forever.
`maxContinuationTurns`	`4`	Maximum continuation turns collected for auto validation when later turns arrive.
`judgeThreshold`	`7`	Minimum accepted strict continuation-quality judge score. The judge prompt itself rejects safe-but-weak summaries for repair unless they are production-ready durable handoffs.
`repairAttempts`	`3`	Summary-model full-rewrite repair attempts after strict judge rejection. Empty/heading-only repair outputs are skipped without replacing the prior substantive candidate, and remaining attempts continue.
`rejectedSummaryMode`	`"ask"`	Rejected-summary handling after repairs fail: `"ask"` shows score/diagnostics/summary preview when UI selection is available and accepts on timeout/no response unless the user explicitly rejects, `"reject"` cancels compaction, and `"accept"` accepts immediately.
`pendingTtlMs`	`300000`	Expiry for a prepared pending summary.
`artifactRoot`	`.scratch/compactions`	Local artifact directory, resolved against Pi's current project cwd; paths outside the project are rejected, including existing symlinks that resolve outside the project.
`statsFullPaths`	`false`	Central performance stats store `cwd: "."` and relative/redacted artifact paths by default. Set `true` only when you explicitly want full local paths in `~/.config/pi/.scratch/slipstream-stats`.
`summaryModel`	active model	Optional `provider/model-id` override for summary generation.
`judgeModel`	active model	Optional `provider/model-id` override for judging.

The judge uses a strict continuation-probe rubric for current state, next actions, constraints, risk and verification awareness, artifact grounding, retrievability, knowledge continuity, stale-state suppression, and low-noise/non-contradiction. Safe-but-weak summaries are rejected for repair instead of passing through a second critic.

Optional model override example:

{
	"pi-slipstream-compact": {
		"summaryModel": "openai/gpt-4.1",
		"judgeModel": "openai/gpt-4.1"
	}
}

Pi lifecycle integration

The extension registers one command namespace and a small set of lifecycle handlers:

Pi surface	Package behavior
`registerCommand("slipstream", ...)`	Provides `/slipstream status`, `artifacts`, one-command `compact`, `compact --dry-run`, `compact --prepare`, and `compact --adopt`.
`turn_end`	At `triggerContextPercent`, starts candidate generation after final assistant responses, latches compaction urgency, collects continuation evidence when later turns arrive, revalidates stale pending summaries against the live branch head, and activates only fresh pending summaries with `ctx.compact()`.
`session_before_compact`	Uses a current Slipstream-requested pending summary if available; otherwise generates, judges, and returns a fresh Slipstream summary as the default compaction replacement. With `replaceDefaultCompact: false`, plain/default compaction returns to Pi or another extension unless `/slipstream compact` explicitly requested adoption.
`session_start` / `session_shutdown`	Keeps the compact widget hidden while idle, clears it on shutdown, updates lightweight status, and clears in-memory background state.

The normal manual path is plain Pi /compact or /slipstream compact when replaceDefaultCompact is enabled, and explicit /slipstream compact only when side-by-side mode is enabled. The automatic path uses one threshold: begin preparing at triggerContextPercent, keep collecting continuation evidence when it arrives, and compact only when the pending summary is validated through the current branch head and Pi reports an idle runtime with no queued messages. If no later turn arrives after the background summary resolves, Slipstream can still proceed through finalizing, judging, repair, pending-summary creation, and idle adoption instead of waiting forever. If the auto pending summary is stale, Slipstream revalidates it with a fresh retained-tail boundary instead of blocking on old work, adopting stale state, or keeping an old oversized tail. If timing still reaches Pi's own model-limit compaction, Slipstream generates/judges directly in session_before_compact unless replaceDefaultCompact is disabled. The expert prepare/adopt split still exists when you want to inspect a validated pending summary before applying it.

Integration API

Other extensions can reuse Slipstream-style validation without installing Slipstream as the default /compact owner:

import { slipstreamStyleValidateAndRepair } from "pi-slipstream-compact/integration-api";

const result = await slipstreamStyleValidateAndRepair({
	candidate,
	sourceEvidence: {
		sourceMessageExcerpts,
		filesModified,
		unresolvedErrors,
		userDecisions,
		constraints,
	},
	continuation,
	completeText,
	config: { judgeThreshold: 7, repairAttempts: 1 },
});

The integration API is deliberately narrower than the full extension lifecycle. It judges and optionally repairs a caller-provided candidate summary using caller-provided evidence and a caller-provided completeText function. It does not call ctx.compact(), register commands, manage pending state, update widgets, write local artifacts, or write central stats. The result includes the final summary, accepted, repaired, repairCount, top-level score/diagnostics, and the underlying JudgeResult.

Use this when another extension already owns candidate generation and wants a Slipstream-style quality gate. Use the full package lifecycle when you want Slipstream to own candidate generation, evidence collection, repair, freshness checks, artifacts, and adoption.

Progress visibility

The package shows a compact Slipstream widget above the prompt only while Slipstream is actively preparing, compacting, or holding a prepared pending summary. It is hidden while idle. The widget stays short: exact current stage plus elapsed time for active work, and judge score when available. In the interactive TUI, it uses Pi theme colors; in plain/RPC contexts, it falls back to text.

Widget contents are intentionally limited to actionable stage labels:

Internal state	Widget text example	Why this is enough
snapshot	`Slipstream: snapshotting local state · 3s`	Shows local synchronous work separately from model calls.
artifacts	`Slipstream: writing artifacts · 1s`	Shows local artifact writes.
state evidence	`Slipstream: collecting evidence · 2s`	Shows bounded read-only git/session evidence collection.
summary	`Slipstream: summarizing · 38s`	Shows the summary model call is running.
waiting for auto summary	`Slipstream: waiting for auto summary · 12s`	Auto finalization is waiting for the background summary before judging.
judging	`Slipstream: checking summary · 21s`	Shows the judge model call is running.
repairing	`Slipstream: repairing summary · 74s · last score 4/10`	The score explains why repair is running without implying the repair output has been judged.
accepted/current pending	`Slipstream: ready · score 9/10`	The score is current and actionable.
rejected	`Slipstream: summary rejected · score 4/10`	The low score is the concise reason this is rejected; detailed diagnosis stays in warning/status text.
applying compaction	`Slipstream: compacting · score 9/10`	Pi is applying an already judged prepared summary.
idle	hidden	No active action.

If Pi is closed while a prepared pending summary exists, startup recovery checks the persisted pending.json. The widget is restored as Slipstream: ready · score N/10 only when the recovered summary still matches the same session, cwd, TTL, and current branch head; stale or expired pending summaries stay hidden.

Routine progress stays visible without chat-style progress spam: the widget shows active stage and elapsed time, while the footer/status line changes only when the progress phase or message changes so timer ticks do not repaint both UI regions every second. Active lifecycle progress owns a disposable widget controller, so shutdown and compaction teardown can cancel timers instead of leaving stale owners behind. Lifecycle progress can preempt older lifecycle progress; support-command progress does not preempt an active lifecycle owner. Interactive rejected-summary decisions use the confirmation/select UI when rejectedSummaryMode is "ask"; non-interactive accepted/rejected outcomes are recorded in compaction details or a concise warning.

Current compaction mode

The package now supports one runtime strategy across manual, hook, and auto paths: continuation-first Slipstream replacement. The former fact-ledger route was removed because it did not earn its extra model calls, lifecycle state, prompts, tests, and user-facing complexity.

The former --high-accuracy mode was removed because it added broad chunk evidence without a global prompt budget and could overload the summary model.

A future slower path should be targeted refinement: start from the normal candidate, use the judge to identify missing/risky facts, then retrieve only the evidence needed for those gaps.

Artifact model

Artifacts are local recovery evidence, not decorative logs. A typical validated run directory contains:

.scratch/compactions/<session-id>-<run-id>/
  run.json
  index.json
  trigger-snapshot.json
  trigger-raw-001.json
  state-evidence.json
  git-status.txt
  git-diff-stat.txt
  git-diff-full-001.patch
  git-snapshot.json
  candidate-summary.md
  judge.json
  continuation.json
  adoption.json

Dry-run directories write candidate-prompt.md instead of candidate-summary.md and do not write judge.json, continuation.json, or adoption.json.

git-snapshot.json records:

status path,
diff-stat path,
full diff chunk paths,
full diff SHA-256,
full diff byte count,
whether git diff collection completed,
whether full preservation succeeded.

By default, artifact chunks are 512 KiB and a single artifact payload is capped at 96 MiB. Trigger snapshot chunks are written cooperatively from JSON fragments instead of first building one full raw JSON string and buffer, so large local recovery snapshots should yield back to Pi's event loop during artifact preparation. They can still consume disk and CPU proportional to transcript size; the goal is responsiveness, not zero local cost.

If the full git diff is larger than the cap, or if git diff collection reports truncation or another error, the package writes an omission/incomplete note instead of pretending the stored bytes are a complete diff. Summary prompts are capped by reducing model-visible conversation text; if protected fixed sections alone exceed the prompt cap, Slipstream fails fast with artifacts instead of sending an oversized prompt.

Evidence semantics

The package separates evidence into three levels:

Level	Examples	Used for
Model-visible writer grounding	bounded git diff, manifest facts, artifact paths	helps the summary writer produce a better candidate
Judge-protected facts	files, errors, constraints, decisions, latest updates, critical literals, verification evidence, continuation facts	blocks adoption if missing or contradicted
Raw local artifacts	full trigger snapshot chunks, full git diff chunks, state evidence JSON	recovery and debugging outside the model prompt
Central performance index	`~/.config/pi/.scratch/slipstream-stats/sessions/<session-id>.jsonl` rows with timings, judge score, and redacted/relative artifact path	cross-session local performance audits

Raw git diff text alone is not acceptance-blocking. If a patch detail matters for safe continuation, it should appear as a distilled fact, latest update, critical literal, or continuation-used fact.

Implementation choices

Choice	Reason
Greenfield package instead of patching `pi-smart-compact`	The target behavior is not just better scoring; it is a different lifecycle: prepare, validate, repair, then explicitly adopt.
Local default compaction replacement	Plain `/compact` and threshold compaction now use Slipstream directly, so there is one active compaction path.
Prepared-summary fast path	If `/slipstream compact --prepare` has already validated a pending summary, `session_before_compact` consumes it immediately.
Direct hook generation fallback	If no pending summary exists, `session_before_compact` generates, judges, repairs, and returns a Slipstream summary; final rejection is policy-accepted with score/artifacts.
Final-assistant-boundary auto trigger	Background preparation starts only after final assistant responses, avoiding orphaned tool-result boundaries.
Full raw artifacts + bounded prompt evidence	Large transcripts and diffs must remain recoverable without sending megabytes to the model.
Deterministic current-state capsule before model summary	Critical latest-state scaffolding is prepended by code instead of being optional model prose.
Deterministic manifest before model summary	Trust extracted file/error/decision/literal facts more than prose guesses.
Policy-accept instead of native fallback	A rejected Slipstream summary should not silently degrade to the compactor it is intended to replace; rejected acceptance is explicit, scored, and marked in compaction details.
Central session stats instead of artifact-local stats	Future weekly/all-session reviews can scan one central directory; paths are redacted/relative unless `statsFullPaths` is explicitly enabled. No per-turn writes or extra model calls are added.

Testing and evaluation

Current local verification:

(cd packages/pi-slipstream-compact && npm run check)

Evaluation	Result
Unit tests	Config normalization, snapshot extraction, artifact writing, summary/judge/repair, pending state, commands, model setup, auto lifecycle, state evidence.
Manual live Pi flow	Prepared/adopted hook compaction; post-compaction no-tools answer preserved files, sentinels, passing command, and intentional failing command.
Auto live Pi flow	Auto prepared at final assistant boundary; second-turn continuation finalizes/adopts only after freshness and idle checks.
Idle auto lifecycle regression tests	Idle-without-next-turn finalization can proceed while idle; if the branch advanced beyond incomplete continuation evidence, idle revalidation runs before adoption.
Progress repaint/regression tests	Long-running progress phases keep elapsed timer updates in the widget without ticking the footer/status line every second; stale progress owners cannot overwrite newer widget phases, shutdown and compaction teardown cancel active progress timers, command progress cannot preempt lifecycle progress, and repair score labels use the current judge score.
Repeated compaction	Second hook compaction preserved prior and new sentinels.
Natural scratch repo	Preserved basename-colliding skill paths, exact test strings, retryable error behavior, and passing `npm test`.
Forced rejection	High threshold rejected candidate; `/slipstream compact --adopt` refused with no compaction entry saved.
Large output	Preserved first/last/error sentinels and Pi output-log pointers after output truncation.
Past-session replay	Real long sessions stayed within prompt bounds during replay checks.
2026-05-29 full LLM benchmark	22 clean overlapping cases, 3 blinded judge replicates each. Slipstream scored `9.00/10` and won `64/66` decisions.
2026-05-29 focused rerun	Reran the only weak full-run Slipstream case. Slipstream scored `9.00/10` and won `3/3` decisions.
2026-05-29 fresh-agent continuation validation	Ran no-tool fresh-agent responses from compacted handoffs and judged downstream continuation behavior on 11 clean overlapping cases. Slipstream scored `9.36/10`, success `1.0`, stale-state `10.0`, failure modes `none:11`.

Benchmark results

Latest primary result: docs/latest-full-benchmark-2026-05-29.md.

The reported benchmark data comes from a local mix of private Pi sessions and some public SWE-bench-derived cases. Raw sessions, prompts, artifacts, and benchmark outputs are not included in this package because they can contain private repository state, paths, tool output, and provider-bound prompts. Benchmark code is also not bundled in the npm package; it can be shared for review on request.

Current post-fix fresh-agent continuation validation on 11 clean overlapping cases:

Method	Overall avg	Success rate	Stale-state score	Failure modes
Slipstream	9.36/10	1.0	10.0/10	`none:11`
native `/compact`	5.36/10	0.45	5.18/10	stale/latest-state issues
benchmark Codex prompt	3.00/10	0.18	3.45/10	latest-state/next-action failures

Full 2026-05-29 blinded review benchmark:

Method	Overall avg	Wins
Slipstream	9.00/10	64/66
native `/compact`	5.36/10	1/66
benchmark Codex prompt	4.24/10	0/66

Focused rerun of the only weak full-run Slipstream case:

Method	Overall avg	Wins
Slipstream	9.00/10	3/3
native `/compact`	3.67/10	0/3
benchmark Codex prompt	0.00/10	0/3

Caveat: these benchmarks measure continuation readiness with blinded LLM judging. They are not end-to-end SWE-bench scores or external task-completion scores.

Privacy and security

Artifacts may contain:

raw conversation snippets,
tool outputs,
absolute paths,
git diffs,
error messages,
secrets accidentally present in the session or diff.

The default artifact root is .scratch/compactions, which should be gitignored. The configured artifact root must resolve inside the current project directory; absolute paths and .. escapes outside the project are rejected. Hidden thinking content is not copied into compaction text. Do not publish artifact directories.

Slipstream also sends compacted session evidence to the configured model provider for summary generation, judging, and repair. That evidence can include conversation text, tool output, commands, file paths, git status/diff excerpts, artifact references, and secrets that were already present in the session or diff. Do not enable this package on sensitive repositories unless you are comfortable with both local artifact storage and provider-bound model prompts containing that state.

Limitations

Experimental package; use Evaluate safely before relying on it for important sessions.
Background preparation is enabled by default; set autoTrigger: false to turn it off while keeping manual /compact on Slipstream.
Automatic prepared-summary adoption uses Pi's idle signal; the package does not patch Pi private scheduler methods.
Full diff preservation is capped by artifact byte limits.
Quality depends on the configured summary and judge models.
The judge validates continuation sufficiency; it does not prove the underlying code is correct.
Historical replay checks prompt size and manifest extraction, not end-to-end human task success.
Artifact paths are local and may become stale if scratch directories are cleaned.
Judge-rejected summaries are policy-accepted with score and artifacts; model/tool failures or missing compaction boundaries can still fail, but the package never falls back to native compaction automatically.

Development

cd packages/pi-slipstream-compact
npm test
npm run typecheck
npm run check

CI runs npm ci, npm run check, and npm pack --dry-run --json on pushes to main, pull requests, and manual dispatches.

Release

Publishing is tag-driven. The Publish workflow runs only for tags matching v*.*.*, verifies that the tag exactly matches package.json (v${version}), runs the full check suite, verifies the npm tarball contents, then publishes with npm provenance:

npm version patch --no-git-tag-version
# review package.json/package-lock.json, commit, then tag the commit as vX.Y.Z

One-time npm setup: configure npm trusted publishing for OrestesK/pi-slipstream-compact with provider GitHub Actions and workflow filename publish.yml. The workflow uses OIDC (id-token: write) and does not require an NPM_TOKEN secret.

Package layout:

src/
  index.ts           # Pi extension entrypoint
  commands.ts        # /slipstream command handling
  auto.ts            # automatic trigger/finalization lifecycle
  pipeline.ts        # dry-run and validated compaction pipeline
  snapshot.ts        # deterministic session manifest extraction
  state-evidence.ts  # read-only git/session evidence collection
  artifact-store.ts  # local artifact writing and indexing
  summary.ts         # summary prompt
  judge.ts           # judge prompt and acceptance rules
  repair.ts          # full-summary rewrite repair prompt
  session-state.ts   # pending adoption state
  model.ts           # Pi model completer integration

Roadmap

More long natural coding-session evaluations.
Publishable npm metadata, screenshots, and package-gallery image/video.
Optional checkpoint_context command/tool for Focus-style semantic boundaries.
ACON-style learning loop over failed compacted-vs-full continuations.
Better ranking of critical literals when sessions contain more than the current cap.
Optional retrieval command for opening exact artifact chunks from a compacted summary.

Bottom line

pi-slipstream-compact is built around one principle: do not trust a compaction summary until it proves it can support what the agent needs next.