pi-until-done
Pi extension that brings Hermes Agent's /goal (Ralph loop with judge) to Pi as /until-done. Pi self-judges every turn, runs verifyCommand to confirm done, and routes all CI/CD through mise across 18 language profiles.
Package details
Install pi-until-done from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-until-done
- Package: pi-until-done
- Version: 0.1.1
- Published: May 4, 2026
- Downloads: not available
- Author: kirensrinivasan
- License: MIT
- Types: extension, skill, prompt
- Size: 1.7 MB
- Dependencies: 1 dependency · 4 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions/until-done.ts"
  ],
  "skills": [
    "./skills"
  ],
  "prompts": [
    "./prompts"
  ],
  "image": "https://raw.githubusercontent.com/srinitude/pi-until-done/main/assets/preview.png"
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-until-done

A Pi extension that brings Hermes Agent's /goal
("the Ralph loop with a judge") to Pi as /until-done — and goes further by
letting Pi itself be the judge, using every Pi extension primitive,
and coexisting cleanly with every other extension.
Pi's own philosophy (from srinitude/pi-config): minimal core, extensible edges, deterministic, inspectable, preserve developer agency. This extension hews to that line. It composes; it does not override. State lives in session entries. The active model is the judge. No system-prompt replacement, no side-database, no hidden state.
Install
The package is on npm: https://www.npmjs.com/package/pi-until-done.
Through Pi (recommended)
pi install npm:pi-until-done # from npm (recommended)
pi install github:srinitude/pi-until-done # from git
pi install /path/to/pi-until-done # local install
pi -e /path/to/pi-until-done/extensions/until-done.ts # try without installing
The package manifest declares all four pi.dev resource types
(pi.extensions, pi.skills, pi.prompts, pi.image), so a single
pi install wires up everything.
Directly via your package manager
bun add pi-until-done # bun
npm install pi-until-done # npm
pnpm add pi-until-done # pnpm
yarn add pi-until-done # yarn
deno add npm:pi-until-done # deno
The runtime entrypoint is extensions/until-done.ts. No tools to
install separately — every CI command routes through
mise, which the extension assumes is already
on your PATH.
Requirements
- Pi >= 0.x (`pi --version`)
- Bun >= 1.2 (the runtime extensions load through)
- mise on PATH (used for every CI/CD invocation)
Use
/until-done finish migrating auth tests to Vitest
- Pi runs a PHASE 0 brainstorm: it refines the goal type (`ticket` vs. `exploratory`), inventories accessible surfaces (logs, metrics, staging URLs, flame graphs, sandboxes), and nails down the verifyCommand. Sharp goals terminate cleanly; vague goals burn turns.
- Pi drafts a contract and shows it to you. The contract covers outcome, done-criteria, `verifyCommand` (auto-wrapped with `mise exec --` if not already mise-routed), ask-before list, decision style, goalType, surfaces, and startPhase.
- You approve via the dialog (or `/until-done autopilot` to skip).
- Pi calls `until_done_set` + `until_done_plan` and starts working in TDD-first mode: ANALYSIS → BOOTSTRAP → RED → GREEN → REFACTOR → CLEANUP (per pi-config).
- After every turn, Pi self-judges. If done, it calls `until_done_complete` with quoted output of the verifyCommand as evidence. If blocked, it calls `until_done_block`. Phase transitions go through `until_done_progress({phase})`. After complete, Pi calls `until_done_distill` to compile the journey into a PRD at `.until-done/distilled.md`.
- When the budget (default 20 turns) is exhausted, the loop pauses and tells you exactly how to resume.
- Anything you type at any point preempts the loop. For non-preempting side-questions, use `/until-done ask <question>`.
The status line shows the live phase glyph:
- `◷ analysis`: reading code
- `⚙ bootstrap`: validating infra
- `✗ red`: failing test exists
- `✓ green`: test passes
- `↺ refactor`: cleanup of structure
- `⌫ cleanup`: strip debug prints / scratch files before complete
- `· none`: research/doc goal
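The glyph list above maps directly onto a lookup table. The following is a minimal sketch, not the extension's actual code: the phase names and glyphs come from the list, while `PHASE_GLYPHS`, `formatStatus`, and the `(used/max)` suffix are hypothetical names and formatting chosen for illustration.

```typescript
// Phase names and glyphs are copied from the status-line list in this README.
type Phase =
  | "analysis"
  | "bootstrap"
  | "red"
  | "green"
  | "refactor"
  | "cleanup"
  | "none";

const PHASE_GLYPHS: Record<Phase, string> = {
  analysis: "◷",
  bootstrap: "⚙",
  red: "✗",
  green: "✓",
  refactor: "↺",
  cleanup: "⌫",
  none: "·",
};

// Hypothetical helper: the real status line may show different fields.
function formatStatus(phase: Phase, turnsUsed: number, maxTurns: number): string {
  return `${PHASE_GLYPHS[phase]} ${phase} (${turnsUsed}/${maxTurns})`;
}
```

Typing the table as `Record<Phase, string>` means adding a phase to the union without a glyph is a compile error, which keeps the status line and the phase list in step.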
Subcommands
| Command | Purpose |
|---|---|
| `/until-done <intent>` | Start setup for a new goal |
| `/until-done status` | One-line current state |
| `/until-done detail` | Full contract overlay |
| `/until-done tasks` | Print the live YAML task list |
| `/until-done plan` | Show `.until-done/tasks.yaml` location |
| `/until-done northstar` | Print the locked goal contract |
| `/until-done replan-log` | Show every replan and its reason |
| `/until-done pause` | Halt continuation, keep state |
| `/until-done resume` | Resume + reset budget |
| `/until-done cancel` | Clear the goal |
| `/until-done budget <n>` | Change turn budget (1..20000; >500 prompts a confirm) |
| `/until-done ask <question>` | Side question; does not preempt the loop |
| `/until-done autopilot` | Skip the user-confirm dialog |
| `/until-done help` | Show this list |
Plus: --until-done "<intent>" CLI flag, Ctrl+G shortcut to redraw
the status widget, and prompts/until-done.md as a prompt-template
alias.
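The `/until-done budget <n>` row implies a three-way decision: reject values outside 1..20000, ask for confirmation above 500, and accept the rest. A minimal sketch of that rule follows. The thresholds come from this README; `classifyBudget`, `BudgetDecision`, and the error strings are hypothetical and do not describe the extension's real `cmdBudget`.

```typescript
type BudgetDecision =
  | { kind: "reject"; reason: string }
  | { kind: "confirm"; turns: number }
  | { kind: "accept"; turns: number };

const MAX_TURNS = 20000; // hard ceiling stated in this README
const CONFIRM_ABOVE = 500; // values above this prompt a confirm dialog

// Hypothetical helper: parse the argument and classify it.
function classifyBudget(input: string): BudgetDecision {
  const turns = Number(input.trim());
  if (!Number.isInteger(turns)) {
    return { kind: "reject", reason: "budget must be a whole number" };
  }
  if (turns < 1 || turns > MAX_TURNS) {
    return { kind: "reject", reason: `budget must be in 1..${MAX_TURNS}` };
  }
  if (turns > CONFIRM_ABOVE) {
    return { kind: "confirm", turns };
  }
  return { kind: "accept", turns };
}
```

For example, `classifyBudget("501")` yields a `confirm` decision and `classifyBudget("20001")` is rejected, so the caller only opens a dialog when the spend is large enough to matter.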
Tools (8)
| Tool | Purpose |
|---|---|
| `until_done_set` | Lock the North Star contract after user approval |
| `until_done_plan` | Provide the TDD-first task list (called once after set) |
| `until_done_replan` | Mid-execution restructuring: insert/remove/replace/split/merge/reorder |
| `until_done_task_update` | Patch a single task: status, learnings, gotchas, context |
| `until_done_progress` | Record a one-line progress note + optional phase transition |
| `until_done_complete` | Declare done; requires quoted verifyCommand output |
| `until_done_block` | Pause with a question for the user |
| `until_done_distill` | After complete: compile the journey into a PRD at `.until-done/distilled.md` |
Pi primitive coverage matrix
The brief was: use every Pi primitive, and have Pi call the shots.
Each row below maps a primitive to how /until-done uses it. Lines
marked no-op are intentionally inert — exercising a hook for its
own sake would violate Pi philosophy.
Hook events (29/29 addressed)
| Event | Mode | Why |
|---|---|---|
| `resources_discover` | active | Declare companion `skills/` and `prompts/` paths so the package is plug-and-play |
| `session_start` | active | Reconstruct goal state from custom entries; honor `--until-done` flag; warn if `@qhn/pi-goal` is also installed |
| `session_before_switch` | active | Confirm before leaving an active goal |
| `session_before_fork` | active | Three-way choice: carry/leave/cancel the fork |
| `session_before_compact` | active | Append goal context to compaction's `customInstructions` |
| `session_compact` | active | Re-anchor by emitting a verdict state event after compaction |
| `session_before_tree` | observed | Pi handles snapshotting; nothing to gate |
| `session_tree` | active | Full state reconstruction from new branch (todo.ts pattern) |
| `session_shutdown` | active | Clear status + widget keys cleanly |
| `context` | no-op | Pi philosophy: don't mutate LLM messages |
| `before_provider_request` | observed | Telemetry counter |
| `after_provider_response` | observed | Telemetry counter |
| `before_agent_start` | active | Append (never replace) a goal reminder block to the system prompt |
| `agent_start` | active | Reset per-iteration counters; set working-message to "pursuing: …" |
| `agent_end` | active | THE JUDGE STEP: budget check, spin-guard, queue continuation as user message |
| `turn_start` | active | Refresh status line |
| `turn_end` | active | Capture last assistant text snapshot |
| `message_start` | observed | Reserved hook |
| `message_update` | observed | Live status (rate-limited 500ms) |
| `message_end` | active | Capture finalized assistant text |
| `tool_execution_start` | observed | Tool-start counter |
| `tool_execution_update` | observed | Pi handles streaming UI |
| `tool_execution_end` | observed | Tool-end counter |
| `model_select` | observed | Telemetry only; judge model is whichever is active |
| `thinking_level_select` | observed | Telemetry counter |
| `tool_call` | active | POLICY GATE: enforce ask-before list against `bash`; tally progress signals per built-in tool |
| `tool_result` | observed | Reserved for future progress detection |
| `user_bash` | observed | Counter only; user-driven activity is allowed but doesn't count toward goal progress |
| `input` | active | Mark `userMessagedThisTurn = true` so `agent_end` skips auto-continuation when the user has spoken |
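The `agent_end` and `input` rows above describe a small decision procedure: skip when the goal is not active or the user has spoken, pause when the budget is spent, block on a spin, otherwise continue. The sketch below shows one way to express it as a pure function. The order of the checks, the `TurnSnapshot` shape, the `"paused"` status, and all names other than `userMessagedThisTurn` and `progressSignalsThisTurn` are assumptions made for illustration, not the extension's implementation.

```typescript
// Assumed status values; the README confirms "active", "blocked" and "cleared".
type GoalStatus = "active" | "paused" | "blocked" | "cleared";

// Hypothetical snapshot of what the judge step looks at.
interface TurnSnapshot {
  status: GoalStatus;
  userMessagedThisTurn: boolean;
  progressSignalsThisTurn: number;
  turnsUsed: number;
  maxTurns: number;
}

type JudgeOutcome =
  | { action: "skip"; why: string }
  | { action: "pause"; why: string }
  | { action: "block"; why: string }
  | { action: "continue" };

function judgeTurn(s: TurnSnapshot): JudgeOutcome {
  // A cancelled or paused goal never auto-continues.
  if (s.status !== "active") return { action: "skip", why: "goal not active" };
  // User input preempts the loop.
  if (s.userMessagedThisTurn) return { action: "skip", why: "user spoke" };
  // Budget is the backstop (assumed to be checked before the spin guard).
  if (s.turnsUsed >= s.maxTurns) return { action: "pause", why: "budget exhausted" };
  // Zero progress signals means the turn did nothing useful.
  if (s.progressSignalsThisTurn === 0) return { action: "block", why: "spin guard" };
  return { action: "continue" };
}
```

Keeping the decision pure makes it easy to unit-test separately from whatever the hook does with the outcome, such as queueing the continuation message.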
Built-in tool coverage (7/7 enumerated)
| Tool | How /until-done reasons about it |
|---|---|
| `read` | weak progress signal (+1): investigation |
| `bash` | progress signal (+2) AND policy gate against ask-before |
| `edit` | strong progress signal (+3): real change |
| `write` | strong progress signal (+3): real change |
| `grep` | weak progress signal (+1): search |
| `find` | weak progress signal (+1): search |
| `ls` | weak progress signal (+1): search |
If progressSignalsThisTurn === 0 at agent_end, /until-done enters
blocked with reason "spin guard" — the model literally did
nothing useful that turn.
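The weights in the table and the zero-signal rule can be sketched as a small tally. The weights below are the ones listed above; `SIGNAL_WEIGHTS`, `tallyProgress`, and `isSpinning` are hypothetical names, and treating any tool not in the table as weight 0 is an assumption of this sketch.

```typescript
// Weights copied from the built-in tool table in this README.
const SIGNAL_WEIGHTS = new Map<string, number>([
  ["read", 1],
  ["bash", 2],
  ["edit", 3],
  ["write", 3],
  ["grep", 1],
  ["find", 1],
  ["ls", 1],
]);

// Sum the weights of the tools called during one turn.
// Assumption: tools outside the table contribute nothing.
function tallyProgress(toolNames: readonly string[]): number {
  return toolNames.reduce(
    (sum, name) => sum + (SIGNAL_WEIGHTS.get(name) ?? 0),
    0,
  );
}

// A turn with a zero tally is what the README calls a spin.
function isSpinning(toolNames: readonly string[]): boolean {
  return tallyProgress(toolNames) === 0;
}
```

A turn that calls `read`, `edit`, and `bash` tallies 1 + 3 + 2 = 6, while a turn with no tool calls tallies 0 and would trip the guard.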
Other Pi primitives addressed
| Primitive | Where |
|---|---|
| `pi.registerCommand` | `/until-done` with subcommand autocomplete |
| `pi.registerTool` | `until_done_set`, `until_done_complete`, `until_done_block`, `until_done_progress` |
| `pi.registerFlag` | `--until-done <text>` |
| `pi.registerShortcut` | Ctrl+G toggles the contract widget |
| `pi.registerMessageRenderer` | Custom render for `until-done.continuation` messages |
| `pi.appendEntry` | Persists `until-done.state` events (load/save) |
| `pi.sendUserMessage` | Continuation prompts + setup interview |
| `pi.sendMessage` | Continuation tick rendered in TUI |
| `pi.getCommands` | Detects `@qhn/pi-goal` collisions |
| `pi.getFlag` | Reads `--until-done` value |
| `ctx.ui.confirm/select/input/editor` | Setup confirmation, fork choice, ask-before, cancel |
| `ctx.ui.notify` | Status messages |
| `ctx.ui.setStatus` | Footer status line |
| `ctx.ui.setWidget` | Above-editor widget with full contract |
| `ctx.ui.setTitle` | Terminal title during pursuit |
| `ctx.ui.setWorkingMessage` | "pursuing: …" during streaming |
| `ctx.ui.custom` | Full contract overlay (`/until-done detail`) |
| `ctx.ui.theme.fg` | All UI color uses theme tokens |
| `ctx.sessionManager.getBranch` | State reconstruction from JSONL entries |
| `ctx.waitForIdle` | Setup flow waits for the assistant before opening confirm |
| Skills (`skills/until-done/SKILL.md`) | Loaded on demand to teach Pi the contract & tool protocol |
| Prompt templates (`prompts/until-done.md`) | Alternate invocation: `/until-done` as a template-style prompt |
Not used:
- `pi.registerProvider`/`unregisterProvider` (the goal is active-model-as-judge, not a separate provider)
- `pi.setActiveTools` (would silently disable user tools, a Pi-philosophy violation)
- `ctx.compact`/`fork`/`navigateTree`/`switchSession`/`newSession` (those replace user state and must stay user-initiated)

The extension intentionally leaves these on the table.
North Star + dynamic task list
The brief was: a fixed criterion to guide the entire process to a
clean end, but a task list that can be edited mid-flight when reality
diverges. /until-done separates the two:
| | Locked at `until_done_set` | Mutable mid-execution |
|---|---|---|
| `goal` | ✓ | ✗ |
| `doneCriteria` | ✓ | ✗ |
| `verifyCommand` | ✓ | ✗ |
| `askBefore` boundaries | ✓ | ✗ |
| `decisionStyle` | ✓ | ✗ |
| Task list (insert/remove/split/merge/reorder/replace) | ✗ | via `until_done_replan` |
| Per-task: validationSteps, ciCommands, styleguideRules, guardrails | ✗ | via `until_done_task_update` |
| Per-task: status, learnings, gotchas, context refs | ✗ | via `until_done_task_update` |
| `phase` | ✗ | via `until_done_progress` |
| `maxTurns` | ✗ | via `/until-done budget <n>` |
The North Star (top block) is the fixed reference point. Pi can change
how it gets there but never where it's going. The only way to
change the North Star is /until-done cancel followed by a new
setup — by design, this requires fresh user approval.
Replan operations (until_done_replan)
| Op | Use when |
|---|---|
| `insert` | A new sub-task surfaced (`insertAfter` optional) |
| `remove` | A planned task is moot (must be pending/blocked; done is immutable) |
| `replace` | A pending task was specced wrong |
| `split` | One task is actually 2+ tasks |
| `merge` | Two+ tasks collapse into one |
| `reorder` | Dependencies need adjusting |
Every replan requires a non-empty reason which is appended to
affected tasks' learnings and to /until-done replan-log. Cycles are
rejected. The whole batch validates atomically — if one op is illegal,
none apply.
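The all-or-nothing rule and the cycle check can be illustrated with a reduced model. The sketch below handles only `insert` and `remove`, works on a copy so a failed batch leaves the input untouched, and rejects dependency cycles with a depth-first search. The `Task` shape is trimmed from the YAML example in this README; `applyReplan`, `hasCycle`, the error strings, and the `replan: <reason>` learnings format are assumptions, not the extension's real code.

```typescript
type Status = "pending" | "blocked" | "done" | "skipped";

interface Task {
  id: string;
  status: Status;
  dependencies: string[];
  learnings: string[];
}

// Only two of the six ops, to keep the sketch short.
type ReplanOp = { op: "insert"; task: Task } | { op: "remove"; id: string };

type ReplanResult = { ok: true; tasks: Task[] } | { ok: false; error: string };

// Depth-first search over dependency edges; a revisit while "visiting" is a cycle.
function hasCycle(tasks: readonly Task[]): boolean {
  const byId = new Map(tasks.map((t): [string, Task] => [t.id, t]));
  const state = new Map<string, "visiting" | "visited">();
  const visit = (id: string): boolean => {
    if (state.get(id) === "visited") return false;
    if (state.get(id) === "visiting") return true;
    state.set(id, "visiting");
    const deps = byId.get(id)?.dependencies ?? [];
    const cyclic = deps.some((dep) => visit(dep));
    state.set(id, "visited");
    return cyclic;
  };
  return tasks.some((t) => visit(t.id));
}

function applyReplan(
  tasks: readonly Task[],
  ops: readonly ReplanOp[],
  reason: string,
): ReplanResult {
  if (reason.trim() === "") return { ok: false, error: "reason is required" };
  // Copy first: nothing is committed unless every op validates.
  let draft: Task[] = tasks.map((t) => ({
    ...t,
    dependencies: [...t.dependencies],
    learnings: [...t.learnings],
  }));
  for (const op of ops) {
    if (op.op === "insert") {
      const incoming = op.task;
      if (draft.some((t) => t.id === incoming.id)) {
        return { ok: false, error: `duplicate id ${incoming.id}` };
      }
      draft.push({
        ...incoming,
        learnings: [...incoming.learnings, `replan: ${reason}`],
      });
    } else {
      const removeId = op.id;
      const target = draft.find((t) => t.id === removeId);
      if (!target) return { ok: false, error: `unknown task ${removeId}` };
      if (target.status !== "pending" && target.status !== "blocked") {
        return { ok: false, error: `task ${removeId} is not removable` };
      }
      draft = draft.filter((t) => t.id !== removeId);
    }
  }
  if (hasCycle(draft)) return { ok: false, error: "dependency cycle" };
  return { ok: true, tasks: draft };
}
```

Returning a fresh array on success and an error on failure means the caller never has to roll anything back: the original task list is simply kept when `ok` is false.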
Live YAML on disk
After until_done_plan and every until_done_task_update /
until_done_replan, the extension rewrites .until-done/tasks.yaml
in the project root so humans can read the current state without
opening the TUI:
generated: 2026-05-04T12:34:56.000Z
goalId: ud-abc123
goal: finish migrating auth tests to Vitest
doneCriteria: bun test exits 0 with all auth specs green
verifyCommand: bun test
phase: green
askBefore: [git push]
budget: { used: 7, max: 20 }
currentTaskId: T-005
tasks:
  - id: T-001
    title: Bootstrap Vitest config
    phase: bootstrap
    status: done
    dependencies: []
    blocks: [T-002]
    prerequisites: []
    validationSteps:
      - cat vitest.config.ts
      - bun test --version
    ciCommands: [bun test]
    styleguideRules: []
    guardrails: ["no new top-level deps without confirmation"]
    learnings: ["replan: discovered tsconfig conflict"]
    gotchas: ["forgot to update tsconfig include"]
    context:
      - path: package.json
        why: read existing test script
  - ...
Clean-end guarantee
When every planned task is done (or skipped) but Pi hasn't called
until_done_complete, the extension sends Pi exactly one structured
reminder per cycle:
All planned tasks are marked done. Two paths from here, pick one:
- Run `<verifyCommand>`. If it passes, call `until_done_complete`.
- If residual work surfaced, call `until_done_replan` with reason `residual_work_discovered`.

Do not invent new work outside the plan.
After two such reminders, the loop pauses and yields to the user. The turn budget remains the absolute backstop.
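The reminder-then-pause behaviour described above amounts to a counter with a cap of two. A minimal sketch under stated assumptions: `cleanEndAction` and its parameters are hypothetical names, and an empty task list is treated as "nothing to remind about", which the README does not specify.

```typescript
type TaskStatus = "pending" | "blocked" | "done" | "skipped";
type CleanEndAction = "none" | "remind" | "pause";

const MAX_REMINDERS = 2; // "After two such reminders, the loop pauses"

// Decide what to do at the end of a cycle, given the plan and reminder count.
function cleanEndAction(
  statuses: readonly TaskStatus[],
  completeCalled: boolean,
  remindersSent: number,
): CleanEndAction {
  const allSettled =
    statuses.length > 0 &&
    statuses.every((s) => s === "done" || s === "skipped");
  if (!allSettled || completeCalled) return "none";
  return remindersSent < MAX_REMINDERS ? "remind" : "pause";
}
```

With every task done and no completion call, the first two cycles return `"remind"` and the third returns `"pause"`, which hands control back to the user.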
Per-turn principle injection
Every turn, before_agent_start appends (never replaces) a composite
reminder block to the system prompt. Setup and the loop continuation
tick include the same blocks. Sources, in injection order:
- TDD discipline: RED → GREEN → REFACTOR → CLEANUP.
- Verifiability discipline: do not accept proxy signals; treat uncertainty as not achieved; quote command output as evidence.
- pi-config principles (`extensions/lib/strings/principles/`):
  - Bootstrap mandate (the 8 automation-foundation items)
  - Performance mandate (any unnecessary slowdown is a defect)
  - Capability injection + test model (no internals, no shared state)
  - Definition of done (stricter: both validation suites + parity)
  - Working style (declare phase, never claim unverified)
- Mise-first CLI policy: every shell command via `mise run` or `mise exec --`. `verifyCommand` is auto-wrapped on `until_done_set`.
- Structural constraints: these apply to every language Pi generates in (≤3 nesting depth, ≤30 LOC per construct, ≤200 LOC per file).
- Plan management + tool flow: when to call `until_done_replan`, `until_done_task_update`, `until_done_complete`, `until_done_block`.
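The Mise-first policy above says `verifyCommand` is wrapped with `mise exec --` unless it is already mise-routed. The function names `routeThroughMise` and `isMiseCommand` appear in this README's test table, but their bodies are not shown, so the sketch below is a guess at the simplest behaviour consistent with that description. It does not handle commands prefixed by environment assignments or shell operators.

```typescript
// Assumed rule: a command is mise-routed when its first word is exactly "mise".
function isMiseCommand(command: string): boolean {
  return /^mise(\s|$)/.test(command.trim());
}

// Wrap with `mise exec --` unless the command already goes through mise.
function routeThroughMise(command: string): string {
  const trimmed = command.trim();
  return isMiseCommand(trimmed) ? trimmed : `mise exec -- ${trimmed}`;
}
```

Under these assumptions `routeThroughMise("bun test")` gives `mise exec -- bun test`, and applying the function twice gives the same result as applying it once, so wrapping at `until_done_set` is safe to repeat.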
TDD-first discipline (from pi-config)
/until-done enforces the pi-config operating contract end-to-end:
- Phases are explicit and tracked. Pi declares `phase: "analysis"|"bootstrap"|"red"|"green"|"refactor"|"none"` via `until_done_progress` and the extension renders it live in the status line.
- No GREEN without RED. The contract requires a failing test before any production change for code-shipping goals. The `SKILL.md` loaded in-session enforces this; the system-prompt reminder repeats it every turn.
- Done = verifyCommand passes. `until_done_complete` requires `evidence` that quotes the command output. Speculative completion is refused.
- Performance is a defect when there's a safe gain. REFACTOR encourages it.
- No claims about unverified state. The skill bans pretending tests, guarantees, or context exist when they have not been verified.
- Structural constraints. Nesting ≤ 3, construct ≤ 30 LOC, file ≤ 200 LOC, single responsibility per construct.
How /until-done differs from @qhn/pi-goal and Hermes /goal
| | `@qhn/pi-goal` | Hermes `/goal` | `/until-done` |
|---|---|---|---|
| Setup flow | User-led interview | None; judge asks each turn | Pi-led interview |
| Judge | None; model self-decides | Auxiliary model judge call | Pi self-judges via tools |
| State storage | Pi session entries | `SessionDB.state_meta` | Pi session entries |
| Hook coverage | 1–2 events | n/a (Hermes-internal) | All 29 events |
| Conflict-safe | yes | n/a | yes (auto-detects qhn/pi-goal) |
| System-prompt mutation | none | none | append-only |
If both @qhn/pi-goal and pi-until-done are installed, the user
sees a one-time notice at session_start and can pick whichever they
prefer per session. Tool/command/event keys are namespaced
until-done.* and until_done_* to avoid collisions with anything
else in the package ecosystem.
Edge cases the implementation handles
- Extension loaded mid-session → state reconstructs from existing custom entries; if none, no-op.
- Compaction during a goal → goal context appended to compaction `customInstructions`; state re-anchored after.
- Fork during a goal → user picks via `select` dialog.
- Switch session during a goal → confirm dialog protects against accidental loss.
- Branch via `/tree` → state fully rebuilt from new branch (matches the todo.ts reference pattern).
- User interjects mid-loop → `input` hook flags `userMessagedThisTurn`; `agent_end` skips continuation.
- Model produces no tools/text → `progressSignalsThisTurn === 0` triggers `blocked` with spin-guard reason, prevents tight loop.
- Turn budget exhausted → auto-pause with explicit `/until-done resume` instructions (Hermes parity).
- Pi calls `until_done_complete` falsely → user can `/until-done resume` to challenge it; new evidence required.
- Goal already exists during setup → `select` dialog: replace / keep / cancel.
- RPC / print mode (no UI) → `ctx.hasUI` checks degrade gracefully; `setWidget` skipped, `notify` still fires, custom overlay falls back to JSON dump.
- Provider/model switch mid-goal → no judge re-binding required because the active model itself is the judge.
- Thinking level change mid-goal → tracked but doesn't affect state.
- Compaction over contract → contract is one of the first entries on the branch; reconstruction walks from root.
- Goal text with special chars → rendered through theme tokens, no shell expansion.
- `--until-done` flag at startup → triggers `/until-done <text>` via `sendUserMessage` exactly once at startup.
- `@qhn/pi-goal` also installed → coexistence notice; commands do not collide because `/goal` and `/until-done` are different names.
- Ask-before timeout (no human at terminal) → 30s timeout on `confirm`; on dismiss, the tool call is blocked with `user denied`.
- Hard ceiling 20000 turns → `cmdBudget` rejects values >20000; values >500 prompt a confirm dialog (the "go to lunch" threshold) so users opt into the spend / wall-clock cost explicitly.
- Goal cancelled mid-streaming → state transitions to `cleared`; next `agent_end` short-circuits via `state.status !== "active"` guard.
- Approval dialog times out → `confirm` resolves to false; goal is cleared.
- Multiple goals attempted → `until_done_set` rejects with `goal_exists`.
- Tool called before approval → `until_done_set` rejects with `not_confirmed`.
- Skill discovery race → `resources_discover` returns relative-to-package paths; works regardless of install location.
- Session shutdown → status + widget keys cleared; entries persist on disk for the next `pi -c`.
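Several of the cases above rely on rebuilding goal state by walking the session branch from the root and keeping the most recent `until-done.state` entry. The sketch below illustrates that fold. The `until-done.state` key comes from this README; the `SessionEntry` and `GoalState` shapes, the `"custom"` entry type, and `reconstructState` are hypothetical stand-ins, since Pi's real entry types are not shown here.

```typescript
// Hypothetical entry shape; Pi's actual session entry type may differ.
interface SessionEntry {
  type: string;
  customType?: string;
  data?: unknown;
}

// Hypothetical, trimmed-down goal state.
interface GoalState {
  goalId: string;
  status: string;
  turnsUsed: number;
}

// Runtime check so malformed entries are ignored rather than trusted.
function isGoalState(value: unknown): value is GoalState {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.goalId === "string" &&
    typeof v.status === "string" &&
    typeof v.turnsUsed === "number"
  );
}

// Walk root-to-leaf; the last valid state entry on the branch wins.
function reconstructState(branch: readonly SessionEntry[]): GoalState | null {
  let latest: GoalState | null = null;
  for (const entry of branch) {
    const isOurs =
      entry.type === "custom" && entry.customType === "until-done.state";
    if (isOurs && isGoalState(entry.data)) latest = entry.data;
  }
  return latest;
}
```

Because the result depends only on the entries of the current branch, switching branches or loading the extension mid-session yields the same state as if it had been running all along, and an empty branch yields `null` (the no-op case).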
Verifying
Every CI/CD operation runs through mise. Once you've installed deps:
cd pi-until-done
mise install # installs bun + node per mise.toml
mise run install-deps # installs bun deps (idempotent)
mise run check # fast: typecheck + lint + format (parallel)
mise run ci # full: typecheck + lint + format + compile + test + build
mise run release-ready # release-readiness suite (parity check + surface presence)
Then in a Pi project:
pi -e ./extensions/until-done.ts
/until-done finish migrating auth tests to Vitest
Security
/until-done is an autonomous-loop extension. By default it runs Pi's
built-in tools (read, bash, edit, write, grep, find, ls)
on the active model's behalf each turn. You should know:
- Filesystem writes are unrestricted unless you list specific commands in the contract's `askBefore[]`. Ask-before triggers a confirm dialog for any matching `bash` invocation. Examples: `git push`, `destructive sql`, `rm`, `terraform apply`.
- Network calls are whatever the model decides to make via `bash`. The extension itself makes no network calls.
- Credentials are never read or stored by the extension. Pi's session state is the only thing it persists, and it lives in your local Pi session entries (JSONL).
- CI commands run through `mise exec --` against your project's mise config. `mise tasks ls --json` is the only direct invocation the extension makes for discovery; nothing else.
- No system-prompt replacement: the extension only appends to the system prompt via `before_agent_start`, so other extensions' rules still apply.
- No background side effects: no daemons, no hidden state, no uploads, no telemetry. State is auditable in the JSONL session log.
- Hard turn budget ceiling of 20000 with a confirm dialog above 500; default is 20. Spin guard, clean-end nudge, CI failure, user input, and `/until-done pause` all preempt regardless of budget.
For vulnerability disclosure see SECURITY.md.
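The ask-before gate described in this section needs some rule for deciding whether a `bash` command matches an `askBefore` entry. The README does not say how matching works, so the sketch below assumes one plausible rule: an entry matches when its whitespace-separated words appear as a contiguous run of words in the command. `findAskBeforeMatch` and its helpers are hypothetical names, and a phrase such as `destructive sql` would need a different mechanism than this literal matching.

```typescript
// Lowercase and split on whitespace, dropping empty tokens.
function tokenize(text: string): string[] {
  return text
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter((tok) => tok !== "");
}

// True when `needle` occurs as a contiguous run inside `haystack`.
function containsTokens(haystack: readonly string[], needle: readonly string[]): boolean {
  if (needle.length === 0) return false;
  for (let i = 0; i + needle.length <= haystack.length; i++) {
    if (needle.every((tok, j) => haystack[i + j] === tok)) return true;
  }
  return false;
}

// Return the first askBefore entry the command matches, or null if none do.
function findAskBeforeMatch(
  command: string,
  askBefore: readonly string[],
): string | null {
  const words = tokenize(command);
  for (const entry of askBefore) {
    if (containsTokens(words, tokenize(entry))) return entry;
  }
  return null;
}
```

Matching on whole words avoids false positives such as `rm` matching `npm run format`, and it still catches a wrapped command like `mise exec -- rm -rf dist`. It would miss commands glued together without spaces (for example `cd x;rm y`), so a real gate would likely need shell-aware parsing.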
Cross-platform CI
The GitHub workflow runs the full suite on macos-latest,
ubuntu-latest, and windows-latest in parallel via a matrix
strategy. The release-readiness job runs the same matrix on main and
on dispatch.
Tests live in tests/:
| Path | Covers |
|---|---|
| `tests/mise.test.ts` | `routeThroughMise` / `isMiseCommand` semantics |
| `tests/profiles/bun.test.ts` | TypeScript-bun profile shape |
| `tests/profiles/pnpm.test.ts` | NODE_PNPM profile shape |
| `tests/profiles/npm.test.ts` | NODE_NPM profile shape |
| `tests/profiles/yarn.test.ts` | NODE_YARN profile shape |
| `tests/profiles/deno.test.ts` | DENO profile shape |
| `tests/platform/os.test.ts` | macOS/Linux/Windows path + line-ending neutrality |
| `tests/platform/discovery.test.ts` | All profiles use POSIX-style markers; mise as sole entry point |
Run them with mise run test.
Contributing
This project is open source under MIT. Contributions welcome.
- Read AGENTS.md — the project's pi-config-derived contract
- Read CONTRIBUTING.md — dev setup, TDD flow, PR rules
- No AI co-authorship trailers in commits or PRs (project policy enshrined in CONTRIBUTING.md)
- See CODE_OF_CONDUCT.md for community standards
- For security issues: SECURITY.md (do not file public issues)
- Changelog: CHANGELOG.md
PR template at .github/PULL_REQUEST_TEMPLATE.md. Issue templates under .github/ISSUE_TEMPLATE/.
Sources
- Hermes Agent goals doc: https://hermes-agent.nousresearch.com/docs/user-guide/features/goals
- Hermes Agent goals source: `hermes_cli/goals.py` in https://github.com/nousresearch/hermes-agent
- Pi extension API: `packages/coding-agent/src/core/extensions/types.ts` in https://github.com/badlogic/pi-mono
- Pi philosophy: https://github.com/srinitude/pi-config
- Pi extensions doc: https://pi.dev/docs/latest/extensions
License: MIT.
