pi-flows

Delegate pi work to isolated, budgeted sub-agents with verification loops and tracing.

Install pi-flows from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-flows

Package: pi-flows
Version: 0.1.1
Published: Jun 11, 2026
Downloads: not available
Author: thulr
License: MIT
Types: extension
Size: 269.4 KB
Dependencies: 0 dependencies · 5 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions/pi-flows/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-flows

Use pi-flows for the work you want to keep out of your main pi session: repo scouting, parallel investigation, implementation plus review, and large-task decomposition.

pi-flows adds a flow tool that runs separate, disposable pi subprocesses and returns compact findings to the parent session. Instead of asking one long-running chat to explore, edit, review, remember every file it opened, and stay within budget, you can send bounded work to specialized child agents and keep the main thread focused on the decision.

When it helps you

Use pi-flows when the next step would otherwise make your main pi session noisy, expensive, or hard to trust:

| Your situation | What you ask pi | What pi-flows gives back |
| --- | --- | --- |
| You need to understand a code path before touching it. | "Have a read-only agent find the billing routes." | A compact, cited recon report from an agent that cannot mutate the repo or run shell commands. |
| You have several independent areas to inspect. | "Check frontend auth and backend auth in parallel." | Separate child runs with capped fan-out instead of one context stuffed with every file. |
| You want an implementation checked before you accept it. | "Add `/health` with a test, and don't call it done until `npm test` passes." | A bounded generator-evaluator loop where a builder, critic, and optional command gate must pass. |
| You have a broad research task. | "Document how auth works across login, refresh, and sessions." | Decompose, fan out, synthesize, and optionally verify the merged answer. |
| You care what the delegation cost. | "Run this with a $0.25 cap and save a trace." | Cumulative cost/token ceilings plus OpenInference-shaped JSONL traces and `/flows report`. |

Why this instead of another sub-agent extension

pi-flows is a small harness, not just a folder of specialist prompts. The distinction matters when you want delegation to be repeatable and auditable.

Native isolation over prompt promises. recon and analyst run with read-only tools and no shell, so exploration cannot accidentally edit files. Concurrent write-capable agents cannot share one checkout unless you explicitly opt in.
Verification is a first-class mode. evaluate runs builder and critic in separate child contexts, can require npm test or another checkCommand, and revises under a hard iteration cap. This is stronger than asking one agent to "double-check itself."
Multiple proven patterns share one contract. single, parallel, chain, evaluate, vote, route, and orchestrate, along with graph, loop, and search, are all exposed through the same flow tool, so you can start with a scout and only add coordination when the task needs it. See Patterns.
Delegation is bounded. Count, concurrency, timeout, nesting depth, total tokens, and total USD spend are capped by the harness. A runaway fan-out returns BUDGET_EXCEEDED instead of quietly burning through the rest of the task.
Handoffs are treated as an attack surface. Content passed from one child to another is capped, redacted, stripped of invisible/bidi characters, and scanned for instruction-override markers before reuse.
You can inspect what happened. Structured errors include cause and fix fields, traces are plain JSONL, and /flows report summarizes success rate, cost, token use, budget hits, route choices, and voting warnings.
It stays inside pi. You install it as a pi package, use your existing pi provider setup, and talk to pi in plain English. The JSON in these docs is the contract behind the scenes, not something you must write for normal use.

You probably do not need pi-flows if you only want a single custom prompt, a long-lived autonomous swarm, or peer-to-peer agents that talk to each other. pi-flows deliberately uses a star topology: parent delegates bounded work, children return compact results, parent decides.

What it looks like

You talk to pi in plain English — it reads the flow tool and writes the call for you. Load the extension, then just ask:

Have a read-only agent find the API routes for billing.

pi delegates that to recon, which runs in its own subprocess and hands back just the findings. You never hand-write JSON — pi fills in the agent and the mode. (The call here is {"agent":"recon","task":"Find the API routes for billing"}; these docs show the JSON as the exact contract, for when you want to verify it or take manual control.)

Ask for a verified result and pi reaches for a stronger mode on its own:

Add a /health endpoint that returns 200 and a JSON status, with a test — and don't call it done until `npm test` passes.

pi runs this as an evaluate loop: the operator builds the change, a separate redteam critic judges the result, and npm test must exit 0. The loop revises until both gates pass or it reaches maxIterations. The call behind it:

{
  "task": "Add a /health endpoint that returns 200 and a JSON status, with a test",
  "evaluate": { "checkCommand": "npm test", "maxIterations": 3 }
}

→ Quickstart

Install

pi-flows runs inside the pi coding agent, so you install it as a pi package — no clone required.

Prerequisites: Node.js >=24, npm >=11, and the pi CLI >=0.78.0 on your PATH. Don't have pi? It ships in @earendil-works/pi-coding-agent:

npm i -g @earendil-works/pi-coding-agent

Install it with the pi CLI — from npm for the published release, or from GitHub to track main:

# From npm (recommended) — the published release
pi install npm:pi-flows

# Add -l to install into the current project only (.pi/settings.json)
pi install -l npm:pi-flows

# Or track the latest main straight from GitHub, no clone required
pi install git:github.com/Thulr/pi-flows

Reload pi with /reload (or restart it), then verify — /flows version is a command, and the second line is plain English that pi turns into a flow call:

/flows version
list the available flow agents

Success looks like all nine bundled agents in the flow list output — recon, strategist, overwatch, operator, analyst, redteam, controller, commander, and debrief. If pi isn't found, see Troubleshooting → pi: command not found. → Quickstart

Run from a clone (development)

To hack on pi-flows or try unreleased main, work from a checkout:

git clone https://github.com/Thulr/pi-flows
cd pi-flows
npm ci
npm run preflight   # verify the pi CLI is installed and on PATH
pi -e ./extensions/pi-flows/index.ts   # load the local extension in pi

Inside pi, smoke-test with no model call:

/flows help
/flows status
Use flow with {"list":true}
Use flow with {"showConfig":true}

Or install your working copy as a package with pi install -l ./. See Development for the build/test loop and Contributing.

What it adds

flow tool: runs isolated pi subprocesses for single, parallel, chain, evaluate (generator-evaluator), vote, route, orchestrate, graph, loop, and search delegation.
/flows command: lists available flow agents and shows help/status/version output.
Bundled agents in agents/: recon, strategist, overwatch, operator, analyst, redteam, controller, commander, and debrief.
Your own agents, no code required: one markdown file (frontmatter + system prompt) per agent. User agents live in ~/.pi/agent/flow-agents/*.md; project agents live in .pi/flow-agents/*.md (loaded with agentScope: "project" or "all", and trust-gated). A project agent shadows a user agent of the same name, which in turn shadows a bundled agent, and each shadowing is reported with a visible diagnostic. See Custom agents.
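The shadowing order can be pictured with a small Python sketch. It is illustrative only and not the pi-flows loader: the `AgentDef` type, the `resolve_agents` function, and the diagnostic wording are hypothetical, and it assumes shadowing is decided by agent name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentDef:
    name: str
    scope: str  # "bundled", "user", or "project"

# Lowest to highest precedence: project shadows user, user shadows bundled.
PRECEDENCE = ["bundled", "user", "project"]

def resolve_agents(defs):
    """Return the winning definition per name plus one diagnostic per shadowing."""
    winners = {}
    diagnostics = []
    # Process low-precedence scopes first so later scopes overwrite them.
    for agent in sorted(defs, key=lambda d: PRECEDENCE.index(d.scope)):
        previous = winners.get(agent.name)
        if previous is not None:
            diagnostics.append(
                f"{agent.scope} agent '{agent.name}' shadows {previous.scope} agent"
            )
        winners[agent.name] = agent
    return winners, diagnostics
```

Collecting the diagnostics alongside the winners keeps an override visible instead of silent, which is the behavior the list above describes.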

Safety model

Project-local agents are repo-controlled prompts. In interactive pi sessions, pi-flows asks before running them. In headless (non-UI) runs, pi-flows fails closed by default and refuses project-local agents unless you explicitly pass confirmProjectAgents:false after reviewing the files.

pi-flows also redacts secret-shaped content and home paths from returned content/details by default. Inter-agent handoffs — where one child's output becomes another child's prompt ({previous} in chain, the evaluate artifact, vote ballots, orchestrate findings) — are an indirect prompt-injection surface, so pi-flows strips invisible/bidi characters and flags instruction-override markers in that content before reuse, surfacing a warning rather than silently trusting it. See Privacy & telemetry.
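As a rough illustration of that handoff hygiene, here is a minimal Python sketch. It is not the pi-flows implementation: the marker patterns, the `sanitize_handoff` name, and the 4000-character cap are assumptions chosen for the example, and secret redaction is left out.

```python
import re
import unicodedata

# Hypothetical marker patterns; the list pi-flows actually uses is not shown in this README.
OVERRIDE_MARKERS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard the system prompt", re.IGNORECASE),
]

def sanitize_handoff(text, max_chars=4000):
    """Strip format characters, cap the length, and collect warnings."""
    warnings = []
    # Unicode category "Cf" covers zero-width and bidirectional control characters.
    cleaned = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    if cleaned != text:
        warnings.append("removed invisible or bidi characters")
    if len(cleaned) > max_chars:
        cleaned = cleaned[:max_chars]
        warnings.append("handoff truncated")
    for pattern in OVERRIDE_MARKERS:
        if pattern.search(cleaned):
            warnings.append(f"possible instruction override: {pattern.pattern}")
    return cleaned, warnings
```

The sketch returns warnings rather than raising, mirroring the "surface a warning rather than silently trust it" behavior described above.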

Cost is bounded as well as count and time: pass maxCostUsd / maxTokens to cap cumulative spend across the whole flow tree (BUDGET_EXCEEDED once reached). Concurrent fan-out also refuses multiple write-capable agents in the same cwd (SHARED_WRITE_CWD) unless allowSharedWriteCwd:true is explicit. Read-only agents (recon, analyst) ship without a shell, so their read-only boundary is enforced by the toolset, not by prompt instructions alone.
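The write-collision rule can be sketched in a few lines of Python. This is an assumption-laden illustration, not the harness code: the per-task `cwd` key and the `check_shared_write_cwd` function are hypothetical, and every agent other than recon and analyst is treated as write-capable.

```python
import os

# Read-only agents named above; anything else counts as write-capable in this sketch.
READ_ONLY_AGENTS = {"recon", "analyst"}

def check_shared_write_cwd(tasks, allow_shared_write_cwd=False):
    """Raise if two write-capable tasks would run concurrently in one directory."""
    if allow_shared_write_cwd:
        return
    seen = {}  # resolved cwd -> first write-capable agent found there
    for task in tasks:
        if task["agent"] in READ_ONLY_AGENTS:
            continue
        # Hypothetical "cwd" field; defaults to the current directory.
        cwd = os.path.realpath(task.get("cwd", "."))
        if cwd in seen:
            raise RuntimeError(
                f"SHARED_WRITE_CWD: {seen[cwd]} and {task['agent']} share {cwd}"
            )
        seen[cwd] = task["agent"]
```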

`flow` tool quick reference

You don't type these objects — you describe what you want and pi builds the call. This is the exact contract behind those requests: skim it to see what pi will run, or to take manual control (pin a specific agent, model, or budget). Each block is the JSON pi passes to the flow tool.

List

{ "list": true }

Show effective config

{ "showConfig": true }

Single

{ "agent": "recon", "task": "Find the API routes for billing" }

Parallel

{
  "tasks": [
    { "agent": "recon", "task": "Find frontend auth code" },
    { "agent": "recon", "task": "Find backend auth code" }
  ],
  "concurrency": 2
}

Defaults: concurrency=4 (per-call). maxParallelTasks is a fixed hard cap of 8, not a per-call input.
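To make the two limits concrete, here is a minimal Python sketch of bounded fan-out. It assumes the hard cap of 8 applies to the number of tasks in one call, which this README does not spell out; `run_parallel` and `run_child` are hypothetical names standing in for the harness and a child subprocess.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_TASKS = 8   # fixed hard cap (assumed here to limit tasks per call)
DEFAULT_CONCURRENCY = 4  # per-call default

def run_parallel(tasks, run_child, concurrency=DEFAULT_CONCURRENCY):
    """Run each task through run_child with a bounded pool; results keep input order."""
    if len(tasks) > MAX_PARALLEL_TASKS:
        raise ValueError(f"{len(tasks)} tasks exceeds the cap of {MAX_PARALLEL_TASKS}")
    # Never start more workers than there are tasks, and always at least one.
    workers = max(1, min(concurrency, len(tasks)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_child, tasks))
```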

Chain

{
  "task": "Add Redis caching to the session store",
  "chain": [
    { "agent": "recon", "task": "Research this task: {task}" },
    { "agent": "strategist", "task": "Plan using this context:\n\n{previous}" }
  ]
}

Chain {previous} handoffs are capped, redacted, and scanned for injection before they become the next prompt.
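The placeholder mechanics can be sketched as follows. This Python example covers only substitution and the size cap (redaction and injection scanning are omitted); the function names and the 2000-character cap are assumptions, and `run_child` stands in for launching a child agent.

```python
import re

def render_step(template, task, previous, max_previous_chars=2000):
    """Fill {task} and {previous} in one chain step, capping the handoff size."""
    values = {"task": task, "previous": previous[:max_previous_chars]}
    # A single regex pass, so placeholders inside child output are never re-expanded.
    return re.sub(r"\{(task|previous)\}", lambda m: values[m.group(1)], template)

def run_chain(task, chain, run_child):
    """Run steps in order, feeding each step's output to the next as {previous}."""
    previous = ""
    for step in chain:
        prompt = render_step(step["task"], task, previous)
        previous = run_child(step["agent"], prompt)
    return previous
```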

Evaluate (generator-evaluator loop)

{
  "task": "Add a /health endpoint that returns 200 and a JSON status, with a test",
  "evaluate": {
    "operator": { "agent": "operator" },
    "redteam": { "agent": "redteam" },
    "checkCommand": "npm test",
    "maxIterations": 3
  }
}

The operator builds against task; a separate redteam judges the artifact (not the builder's trace) and returns VERDICT: PASS or VERDICT: REVISE with critique. On REVISE the operator is re-shown its prior artifact plus the critique and revises in place. The loop revises until it passes or hits maxIterations (default 3, cap 8).

Two optional reliability levers are available. checkCommand is a deterministic gate: a shell command that must exit 0, so code-level assertions run alongside the LLM critic, and a non-zero exit is an automatic REVISE. redteam may also be an array of critics (a decomposed panel, for example one critic per dimension), in which case PASS requires all of them. See Flow reference.
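The control flow of the loop can be sketched in Python. This is a simplified model under stated assumptions, not the pi-flows source: `build` and `critique` are hypothetical callables standing in for the operator and redteam children, a single critic is used, and the sketch assumes the critic is skipped on an iteration where the command gate fails.

```python
import subprocess

def evaluate(task, build, critique, check_command=None, max_iterations=3):
    """Build, gate, and critique until PASS or the iteration cap is reached."""
    max_iterations = min(max_iterations, 8)  # hard cap stated above
    artifact, feedback = None, None
    for iteration in range(1, max_iterations + 1):
        # The builder sees its prior artifact plus the latest critique.
        artifact = build(task, artifact, feedback)
        if check_command is not None:
            gate = subprocess.run(check_command, shell=True, capture_output=True, text=True)
            if gate.returncode != 0:
                feedback = f"checkCommand failed with exit code {gate.returncode}"
                continue  # a non-zero exit is an automatic REVISE
        verdict, feedback = critique(task, artifact)
        if verdict == "PASS":
            return {"status": "pass", "iterations": iteration, "artifact": artifact}
    return {"status": "revise", "iterations": max_iterations, "artifact": artifact}
```

The critic receives only the artifact, never the builder's own reasoning, which matches the separation described above.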

Vote (parallelization / voting)

{
  "task": "Is /^(a+)+$/ vulnerable to catastrophic backtracking?",
  "vote": { "voters": [{ "agent": "recon" }, { "agent": "overwatch" }], "debrief": { "agent": "debrief" } }
}

Runs the same task across ≥2 voters (use different models to break correlated errors) and synthesizes one answer via the optional debrief aggregator. Without a debrief, all voter answers are returned.

Route (classify → dispatch)

{ "task": "The billing webhook returns 500s in prod", "route": { "candidates": ["recon", "strategist", "overwatch"], "fallback": "recon" } }

The controller picks one candidate (ROUTE: <agent>) and runs it — or emits ROUTE: none when nothing fits, falling back instead of forcing a guess.
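Parsing the controller's decision might look like the following Python sketch. The `parse_route` function is hypothetical, and sending unknown agent names to the fallback is an assumption of this example rather than documented behavior.

```python
import re

def parse_route(controller_output, candidates, fallback):
    """Return the chosen candidate, or the fallback for 'none' or unknown names."""
    match = re.search(r"^ROUTE:\s*(\S+)\s*$", controller_output, re.MULTILINE)
    if match is None:
        return fallback
    choice = match.group(1)
    # "none" is never a candidate, so it falls through to the fallback.
    return choice if choice in candidates else fallback
```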

Orchestrate (decompose → fan out → synthesize)

{
  "task": "Document how auth works across the codebase",
  "returnContract": "Return sections for login, token refresh, session storage, and gaps.",
  "requireEvidence": true,
  "orchestrate": {
    "recon": { "agent": "recon" },
    "verify": { "agent": "overwatch" },
    "verifyPolicy": "revise",
    "maxSubtasks": 5
  }
}

The commander splits the task into a JSON list of subtasks, recon workers run them in parallel, and the debrief agent merges the findings. An optional verify critic checks the merged answer against the goal in the same call. verifyPolicy:"note" keeps the verdict advisory, "fail" hard-fails on REVISE, and "revise" reruns debrief with the critique until pass or verifyMaxIterations.

Graph (static DAG)

{
  "task": "Map auth",
  "graph": {
    "nodes": [
      { "id": "frontend", "agent": "recon", "task": "Find frontend auth for {task}" },
      { "id": "backend", "agent": "recon", "task": "Find backend auth for {task}" },
      { "id": "summary", "agent": "strategist", "dependsOn": ["frontend", "backend"], "task": "Plan from:\n{node.frontend}\n{node.backend}" }
    ],
    "debrief": { "agent": "debrief" }
  }
}

Ready nodes run by dependency wave, with the same caps, redaction, trace, and write-collision guards as other modes.
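Scheduling by dependency wave can be illustrated with a short Python sketch. It only computes the waves from `id` and `dependsOn`; the `dependency_waves` name and the error messages are hypothetical, and running the nodes is left out.

```python
def dependency_waves(nodes):
    """Group node ids into waves; each wave depends only on earlier waves."""
    pending = {n["id"]: set(n.get("dependsOn", [])) for n in nodes}
    unknown = {dep for deps in pending.values() for dep in deps} - set(pending)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    done, waves = set(), []
    while pending:
        # A node is ready once all of its dependencies have completed.
        ready = [node_id for node_id, deps in pending.items() if deps <= done]
        if not ready:
            raise ValueError("cycle detected in graph")
        waves.append(ready)
        done.update(ready)
        for node_id in ready:
            del pending[node_id]
    return waves
```

For the example graph above, this yields one wave containing frontend and backend, followed by a wave containing summary.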

Loop (bounded repeat-until-done)

{
  "task": "Draft release notes",
  "loop": { "body": { "agent": "operator" }, "judge": { "agent": "redteam" }, "maxIterations": 3 }
}

The body repeats until it emits LOOP: DONE, the optional judge emits VERDICT: PASS, or maxIterations is reached.

Search (bounded beam search)

{
  "task": "Pick a cache strategy",
  "search": { "generator": { "agent": "strategist" }, "scorer": { "agent": "redteam", "tools": "none" }, "debrief": { "agent": "debrief" }, "candidates": 3, "beamWidth": 1, "maxRounds": 2 }
}

search generates candidate paths, scores each with SCORE: 0..100, keeps the best beam, and debriefs the winner. The default scorer is redteam with tools disabled so parallel scoring stays read-only.
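The shape of the search can be sketched in Python. This is a generic bounded beam search written for illustration, not the pi-flows code: `generate` and `score` are hypothetical callables standing in for the generator and scorer children, the final debrief is omitted, and scoring unparseable output as 0 is an assumption.

```python
import re

def parse_score(text):
    """Extract SCORE: n, clamped to 100; output without a score counts as 0."""
    match = re.search(r"SCORE:\s*(\d+)", text)
    return min(100, int(match.group(1))) if match else 0

def beam_search(task, generate, score, candidates=3, beam_width=1, max_rounds=2):
    """Expand each path in the beam, score the results, and keep the best ones."""
    beam = [None]  # None means "no parent path yet"
    for _ in range(max_rounds):
        scored = []
        for parent in beam:
            for path in generate(task, parent, candidates):
                scored.append((parse_score(score(task, path)), path))
        if not scored:
            break  # nothing new was generated; keep the current beam
        scored.sort(key=lambda pair: pair[0], reverse=True)
        beam = [path for _, path in scored[:beam_width]]
    return beam[0]
```

Both candidates and maxRounds are finite, so the number of child runs is bounded up front at roughly beamWidth × candidates × maxRounds generations plus their scoring calls.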

Cost budget and tracing

Any mode accepts a cumulative spend ceiling and a trace sink:

{ "task": "...", "orchestrate": {}, "maxCostUsd": 0.50, "traceFile": "flow-trace.jsonl", "traceLabel": "release-gate" }

maxCostUsd / maxTokens cap total spend across the whole flow tree (BUDGET_EXCEEDED once reached). traceFile (or PI_FLOWS_TRACE_FILE) appends one OpenInference-shaped JSON span per child plus a root span, producing JSONL that any OpenTelemetry backend, or a coding agent, can read. Summarize local traces with /flows report flow-trace.jsonl, or with npm run trace:report -- flow-trace.jsonl from a checkout.
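A cumulative ceiling of this kind can be modeled with one shared counter that every child in the tree charges against. The Python sketch below is illustrative: the `Budget` class and its field names are hypothetical, and only the BUDGET_EXCEEDED code comes from this README.

```python
from dataclasses import dataclass
from typing import Optional

class BudgetExceeded(Exception):
    """Raised when cumulative spend reaches a ceiling."""

@dataclass
class Budget:
    max_cost_usd: Optional[float] = None
    max_tokens: Optional[int] = None
    cost_usd: float = 0.0
    tokens: int = 0

    def charge(self, cost_usd, tokens):
        """Record one child's usage, then fail if either ceiling has been reached."""
        self.cost_usd += cost_usd
        self.tokens += tokens
        if self.max_cost_usd is not None and self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"BUDGET_EXCEEDED: ${self.cost_usd:.2f} spent")
        if self.max_tokens is not None and self.tokens >= self.max_tokens:
            raise BudgetExceeded(f"BUDGET_EXCEEDED: {self.tokens} tokens used")
```

Passing the same Budget instance to nested flows is what makes the cap cumulative rather than per-child.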

Human checkpoints and Reflexion

{ "task": "...", "evaluate": {}, "checkpoint": { "before": "spawn" } }
{ "task": "...", "loop": { "body": { "agent": "operator" } }, "reflexion": { "enabled": true } }

checkpoint.before:"spawn" asks for approval before any child runs; "finalize" asks before returning the final result. Headless runs fail closed. reflexion.enabled:true opts into local cross-run lessons in .pi/flow-reflections.jsonl.

Agent definition format

Create markdown files with YAML frontmatter:

---
name: my-agent
description: What this agent does
tools: read,grep,find,ls
tier: capable
---

System prompt for the delegated agent.

tier keeps agents portable: no vendor model is hard-coded. capable runs on your pi default model; fast runs on PI_FLOWS_FAST_MODEL if you set one (e.g. a cheaper model for your provider, like openai-codex/gpt-5.4-mini) and otherwise falls back to your default model. Flows therefore use whatever model pi is already configured with, and the extension does not need updating as providers ship new models. Set an explicit model: field to override the tier (a model passed in the flow call overrides it too). tools: none disables built-in tools, and omitting tools uses pi defaults. Invalid agent files are reported in /flows status and flow showConfig:true.
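The model resolution rules can be sketched in Python. This is a simplified illustration, not the pi-flows loader: the frontmatter parser handles only flat `key: value` lines rather than full YAML, the function names are hypothetical, and letting a flow-call model win over a frontmatter model is an assumption, since the README only says both override the tier.

```python
import os

def parse_agent_file(text):
    """Split an agent markdown file into frontmatter fields and a system prompt."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if not lines or lines[0] != "---":
        raise ValueError("missing frontmatter")
    end = lines.index("---", 1)  # raises ValueError if the block is never closed
    fields = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:]).strip()

def resolve_model(fields, default_model, call_model=None, env=os.environ):
    """Pick a model: explicit overrides first, then the tier, then the pi default."""
    if call_model:
        return call_model  # assumed to take priority over frontmatter
    if fields.get("model"):
        return fields["model"]
    if fields.get("tier") == "fast":
        return env.get("PI_FLOWS_FAST_MODEL") or default_model
    return default_model
```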

Documentation ladder

Development

npm ci
npm run check

Useful individual checks:

npm run typecheck
npm test
npm run validate:agents
npm run pack:dry-run
