pi-captain

Captain — multi-agent pipeline orchestrator for pi. Define specialized agents, wire them into sequential/parallel/pool pipelines with quality gates, and run complex workflows.

Package details

extensionskill

Install pi-captain from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-captain
Package
pi-captain
Version
0.1.1
Published
Mar 13, 2026
Downloads
31/mo · 8/wk
Author
pierre-mike
License
MIT
Types
extension, skill
Size
1.1 MB
Dependencies
1 dependency · 6 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions/captain"
  ],
  "skills": [
    "./skills"
  ],
  "image": "https://raw.githubusercontent.com/Pierre-Mike/pi-captain/main/assets/preview.png"
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-captain

Lines Branches

⚠️ This is not production ready — just an experimentation.

Pipeline orchestrator for pi. Wire steps into sequential/parallel/pool pipelines with quality gates and run complex workflows — each step declares its own model, tools, and temperature inline.

Install

# Project-local (recommended — auto-installs for teammates)
pi install -l git:github.com/Pierre-Mike/pi-captain

# Global
pi install git:github.com/Pierre-Mike/pi-captain

Package Contents

Extensions

Path Description
extensions/captain/ Captain pipeline orchestrator — all tools & commands
extensions/native-web-search/ web_search tool — live internet search via Anthropic's web search beta
extensions/agent-loop/ /loop command — repeat agent turns by goal, count, or pipeline stages
extensions/clear/ /clear and /reset commands — wipe session and reload
extensions/terminal/ /terminal command — open a terminal split in your editor
extensions/zellij-tab-namer/ Auto-renames the Zellij tab after each agent turn
extensions/refactor-loop/ /refactor command — iterative analyze→refactor→verify cycles with refactor_pass tool
extensions/safety-destructive-commands/ Blocks/confirms dangerous bash commands (rm -rf, dd, fork bombs…)
extensions/safety-git-operations/ Confirms risky git ops (push --force, reset --hard, clean -f…)
extensions/safety-network-exfiltration/ Blocks curl-pipe-to-shell, secret leaks, sensitive file transfers
extensions/safety-path-protection/ Protects .git/, node_modules/, .env, SSH keys from writes
extensions/freecad/ freecad_* tools — drive FreeCAD to create/export 3D models

Skills

Path Description
skills/captain/ Captain skill — guides the LLM on pipeline authoring
skills/research-swarm/ Parallel 5-agent research with democratic scoring
extensions/refactor-loop/refactor-loop/ (bundled in extension) Refactor loop workflow instructions
skills/json-canvas/ JSON Canvas format for Obsidian .canvas files
skills/extension-generator/ Build and debug pi extensions
skills/skill-generator/ Generate new pi skills
extensions/freecad/skill/ FreeCAD agent wrapper and shell runner

Selective Install

Don't want everything? Use the object form in your settings.json to load only what you need.

Extension only (no skill):

{
  "packages": [
    {
      "source": "git:github.com/Pierre-Mike/pi-captain",
      "skills": []
    }
  ]
}

Skill only (no extension tools):

{
  "packages": [
    {
      "source": "git:github.com/Pierre-Mike/pi-captain",
      "extensions": []
    }
  ]
}

Or use pi config after installing to interactively toggle extensions and skills on/off.


What You Get

Tools

Tool Description
captain_load Load a builtin pipeline preset or .ts pipeline file
captain_run Execute a pipeline with input
captain_status Check pipeline progress, tokens, cost, and gate results
captain_list List all defined pipelines
captain_generate Generate a TypeScript pipeline file on-the-fly using LLM
captain_validate Validate a pipeline specification for structural correctness

Builtin Pipeline Presets

Preset Description
githubPrReview GitHub PR review pipeline
reqDecompose Requirements decomposition
reqDecomposeAi AI-powered requirements decomposition
requirementsGathering Requirements gathering workflow
researchSwarm Research swarm coordination
showcase Self-contained demo exercising all features
shredder Document shredding/analysis
specTdd Specification-driven TDD

Pipelines as TypeScript Files

The preferred way to write pipelines is as .ts files that export a pipeline const of type Runnable. Gates, OnFail handlers, and Transforms are plain functions — no JSON encoding needed.

// my-pipeline.ts
import { retry, skip, warn } from "<captain>/gates/on-fail.js";
import { bunTest, command, regexCI, user } from "<captain>/gates/presets.js";
import { llmFast } from "<captain>/gates/llm.js";
import { full, summarize } from "<captain>/transforms/presets.js";
import type { Runnable, Step } from "<captain>/types.js";

const research: Step = {
  kind: "step",
  label: "Research",
  model: "sonnet",
  tools: ["read", "bash"],
  prompt: "Research the following topic thoroughly:\n$ORIGINAL",
  gate: undefined,
  onFail: skip,
  transform: full,
};

const implement: Step = {
  kind: "step",
  label: "Implement",
  model: "sonnet",
  tools: ["read", "bash", "edit", "write"],
  prompt: "Based on this research:\n$INPUT\n\nImplement: $ORIGINAL",
  gate: bunTest,           // runs `bun test`, passes on exit 0
  onFail: retry(3),
  transform: full,
};

const review: Step = {
  kind: "step",
  label: "Review",
  model: "flash",
  tools: ["read", "bash"],
  temperature: 0.3,
  prompt: "Review this implementation:\n$INPUT\n\nOriginal: $ORIGINAL",
  gate: user,              // human approval in interactive UI
  onFail: skip,
  transform: summarize(),
};

export const pipeline: Runnable = {
  kind: "sequential",
  steps: [research, implement, review],
};

Load and run:

captain_load: action="load", name="./my-pipeline.ts"
captain_run: name="my-pipeline", input="Build a REST API for user management"

.pi/pipelines/ convention: User pipeline files placed in .pi/pipelines/ should start with two header comments so the name and description are discoverable without importing the module (this is required for captain_generate output):

// @name: my-pipeline-name
// @description: One-line description of what this pipeline does

These files are auto-discovered by captain_load (action: "list") and /captain-load.


Type Reference


Runnable (union)

A Runnable is anything that can be placed inside a pipeline. All four variants are infinitely nestable.

Runnable = Step | Sequential | Pool | Parallel

Step — atomic LLM invocation

Each step runs as an in-process pi SDK session. All config is declared inline on the step.

{
  kind: "step",                    // required — literal "step"
  label: string,                   // required — human-readable name shown in UI
  prompt: string,                  // required — instructions for the step
                                   //   $INPUT    → output of the previous step (or user input on step 1)
                                   //   $ORIGINAL → the original user request, always unchanged

  // ── Step config ───────────────────────────────────────────────────────
  model?: string,                  // optional — model identifier; default: current session model
                                   //   Examples: "sonnet", "flash", "claude-opus-4-5"
  tools?: string[],                // optional — tool names to enable
                                   //   Default: ["read","bash","edit","write"]
                                   //   Available: "read" | "bash" | "edit" | "write" | "grep" | "find" | "ls"
  temperature?: number,            // optional — sampling temperature (0–1)
  systemPrompt?: string,           // optional — system prompt for the LLM session
  skills?: string[],               // optional — absolute paths to .md skill files to inject
  extensions?: string[],           // optional — absolute paths to .ts extension files to load
  jsonOutput?: boolean,            // optional — if true, instructs step to return structured JSON; default: false

  // ── Step metadata ─────────────────────────────────────────────────────
  description?: string,            // optional — longer description (defaults to label)

  // ── Lifecycle ─────────────────────────────────────────────────────────
  gate?: Gate,                     // optional — validation after this step runs
  onFail?: OnFail,                 // optional — what to do if gate fails or step errors
  transform: Transform,            // required — how to pass output to the next step
}

Example step (TypeScript):

import { bunTest } from "<captain>/gates/presets.js";
import { retry } from "<captain>/gates/on-fail.js";
import { full } from "<captain>/transforms/presets.js";

const buildStep: Step = {
  kind: "step",
  label: "Build & Test",
  model: "sonnet",
  tools: ["read", "bash", "edit", "write"],
  prompt: "Implement $ORIGINAL. Make all tests pass.",
  gate: bunTest,
  onFail: retry(3),
  transform: full,
};

Gate — plain validation function

A gate is a plain function that receives the step output and optional side-effect context.
Return true to pass, or a string describing why it failed. Throwing is also treated as a failure.

type Gate = (params: {
  output: string;
  ctx?: GateCtx;
}) => true | string | Promise<true | string>;

Inline gates:

// Simple content check
gate: ({ output }) => output.includes("DONE") ? true : 'Output must contain "DONE"'

// JSON validity check
gate: ({ output }) => {
  try { JSON.parse(output.trim()); return true; }
  catch { return "Output is not valid JSON"; }
}

// Shell command via ctx
gate: async ({ ctx }) => {
  const { code, stderr } = await ctx!.exec("bash", ["-c", "bun test"]);
  return code === 0 ? true : `Tests failed: ${stderr.slice(0, 200)}`;
}

// Stateful gate using closure
let attempts = 0;
gate: ({ output }) => {
  attempts++;
  return attempts >= 3 ? true : `Need 3 attempts, got ${attempts}`;
}

Gate presets (import from gates/presets.js):

Export Description
command(cmd) Run shell command - exit 0 passes, non-zero fails
file(path) File must exist
regexCI(pattern) Output must match regex (case-insensitive)
allOf(...gates) All gates must pass
user Require human confirmation via interactive UI
bunTest Preset: run bun test

LLM gate (import from gates/llm.js):

Export Description
llmFast(prompt, threshold?) LLM evaluates output quality (0–1 threshold, default 0.7)
import { llmFast } from "<captain>/gates/llm.js";
gate: llmFast("Is this implementation production-ready?", 0.8)

OnFail — plain failure-handling function

An OnFail is a plain function that receives failure context and returns what to do next.

type OnFail = (ctx: OnFailCtx) => OnFailResult | Promise<OnFailResult>;

interface OnFailCtx {
  reason: string;      // Gate failure reason
  retryCount: number;  // Retries already attempted (0 on first failure)
  stepCount: number;   // Total times step has run (retryCount + 1)
  output: string;      // Last output before failure
}

type OnFailResult =
  | { action: "retry" }
  | { action: "fail" }
  | { action: "skip" }
  | { action: "warn" }
  | { action: "fallback"; step: Step };

OnFail presets (import from gates/on-fail.js):

Export Description
retry(max?) Re-run up to N times (default 3), then fail
retryWithDelay(max, delayMs) Retry with pause between attempts
fallback(step) Run an alternative step instead
skip Skip scope - mark as skipped, continue with empty output
warn Log warning but treat as passed and continue

Custom inline:

// Retry twice, then warn
onFail: ({ retryCount }) => retryCount < 2 ? { action: "retry" } : { action: "warn" }

When to use warn vs skip:

  • warn: Gate failed but output is still useful — pass it through. Good for advisory gates.
  • skip: Gate failed and output is unreliable — discard it. Good for mandatory validation.

Transform — plain output-shaping function

A transform is a plain function that maps one step's output to the next step's input.

type Transform = (params: {
  output: string;    // Raw output produced by the step
  original: string;  // The very first pipeline input ($ORIGINAL)
  ctx: TransformCtx; // Side-effect helpers (shell, LLM, …)
}) => string | Promise<string>;

Transform presets (import from transforms/presets.js):

Export Description
full Pass entire output unchanged (default)
extract(key) Parse JSON and extract a top-level key
summarize() Ask LLM to summarize in 2–3 sentences

Inline transforms:

// Trim whitespace
transform: ({ output }) => output.trim()

// Pull JSON key with fallback
transform: ({ output }) => {
  try { return JSON.parse(output).result; }
  catch { return output; }
}

// Shell post-processing
transform: async ({ output, ctx }) => {
  const { stdout } = await ctx.exec("jq", ["-r", ".items[]"]);
  return stdout || output;
}

Sequential — chain steps via $INPUT

{
  kind: "sequential",
  steps: Runnable[],      // ordered list of steps/sub-pipelines
  gate?: Gate,            // validates final output of the sequence
  onFail?: OnFail,        // retry = re-run entire sequence from scratch
  transform?: Transform,  // applied to final output after gate passes
}

Parallel — different steps concurrently

{
  kind: "parallel",
  steps: Runnable[],                 // each runs concurrently (own git worktree)
  merge: MergeFn,                    // how to combine branch outputs
  gate?: Gate,
  onFail?: OnFail,
  transform?: Transform,
}

Pool — same step × N

{
  kind: "pool",
  step: Runnable,                    // replicated N times
  count: number,
  merge: MergeFn,                    // how to combine branch outputs
  gate?: Gate,
  onFail?: OnFail,
  transform?: Transform,
}

MergeFn — combining parallel/pool outputs

MergeFn is a plain function: (outputs: string[], ctx: MergeCtx) => string | Promise<string>.

Import named presets from merge.js:

import { concat, awaitAll, firstPass, vote, rank } from "<captain>/merge.js";
Preset Behaviour
concat Concatenate all outputs in order
awaitAll Wait for all, return concatenated (alias for concat)
firstPass Return the first non-empty output
vote LLM picks the single best output
rank LLM ranks all outputs and synthesizes the top one

You can also write inline merge functions:

merge: (outputs) => outputs.join("\n---\n")

Complete Pipeline Example (TypeScript)

// research-and-build.ts
import { retry, skip, warn } from "<captain>/gates/on-fail.js";
import { bunTest, allOf, file, user } from "<captain>/gates/presets.js";
import { llmFast } from "<captain>/gates/llm.js";
import { concat } from "<captain>/merge.js";
import { full, summarize } from "<captain>/transforms/presets.js";
import type { Runnable } from "<captain>/types.js";

export const pipeline: Runnable = {
  kind: "sequential",
  steps: [
    {
      kind: "step",
      label: "Explore codebase",
      model: "flash",
      tools: ["read", "bash"],
      prompt: "Explore the codebase and understand how to implement: $ORIGINAL. Identify relevant files, patterns, and constraints.",
      onFail: skip,
      transform: full,
    },
    {
      kind: "parallel",
      steps: [
        {
          kind: "step",
          label: "Write tests",
          model: "sonnet",
          tools: ["read", "bash", "edit", "write"],
          temperature: 0.2,
          prompt: "Based on this analysis:\n$INPUT\n\nWrite failing tests for: $ORIGINAL",
          gate: async ({ ctx }) => {
            const { code } = await ctx!.exec("bash", ["-c", "bun test 2>&1 | grep -q fail"]);
            return code === 0 ? true : "Tests should fail (red phase)";
          },
          onFail: retry(2),
          transform: full,
        },
        {
          kind: "step",
          label: "Write docs",
          model: "sonnet",
          tools: ["read", "bash", "edit", "write"],
          prompt: "Based on this analysis:\n$INPUT\n\nDraft documentation for: $ORIGINAL",
          onFail: warn,
          transform: full,
        },
      ],
      merge: concat,
    },
    {
      kind: "step",
      label: "Implement",
      model: "sonnet",
      tools: ["read", "bash", "edit", "write"],
      temperature: 0.2,
      prompt: "Context:\n$INPUT\n\nImplement: $ORIGINAL\nMake all tests pass.",
      gate: bunTest,
      onFail: retry(3),
      transform: full,
    },
    {
      kind: "step",
      label: "Review",
      model: "flash",
      tools: ["read", "bash"],
      temperature: 0.3,
      prompt: "Review the implementation for $ORIGINAL. Focus on correctness, security, and maintainability.",
      gate: llmFast("Does the review indicate the implementation is ready for production?", 0.8),
      onFail: retry(1),
      transform: summarize(),
    },
  ],
};

Quick Start

# Load and run a builtin preset
> Use captain to review my PR

# Load a custom TypeScript pipeline
> captain_load: name="./my-pipeline.ts"
> captain_run: name="my-pipeline", input="refactor the auth module"

# Single-step ad-hoc via /captain-step
> /captain-step "analyze this codebase" --model flash --tools read,bash

Development

git clone https://github.com/Pierre-Mike/pi-captain.git
cd pi-captain
npm install

Scripts

Script Description
npm run check Lint & format check
npm run fix Auto-fix lint & format issues
npm test Run all tests

License

MIT