@ariesfish/pi-goal

Autonomous experiment loop for pi — run, measure, keep or discard. Inspired by karpathy/autoresearch.

Install @ariesfish/pi-goal from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@ariesfish/pi-goal
Package: @ariesfish/pi-goal
Version: 0.4.0
Published: May 26, 2026
Downloads: 399/mo · 58/wk
Author: ariesfish
License: MIT
Types: extension, skill
Size: 289.1 KB
Dependencies: 0 dependencies · 4 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions"
  ],
  "skills": [
    "./skills"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-goal

Autonomous experiment loops for pi

Install · Usage · Reference

pi-goal lets pi try an idea, measure it, keep improvements, revert regressions, and continue from durable state.

Use it for any measurable optimization target: test speed, bundle size, training loss, build time, Lighthouse score, or custom benchmarks.


Install

pi install npm:@ariesfish/pi-goal

Manual install:

cp -r extensions/pi-goal ~/.pi/agent/extensions/
cp -r skills/goal-create ~/.pi/agent/skills/
cp -r skills/goal-finalize ~/.pi/agent/skills/
cp -r skills/goal-hooks ~/.pi/agent/skills/

Then run /reload in pi.


Usage

Start a loop

/skill:goal-create

The skill asks for, or infers:

  • goal
  • benchmark command
  • primary metric and direction
  • files in scope
  • constraints

It creates a goal branch, writes goal.md and goal.sh, runs the baseline, then starts iterating.

Run directly with /goal

/goal optimize unit test runtime, monitor correctness
/goal model training, run 5 minutes of train.py and track validation loss

Useful subcommands:

Command                 Purpose
/goal <text>            Start or resume goal mode
/goal off               Leave goal mode; keep persisted files
/goal clear             Delete goal.jsonl and reset state
/goal reinit            Start a new comparable experiment with a fresh baseline
/goal select <goal-id>  Switch active research under .goal/researches/
/goal export            Open the live browser dashboard

The loop

pi edits code, commits candidates, calls run_goal, then calls log_goal:

edit → commit → run_goal → log_goal → keep or revert → repeat

Results are appended to goal.jsonl. The current plan and learnings live in goal.md, so a fresh agent can resume after restarts or context compaction.
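
The keep-or-revert step comes down to comparing the candidate's primary metric with the best kept value, in the metric's direction. The sketch below is illustrative only and is not pi-goal's actual log_goal logic: the function name decide and the direction labels lower and higher are assumptions, and a real loop also has to handle crashes and failed checks.

```shell
#!/bin/bash
# Hypothetical helper: decide BEST CANDIDATE DIRECTION
# Prints "keep" when CANDIDATE beats BEST in the given direction, else "revert".
# DIRECTION is "lower" (smaller is better) or "higher" (larger is better);
# both labels are assumptions made for this sketch.
decide() {
  awk -v best="$1" -v cand="$2" -v dir="$3" 'BEGIN {
    better = (dir == "lower") ? (cand < best) : (cand > best)
    print (better ? "keep" : "revert")
  }'
}

decide 15200 14100 lower   # prints: keep
decide 15200 16000 lower   # prints: revert
```

Ties count as "revert" here, so a change is kept only when it strictly improves the metric.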

Finalize results

/skill:goal-finalize

This reads goal.jsonl, groups kept runs into logical changesets, asks for approval, then creates one reviewable branch per group from the merge base.


What it installs

Extension tools

Tool           Purpose
init_goal      Initialize the active research and first experiment
start_goal     Open a new comparable experiment with a fresh baseline
run_goal       Time a command, capture output, parse METRIC name=value lines
log_goal       Record the run, keep improvements, revert failures/regressions
validate_goal  Check goal files, metric output, checks, and workspace safety
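
To make the METRIC name=value parsing mentioned for run_goal concrete, here is one way such lines could be pulled out of captured benchmark output. This is a sketch under assumptions, not the extension's real parser: parse_metrics is a hypothetical name, and the tab-separated output format is chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read benchmark output on stdin and print one
# "name<TAB>value" pair per well-formed METRIC line; other lines are ignored.
parse_metrics() {
  awk '/^METRIC [^= ]+=[^ ]+$/ {
    sub(/^METRIC /, "")        # drop the prefix, leaving name=value
    eq = index($0, "=")        # split at the first "="
    print substr($0, 1, eq - 1) "\t" substr($0, eq + 1)
  }'
}

printf 'building...\nMETRIC total_ms=123.4\nMETRIC bundle_kb=42.1\n' | parse_metrics
```

The example prints two lines, one for total_ms and one for bundle_kb, and skips the "building..." noise.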

Skills

Skill          Purpose
goal-create    Set up and start an optimization loop
goal-finalize  Turn a noisy goal branch into clean review branches
goal-hooks     Help author optional goal.hooks/before.sh and after.sh scripts

Files used by a loop

File            Purpose
goal.md         Goal, metric, scope, constraints, attempts, learnings
goal.sh         Benchmark script; should output METRIC name=value
goal.jsonl      Append-only run journal
goal.checks.sh  Optional correctness checks after successful benchmarks
goal.ideas.md   Optional backlog of promising ideas
goal.hooks/     Optional scripts fired before/after iterations

UI

  • Status widget above the editor: 🎯 goal 12 runs 8 kept │ ★ total_µs: 15,200 (-12.3%) │ conf: 2.1×
  • Ctrl+Shift+T: expand/collapse inline dashboard
  • Ctrl+Shift+F: fullscreen scrollable dashboard
  • /goal export: live browser dashboard with chart and share card

Override shortcuts in <agent-dir>/extensions/pi-goal.json:

{
  "shortcuts": {
    "toggleDashboard": "ctrl+shift+y",
    "fullscreenDashboard": null
  }
}

Use null to disable a shortcut.


Reference

Benchmark contract

goal.sh should exit non-zero on benchmark failure and print the primary metric as:

METRIC total_ms=123.4

Secondary metrics can use the same format:

METRIC bundle_kb=42.1
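
A minimal goal.sh that follows this contract might look like the sketch below. The helper names emit_metric and time_ms are hypothetical, "sleep 0.1" is a placeholder for the real benchmark command, and the timing relies on GNU date (the %N format is not available in BSD/macOS date).

```shell
#!/bin/bash
set -eu

# Print one metric line in the METRIC name=value format.
emit_metric() {
  printf 'METRIC %s=%s\n' "$1" "$2"
}

# Run a command and print its wall-clock time in milliseconds.
# Assumes GNU date: %N (nanoseconds) is not available in BSD/macOS date.
time_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null || return 1   # propagate benchmark failure
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# "sleep 0.1" is a placeholder workload; substitute the real benchmark command.
total_ms=$(time_ms sleep 0.1)
emit_metric total_ms "$total_ms"
```

If the benchmark command fails, time_ms returns non-zero, the assignment fails, and set -e ends the script with a non-zero exit status before any METRIC line is printed.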

Backpressure checks

Create an executable goal.checks.sh to block unsafe keeps:

#!/bin/bash
set -euo pipefail
pnpm test
pnpm typecheck

Checks run after a benchmark exits 0. Their runtime does not affect the primary metric. Failures are logged as checks_failed and code changes are reverted.

Confidence score

After 3+ runs in an experiment, pi-goal estimates benchmark noise with Median Absolute Deviation (MAD):

confidence = |best improvement| / MAD

Score     Meaning
≥ 2.0×    likely real improvement
1.0–2.0×  above noise but marginal
< 1.0×    within noise; rerun to confirm

The score is advisory: pi-goal never discards a run automatically because of it.
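
The confidence ratio can be reproduced by hand. The sketch below assumes that MAD is the median of absolute deviations from the median of an experiment's metric values, and that "best improvement" means the distance between the baseline and the best value; neither detail is confirmed by this README, and the function names are hypothetical.

```shell
#!/bin/bash
# Median of the numbers on stdin, one per line.
median() {
  sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      m = int((NR + 1) / 2)
      print (NR % 2 ? v[m] : (v[m] + v[m + 1]) / 2)
    }'
}

# MAD = median(|x_i - median(x)|) over the numbers on stdin.
mad() {
  local values med
  values=$(cat)
  med=$(printf '%s\n' "$values" | median)
  printf '%s\n' "$values" |
    awk -v m="$med" '{ d = $1 - m; print (d < 0 ? -d : d) }' | median
}

# confidence BASELINE BEST < metric-values
# Prints |BEST - BASELINE| / MAD to one decimal place, or "inf" when MAD is 0.
confidence() {
  local noise
  noise=$(mad)
  awk -v base="$1" -v best="$2" -v noise="$noise" 'BEGIN {
    if (noise == 0) { print "inf"; exit }
    d = best - base; if (d < 0) d = -d
    printf "%.1f\n", d / noise
  }'
}

printf '%s\n' 100 102 98 101 90 | confidence 100 90   # prints: 5.0
```

In the example the median is 100 and the MAD is 2, so a 10-unit improvement scores 5.0, which falls in the "likely real improvement" band.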

Configuration

Create goal.config.json in the pi session directory:

{
  "workingDir": "/path/to/project",
  "maxIterations": 50
}

Field          Purpose
workingDir     Override where goal files, commands, and git operations run
maxIterations  Stop after this many runs until a new experiment is started

Hooks

Optional executable hooks live in goal.hooks/:

Hook       Fires                                           Typical use
before.sh  before activation and after each completed run  fetch research, rotate ideas, prime context
after.sh   after each log_goal                             append learnings, notify, tag winners

Hooks receive one JSON object on stdin, write steer text on stdout, time out after 30s, and append hook entries to goal.jsonl. See skills/goal-hooks/examples/ for complete scripts.
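
As a rough illustration of the stdin/stdout contract, a before.sh could steer the agent toward the top entry of goal.ideas.md. This is a sketch, not one of the bundled examples: before_hook and the steer wording are hypothetical, and the JSON event is left unparsed because its fields are not described here.

```shell
#!/bin/bash
set -eu

# Hypothetical before.sh body: before_hook [IDEAS_FILE]
# Reads the JSON event from stdin and prints steer text on stdout.
before_hook() {
  local ideas_file=${1:-goal.ideas.md}
  # Drain stdin. The event's fields are not documented here, so this
  # sketch does not parse them.
  cat >/dev/null
  # Steer the agent toward the first non-blank line of the ideas backlog.
  if [ -s "$ideas_file" ]; then
    local idea
    idea=$(awk 'NF { print; exit }' "$ideas_file")
    [ -z "$idea" ] || printf 'Next idea to try: %s\n' "$idea"
  fi
}

# Demo with an empty event and a throwaway backlog file.
# In a real hook file the last line would simply be: before_hook
tmp=$(mktemp)
printf 'cache parsed config\nskip redundant sorts\n' > "$tmp"
printf '{}' | before_hook "$tmp"   # prints: Next idea to try: cache parsed config
rm -f "$tmp"
```

The hook does only local file reads, which keeps it well inside the 30s timeout.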


Example targets

Target       Metric      Command
Test speed   seconds ↓   pnpm test
Bundle size  KB ↓        pnpm build && du -sb dist
Training     val loss ↓  uv run train.py
Build speed  seconds ↓   pnpm build
Lighthouse   score ↑     lighthouse http://localhost:3000 --output=json

Prerequisites

  • pi installed and configured
  • an LLM provider API key
  • a benchmark command with a numeric metric

Goal loops can run for a long time. Use provider-side budgets and maxIterations to cap cost.

License

MIT