@ariesfish/pi-goal

Autonomous experiment loop for pi — run, measure, keep or discard. Inspired by karpathy/autoresearch.

Packages

Package details

extensionskill

Install @ariesfish/pi-goal from npm and Pi will load the resources declared by the package manifest.

npm repo home report

$ pi install npm:@ariesfish/pi-goal

Package: @ariesfish/pi-goal
Version: 0.4.0
Published: May 26, 2026
Downloads: 399/mo · 58/wk
Author: ariesfish
License: MIT
Types: extension, skill
Size: 289.1 KB
Dependencies: 0 dependencies · 4 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions"
  ],
  "skills": [
    "./skills"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-goal

Autonomous experiment loops for pi

Install · Usage · Reference

pi-goal lets pi try an idea, measure it, keep improvements, revert regressions, and continue from durable state.

Use it for any measurable optimization target: test speed, bundle size, training loss, build time, Lighthouse score, or custom benchmarks.

Install

pi install npm:@ariesfish/pi-goal

Manual install:

cp -r extensions/pi-goal ~/.pi/agent/extensions/
cp -r skills/goal-create ~/.pi/agent/skills/
cp -r skills/goal-finalize ~/.pi/agent/skills/
cp -r skills/goal-hooks ~/.pi/agent/skills/

Then run /reload in pi.

Usage

Start a loop

/skill:goal-create

The skill asks for, or infers:

goal
benchmark command
primary metric and direction
files in scope
constraints

It creates a goal branch, writes goal.md and goal.sh, runs the baseline, then starts iterating.

Run directly with `/goal`

/goal optimize unit test runtime, monitor correctness
/goal model training, run 5 minutes of train.py and track validation loss

Useful subcommands:

Command	Purpose
`/goal <text>`	Start or resume goal mode
`/goal off`	Leave goal mode; keep persisted files
`/goal clear`	Delete `goal.jsonl` and reset state
`/goal reinit`	Start a new comparable experiment with a fresh baseline
`/goal select <goal-id>`	Switch active research under `.goal/researches/`
`/goal export`	Open the live browser dashboard

The loop

pi edits code, commits candidates, calls run_goal, then calls log_goal:

edit → commit → run_goal → log_goal → keep or revert → repeat

Results are appended to goal.jsonl. The current plan and learnings live in goal.md, so a fresh agent can resume after restarts or context compaction.

Finalize results

/skill:goal-finalize

This reads goal.jsonl, groups kept runs into logical changesets, asks for approval, then creates one reviewable branch per group from the merge base.

What it installs

Extension tools

Tool	Purpose
`init_goal`	Initialize the active research and first experiment
`start_goal`	Open a new comparable experiment with a fresh baseline
`run_goal`	Time a command, capture output, parse `METRIC name=value` lines
`log_goal`	Record the run, keep improvements, revert failures/regressions
`validate_goal`	Check goal files, metric output, checks, and workspace safety

Skills

Skill	Purpose
`goal-create`	Set up and start an optimization loop
`goal-finalize`	Turn a noisy goal branch into clean review branches
`goal-hooks`	Help author optional `goal.hooks/before.sh` and `after.sh` scripts

Files used by a loop

File	Purpose
`goal.md`	Goal, metric, scope, constraints, attempts, learnings
`goal.sh`	Benchmark script; should output `METRIC name=value`
`goal.jsonl`	Append-only run journal
`goal.checks.sh`	Optional correctness checks after successful benchmarks
`goal.ideas.md`	Optional backlog of promising ideas
`goal.hooks/`	Optional scripts fired before/after iterations

UI

Status widget above the editor: 🎯 goal 12 runs 8 kept │ ★ total_µs: 15,200 (-12.3%) │ conf: 2.1×
Ctrl+Shift+T: expand/collapse inline dashboard
Ctrl+Shift+F: fullscreen scrollable dashboard
/goal export: live browser dashboard with chart and share card

Override shortcuts in <agent-dir>/extensions/pi-goal.json:

{
  "shortcuts": {
    "toggleDashboard": "ctrl+shift+y",
    "fullscreenDashboard": null
  }
}

Use null to disable a shortcut.

Reference

Benchmark contract

goal.sh should exit non-zero on benchmark failure and print the primary metric as:

METRIC total_ms=123.4

Secondary metrics can use the same format:

METRIC bundle_kb=42.1

Backpressure checks

Create executable goal.checks.sh to block unsafe keeps:

#!/bin/bash
set -euo pipefail
pnpm test
pnpm typecheck

Checks run after a benchmark exits 0. Their runtime does not affect the primary metric. Failures are logged as checks_failed and code changes are reverted.

Confidence score

After 3+ runs in an experiment, pi-goal estimates benchmark noise with Median Absolute Deviation (MAD):

confidence = |best improvement| / MAD

Score	Meaning
`≥ 2.0×`	likely real improvement
`1.0–2.0×`	above noise but marginal
`< 1.0×`	within noise; rerun to confirm

The score is advisory. It never auto-discards.

Configuration

Create goal.config.json in the pi session directory:

{
  "workingDir": "/path/to/project",
  "maxIterations": 50
}

Field	Purpose
`workingDir`	Override where goal files, commands, and git operations run
`maxIterations`	Stop after this many runs until a new experiment is started

Hooks

Optional executable hooks live in goal.hooks/:

Hook	Fires	Typical use
`before.sh`	before activation and after each completed run	fetch research, rotate ideas, prime context
`after.sh`	after each `log_goal`	append learnings, notify, tag winners

Hooks receive one JSON object on stdin, write steer text on stdout, timeout after 30s, and append hook entries to goal.jsonl. See skills/goal-hooks/examples/ for complete scripts.

Example targets

Target	Metric	Command
Test speed	seconds ↓	`pnpm test`
Bundle size	KB ↓	`pnpm build && du -sb dist`
Training	val loss ↓	`uv run train.py`
Build speed	seconds ↓	`pnpm build`
Lighthouse	score ↑	`lighthouse http://localhost:3000 --output=json`

Prerequisites

pi installed and configured
an LLM provider API key
a benchmark command with a numeric metric

Goal loops can run for a long time. Use provider-side budgets and maxIterations to cap cost.

License

MIT