pi-evo-research

Population-guided evolutionary research for pi — evolve hypotheses, run measured experiments, keep what works.

Install pi-evo-research from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-evo-research

Package: pi-evo-research
Version: 1.6.1
Published: May 30, 2026
Downloads: not available
Author: prinova
License: MIT
Types: extension, skill
Size: 298 KB
Dependencies: 0 dependencies · 4 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions"
  ],
  "skills": [
    "./skills"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-evo-research

Population-guided evolutionary research for coding agents

Install · Usage · Evolutionary mode · Glossary

pi-evo-research helps a coding agent optimize software by running measured experiments, keeping improvements, discarding regressions, and maintaining a diverse population of candidate hypotheses.

It builds on David Cortés' pi-autoresearch, which adapts Andrej Karpathy's autoresearch idea for pi. pi-evo-research adds a more explicit search policy: avoid pure hill-climbing, track candidate families, mutate promising ideas, retire dead ends, and inject novelty when progress stalls.

The core principle:

Evolve hypotheses and patch strategies, not raw code strings.

What is different?

Classic single-path research often behaves like local search:

try idea → benchmark → keep/discard → try nearby idea

That works well until the agent gets stuck around a local optimum.

pi-evo-research encourages a population-based loop:

seed candidate families → evaluate → select → mutate → recombine safe winners → inject novelty

The benchmark remains the source of truth. Evolution guides which experiment to try next.

What's included

Part	Purpose
Pi extension	Tools, session state, dashboard, run logging, keep/discard automation
Evo research skill	Sets up the optimization session and drives the population-guided autonomous loop
Evolutionary mode	Agent policy for population-guided search over hypotheses
Hooks	Optional before/after scripts for external scheduling, notes, and candidate steering

Extension tools

Tool	Description
`init_experiment`	Configure session name, primary metric, unit, and direction
`run_experiment`	Run benchmark/check command, capture output, parse `METRIC name=value` lines
`log_experiment`	Record result, commit kept changes, revert rejected changes, persist ASI metadata
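
As an illustration of the `METRIC name=value` convention, here is a minimal Python sketch of how such lines could be pulled out of benchmark output. It is not the extension's parser: the regular expression, the `parse_metrics` name, and the choice to skip non-numeric values are all assumptions made for this example.

```python
import re

# Assumed line format: "METRIC name=value", one metric per line.
METRIC_RE = re.compile(r"^METRIC\s+([A-Za-z_][\w.-]*)=(\S+)\s*$")


def parse_metrics(output: str) -> dict[str, float]:
    """Collect METRIC lines from benchmark output.

    Later occurrences of the same name overwrite earlier ones.
    """
    metrics: dict[str, float] = {}
    for line in output.splitlines():
        match = METRIC_RE.match(line.strip())
        if not match:
            continue  # ordinary log output, not a metric
        name, raw = match.groups()
        try:
            metrics[name] = float(raw)
        except ValueError:
            continue  # assumption: silently skip non-numeric values
    return metrics


sample = """\
running 120 tests
METRIC total_seconds=12.5
METRIC peak_mb=310
done
"""
print(parse_metrics(sample))
```

Keeping metrics on their own prefixed lines means the benchmark can print anything else it likes without confusing the parser.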

Commands

Command	Description
`/evo-research <text>`	Enter the autonomous optimization loop or resume an existing one
`/evo-research off`	Leave evo-research mode while preserving logs
`/evo-research clear`	Delete `evo-research.jsonl` and reset runtime state
`/evo-research export`	Open live dashboard in a browser

The command name remains /evo-research for compatibility with the existing pi-evo-research workflow.

Install

pi install npm:pi-evo-research

Manual local install while developing:

cp -r extensions/pi-evo-research ~/.pi/agent/extensions/
cp -r skills/pi-evo-research-create ~/.pi/agent/skills/

Then run /reload in pi.

Usage

Start a session:

/skill:pi-evo-research-create

Or:

/evo-research optimize unit test runtime, keep correctness checks passing

The agent will ask for or infer:

objective
benchmark command
primary metric and direction
secondary metrics
files in scope
constraints and off-limits areas

It then writes the session files, creates evo-research.population.json, runs a baseline, and loops:

inspect → propose candidate → edit → run_experiment → log_experiment → keep/discard → update population → repeat
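
The loop above can be sketched as a toy Python function. This is only a schematic under stated assumptions: `propose` and `benchmark` are hypothetical stand-ins for the agent's reasoning and for `run_experiment`, the metric is assumed to be "lower is better", and checks, commits, and population updates are left out.

```python
from typing import Callable, Optional


def run_loop(
    propose: Callable[[list[dict]], str],
    benchmark: Callable[[Optional[str]], float],
    max_runs: int,
) -> list[dict]:
    """Toy keep/discard loop; returns the run log."""
    best = benchmark(None)  # baseline run with no candidate applied
    log = [{"candidate": None, "metric": best, "status": "baseline"}]
    for _ in range(max_runs):
        candidate = propose(log)       # inspect history, pick the next hypothesis
        metric = benchmark(candidate)  # stands in for edit + run_experiment
        kept = metric < best           # keep only measured improvements
        if kept:
            best = metric
        log.append(
            {"candidate": candidate, "metric": metric,
             "status": "keep" if kept else "discard"}
        )
    return log


# Hypothetical candidates with made-up benchmark results, for illustration only.
fake_results = {None: 10.0, "cache": 8.0, "regex": 11.0, "batch": 7.5}
ideas = ["cache", "regex", "batch"]

log = run_loop(
    propose=lambda history: ideas[len(history) - 1],
    benchmark=lambda candidate: fake_results[candidate],
    max_runs=3,
)
for entry in log:
    print(entry)
```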

Evolutionary mode

Evolutionary mode is a search policy for broad or noisy optimization tasks. It is not genetic programming and does not splice arbitrary code together.

Candidate representation

Each experiment should correspond to a candidate hypothesis or patch family:

{
  "candidate_id": "cand-cache-parser-v2",
  "family": "caching",
  "parent_id": "cand-cache-parser-v1",
  "operator": "mutation",
  "hypothesis": "Cache parser output by file content hash",
  "genome": {
    "strategy": "memoization",
    "scope": ["src/parser.ts"],
    "knobs": { "cache_key": "content_hash" }
  }
}

The agent logs this through log_experiment({ asi: ... }). The extension already persists ASI in evo-research.jsonl, so no new tool contract is required.
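
To make the parent/child relationship concrete, here is a small Python sketch that models the record above and derives a mutated child from it. The `Candidate` class and the `mutate` helper are hypothetical names for this example, and the v1 parent's hypothesis and knob value are invented for illustration; only the field names come from the JSON shown above.

```python
import copy
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Candidate:
    candidate_id: str
    family: str
    operator: str
    hypothesis: str
    parent_id: Optional[str] = None
    genome: dict = field(default_factory=dict)


def mutate(parent: Candidate, new_id: str, hypothesis: str, **knobs) -> Candidate:
    """Derive a child in the same family, changing only the given knobs."""
    genome = copy.deepcopy(parent.genome)  # never mutate the parent's genome
    genome.setdefault("knobs", {}).update(knobs)
    return Candidate(
        candidate_id=new_id,
        family=parent.family,
        operator="mutation",
        hypothesis=hypothesis,
        parent_id=parent.candidate_id,
        genome=genome,
    )


# Hypothetical parent; its hypothesis and cache_key are illustrative only.
v1 = Candidate(
    candidate_id="cand-cache-parser-v1",
    family="caching",
    operator="seed",
    hypothesis="Cache parser output by file path",
    genome={
        "strategy": "memoization",
        "scope": ["src/parser.ts"],
        "knobs": {"cache_key": "path"},
    },
)
v2 = mutate(
    v1,
    "cand-cache-parser-v2",
    "Cache parser output by file content hash",
    cache_key="content_hash",
)
print(json.dumps(asdict(v2), indent=2))
```

The point of the structure is that the mutation is described at the level of strategy and knobs, so the lineage stays readable even after the code itself has changed.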

Operators

Operator	Use
`seed`	Introduce a new candidate family
`mutation`	Small variant of a promising candidate
`parameter_tune`	Adjust constants, thresholds, flags, or config
`specialization`	Add a narrower fast path
`simplification`	Preserve gain while reducing complexity
`recombination`	Combine independent kept ideas that touch compatible areas
`novelty`	Deliberately try a different family after stagnation

Selection rules

The primary metric decides fitness.
Checks must pass before a result can be kept.
The confidence score guards against noisy wins.
Simpler changes beat complex changes with similar fitness.
Do not spend more than a few consecutive runs in one failing family; the default scheduler sets max_consecutive_family_failures to 3.
Keep diversity: retain at least one promising alternative family even while exploiting a winner.
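
Three of these rules (checks gate everything, the primary metric decides, simplicity breaks ties) can be sketched as a single keep/discard function. This is an illustration, not the extension's logic: the `Result` type, the `complexity` proxy, and the 1% `tie_tolerance` are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    metric: float        # primary metric value
    checks_passed: bool  # outcome of the correctness checks
    complexity: int      # hypothetical proxy, e.g. lines changed


def should_keep(
    new: Result,
    best: Result,
    lower_is_better: bool = True,
    tie_tolerance: float = 0.01,  # assumed: within 1% counts as "similar fitness"
) -> bool:
    if not new.checks_passed:
        return False  # checks must pass before anything can be kept
    sign = -1.0 if lower_is_better else 1.0
    scale = max(abs(best.metric), 1e-12)  # avoid division by zero
    gain = sign * (new.metric - best.metric) / scale  # relative improvement
    if gain > tie_tolerance:
        return True  # clear win on the primary metric
    if abs(gain) <= tie_tolerance:
        return new.complexity < best.complexity  # tie: simpler change wins
    return False  # regression


best = Result(metric=10.0, checks_passed=True, complexity=50)
print(should_keep(Result(9.0, True, 80), best))    # clear improvement
print(should_keep(Result(8.0, False, 10), best))   # faster but checks fail
print(should_keep(Result(10.05, True, 20), best))  # similar fitness, simpler
```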

What to avoid

Do not optimize sub-tasks in isolation unless the global benchmark still improves.
Do not combine patches just because both were individually good; recombine only when interactions are understood.
Do not perform textual crossover over code.
Do not keep benchmark-only tricks that violate real constraints.

Session files

File	Purpose
`evo-research.md`	Session plan and durable context for future agents
`evo-research.sh`	Executable benchmark script that emits `METRIC name=value` lines
`evo-research.checks.sh`	Optional executable correctness/type/lint backpressure checks
`evo-research.ideas.md`	Candidate backlog and deferred hypotheses
`evo-research.jsonl`	Append-only experiment log, metrics, ASI, confidence, status
`evo-research.population.json`	Optional persistent population state for broad or long evolutionary runs
`evo-research.hooks/`	Optional before/after scripts for session automation

Generated shell scripts should be marked executable before the agent invokes them:

chmod +x evo-research.sh
# if present:
chmod +x evo-research.checks.sh

run_experiment also applies chmod +x to evo-research.sh and evo-research.checks.sh before invocation as a safety net.
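
On the producing side, a benchmark only has to print `METRIC name=value` lines. The Python sketch below shows one way evo-research.sh could delegate to a helper that times a workload and emits the median; the `emit_metric` and `time_workload` names, the metric name `total_seconds`, and the placeholder workload are all assumptions for this example.

```python
import time
from statistics import median
from typing import Callable


def emit_metric(name: str, value: float) -> str:
    """Print one metric line in the METRIC name=value format and return it."""
    line = f"METRIC {name}={value:.6f}"
    print(line)
    return line


def time_workload(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock seconds over several runs to damp noise."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def workload() -> int:
    # Placeholder workload; a real benchmark would exercise the code under test.
    return sum(i * i for i in range(100_000))


emit_metric("total_seconds", time_workload(workload))
```

Reporting a median over a few repeats is a cheap way to reduce run-to-run noise before the result reaches the keep/discard decision.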

evo-research.population.json is created and maintained by the extension. It is small, inspectable state for ranking candidates, tracking family failures, and triggering novelty after stagnation.

Minimal lifecycle:

/evo-research or init_experiment creates population state when needed.
log_experiment updates population state from the latest result and ASI.
Before the next iteration, evo-research prints a deterministic population steer message.
Benchmark results remain the source of truth; population state only guides which hypothesis to try next.

Core shape:

{
  "schema_version": 1,
  "generation": 0,
  "active_candidate_id": null,
  "stagnation_runs": 0,
  "candidates": [],
  "families": [],
  "scheduler": {
    "max_consecutive_family_failures": 3,
    "novelty_after_stagnation_runs": 5,
    "elite_limit": 3,
    "max_consecutive_family_attempts": 2,
    "explore_every_n_runs": 3,
    "generation_size": 10,
    "min_family_attempts_per_generation": 1
  }
}
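
To show how two of the scheduler thresholds might be read, here is a hedged Python sketch that turns population state into steer messages. The shape of a family entry (`name`, `consecutive_failures`) is an assumption, since the core shape above only shows an empty `families` array, and the message wording is illustrative rather than the extension's actual output.

```python
def steer(population: dict) -> list[str]:
    """Derive steer messages from population state (illustrative policy only)."""
    scheduler = population["scheduler"]
    messages = []
    for family in population.get("families", []):
        # Assumed family fields: "name" and "consecutive_failures".
        failures = family.get("consecutive_failures", 0)
        if failures >= scheduler["max_consecutive_family_failures"]:
            messages.append(
                f"Avoid: {family['name']} family; {failures} consecutive failures."
            )
    if population.get("stagnation_runs", 0) >= scheduler["novelty_after_stagnation_runs"]:
        messages.append("Inject novelty: seed a new candidate family.")
    return messages


population = {
    "stagnation_runs": 5,
    "families": [
        {"name": "regex", "consecutive_failures": 4},
        {"name": "caching", "consecutive_failures": 0},
    ],
    "scheduler": {
        "max_consecutive_family_failures": 3,
        "novelty_after_stagnation_runs": 5,
    },
}
for message in steer(population):
    print(message)
```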

ASI convention

Use log_experiment ASI to make evolutionary state durable:

{
  "candidate_id": "cand-batch-io-v3",
  "generation": 4,
  "family": "batching",
  "parent_id": "cand-batch-io-v2",
  "operator": "mutation",
  "hypothesis": "Batch file reads before parsing",
  "genome": {
    "strategy": "batching",
    "knobs": { "batch_size": 32 }
  },
  "outcome_learning": "Reduced syscalls but increased memory pressure",
  "next_mutation": "Try smaller batch size and reuse buffer"
}

This lets a resumed agent continue the search without relying on chat history.
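
As a sketch of what resuming from the log could look like, the Python function below scans JSON Lines text and collects each candidate's `next_mutation` hint. The record layout is an assumption: it supposes every log line is a JSON object with an optional `asi` key holding the fields shown above, which the source does not spell out.

```python
import json


def resume_hints(jsonl_text: str) -> list[str]:
    """Collect 'candidate: next step' hints from an append-only JSONL log."""
    hints = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a truncated or corrupt line rather than abort
        if not isinstance(record, dict):
            continue
        asi = record.get("asi") or {}  # assumed key name for the ASI payload
        next_step = asi.get("next_mutation")
        if next_step:
            hints.append(f"{asi.get('candidate_id', '?')}: {next_step}")
    return hints


# Hypothetical log content; the "status" field is illustrative.
log_text = "\n".join([
    json.dumps({"status": "keep", "asi": {
        "candidate_id": "cand-batch-io-v3",
        "next_mutation": "Try smaller batch size and reuse buffer",
    }}),
    json.dumps({"status": "discard", "asi": {"candidate_id": "cand-regex-v1"}}),
    "{not valid json",
])
print(resume_hints(log_text))
```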

Hooks

The extension handles population scheduling by default. Hooks remain available for custom automation or for users who want to replace/augment the default policy:

evo-research.hooks/after.sh: run retrospective automation after a logged result.
evo-research.hooks/before.sh: print additional steer messages before the next iteration.

Reference population hook examples still ship with the hooks skill as editable shell equivalents of the default policy. They require jq at runtime because hook payloads are JSON:

# Linux: install jq with your distro package manager, e.g. apt install jq
# macOS: brew install jq
mkdir -p evo-research.hooks
cp "<skill-dir>/examples/after/population-update.sh" evo-research.hooks/after.sh
cp "<skill-dir>/examples/before/population-scheduler.sh" evo-research.hooks/before.sh
chmod +x evo-research.hooks/after.sh evo-research.hooks/before.sh

Good hook output is short, specific, and actionable, for example:

Next: mutate cand-fast-path-v2 by reducing allocations in tokenizer.
Avoid: regex family; 4 failed checked runs.
Inject novelty if next run fails.

Hooks are optional. The core loop works without them.

Dashboard and confidence

The dashboard shows run history, primary metric, secondary metrics, kept/discarded status, commits, and confidence.

Confidence is advisory. It estimates whether the best improvement is larger than observed noise. Low confidence should trigger confirmation reruns or candidate diversification, not automatic rejection.
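
One plausible way to compare an improvement against noise is to divide it by the median absolute deviation (MAD) of repeated measurements. The sketch below uses that estimator purely as an illustration; the extension's actual formula is not stated here, and the choice of which runs count as noise samples is an assumption.

```python
from statistics import median


def mad(values: list[float]) -> float:
    """Median absolute deviation: a noise estimate that tolerates outliers."""
    center = median(values)
    return median(abs(v - center) for v in values)


def confidence(baseline: float, best: float, noise_samples: list[float]) -> float:
    """Ratio of the best improvement to observed noise (higher is more convincing)."""
    noise = mad(noise_samples)
    if noise == 0:
        # No measurable noise: any change is infinitely larger than it.
        return float("inf") if best != baseline else 0.0
    return abs(baseline - best) / noise


# Hypothetical repeat measurements of the same configuration.
samples = [100, 102, 98, 101, 99]
print(confidence(100, 90, samples))  # improvement is many times the noise
print(confidence(100, 99.5, samples))  # improvement is smaller than the noise
```

A ratio below 1 means the gain is within the run-to-run spread, which is the situation where a confirmation rerun is more useful than an immediate keep or discard.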

Example domains

Domain	Primary metric	Candidate families
Test speed	seconds ↓	parallelism, fixture caching, selective setup, config tuning
Parser/runtime	µs ↓	fast paths, data structures, memoization, allocation reduction
Bundle size	KB ↓	tree-shaking, dependency removal, build config, code splitting
ML training	validation loss ↓	schedules, architecture knobs, data pipeline, regularization
Web perf	Lighthouse score ↑	caching, payload reduction, hydration strategy, image handling

Acknowledgements

pi-evo-research is derived from David Cortés' pi-autoresearch, his pi adaptation of Andrej Karpathy's autoresearch idea.

Positioning

Short version:

Population-guided evolutionary research for coding agents.

Longer version:

Built on David Cortés' pi-autoresearch adaptation of Karpathy's autoresearch idea, pi-evo-research explores a population of hypotheses instead of hill-climbing one idea at a time.

License

MIT