pk-pi-hermes-evolve

Pi package inspired by Hermes Agent Self-Evolution for reflective improvement of skills, prompts, and instruction files.

Package details

  • Package: pk-pi-hermes-evolve
  • Version: 0.2.1
  • Published: Apr 12, 2026
  • Downloads: 415/mo · 17/wk
  • Author: pk-nerdsaver-ai
  • License: MIT
  • Type: extension
  • Size: 104 KB
  • Dependencies: 0 dependencies · 4 peers

Install pk-pi-hermes-evolve from npm, and Pi will load the resources declared by the package manifest:

$ pi install npm:pk-pi-hermes-evolve

Pi manifest JSON:

{
  "extensions": [
    "./src/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pk-pi-hermes-evolve

A pi package inspired by Nous Research's Hermes Agent Self-Evolution, operating on local instruction artifacts.

This package adapts the Hermes Phase 1 idea to pi:

  • pick a local instruction artifact (SKILL.md, prompt template, AGENTS.md, SYSTEM.md, etc.)
  • generate a compact evaluation set from synthetic tasks, recent pi session history, or both
  • run a reflective candidate-generation loop
  • proxy-score baseline vs candidates with an LLM judge
  • save a reviewable report and candidate files under .pi/hermes-self-evolution/
  • never overwrite the original target automatically

It is a pi-native extension with a hybrid backend model:

  • TypeScript backend: always available; runs a local proxy-evolution loop via pi subprocess calls
  • Python backend: optional; uses a real DSPy/GEPA-style path when Python and DSPy are installed

The core loop is modeled after Hermes' mutation → evaluation → guardrails → human review flow, but adapted to pi extension APIs and local pi session history.
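The mutation → evaluation → guardrails → human review flow can be sketched as a small pure loop. This is an illustrative outline, not the extension's actual code: `evolve`, `Candidate`, and the callback shapes are all hypothetical names, with the judge and mutator injected so the loop itself stays deterministic and testable.

```typescript
// Hedged sketch of a Hermes-style proxy-evolution loop. All names here
// (Candidate, evolve, the callbacks) are illustrative, not the real API.
interface Candidate {
  text: string;
  score: number;
}

function evolve(
  original: string,
  mutate: (text: string, round: number) => string, // candidate generation
  judge: (text: string) => number,                 // proxy LLM-judge score
  guard: (original: string, candidate: string) => boolean, // guardrails
  rounds: number,
): Candidate {
  let best: Candidate = { text: original, score: judge(original) };
  for (let round = 0; round < rounds; round++) {
    const text = mutate(best.text, round);
    if (!guard(original, text)) continue; // guardrails reject the candidate
    const score = judge(text);
    if (score > best.score) best = { text, score };
  }
  // Human review happens outside the loop: `best` is saved to disk as a
  // reviewable candidate and never applied to the original automatically.
  return best;
}
```

With deterministic stubs (a mutator that appends text, a length-based judge, a size-budget guard) the loop climbs until the guardrail rejects further growth.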

Pi docs reviewed for this package

This package was designed against pi's extension/package docs and examples, especially:

  • README.md
  • docs/extensions.md
  • docs/packages.md
  • docs/session.md
  • docs/tui.md
  • examples:
    • examples/extensions/subagent/
    • examples/extensions/plan-mode/
    • examples/extensions/todo.ts
    • examples/extensions/with-deps/

Key pi takeaways applied here:

  • ship as a pi package with a pi.extensions manifest
  • keep extension logic in TypeScript loaded directly by pi
  • use a command for human-driven runs and a tool for model-driven runs
  • keep state in session entries with appendEntry() instead of hidden external mutation
  • use .pi/... paths for project-local generated artifacts
  • rely on session JSONL history as a local source for evolution context

What it supports

Target artifacts

Best fit:

  • .pi/skills/**/SKILL.md
  • .pi/prompts/*.md
  • .agents/skills/**/SKILL.md
  • AGENTS.md
  • .pi/SYSTEM.md
  • .pi/APPEND_SYSTEM.md

The engine is optimized for text instructions, not general code evolution.

Commands

  • /evolve → interactive artifact picker
  • /evolve path/to/file.md → evolve a specific file
  • /evolve last → show the last saved report path in the current session

Tool

  • self_evolve_artifact

Use it when you explicitly want the model to improve a local instruction artifact and save reviewable candidates.

Backends

  • auto → prefer Python DSPy backend when available, otherwise TypeScript fallback
  • python → require the Python backend
  • typescript → force the TypeScript-only path
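The backend policy above reduces to a small decision function. A minimal sketch, assuming a `resolveBackend` helper (an illustrative name, not the extension's real API):

```typescript
// Hedged sketch of the documented backend selection policy.
type Backend = "python" | "typescript";

function resolveBackend(
  requested: "auto" | "python" | "typescript",
  pythonDspyAvailable: boolean,
): Backend {
  switch (requested) {
    case "python":
      // `python` is a hard requirement: fail loudly rather than fall back.
      if (!pythonDspyAvailable) {
        throw new Error("python backend requested but Python + DSPy not found");
      }
      return "python";
    case "typescript":
      return "typescript";
    case "auto":
      // Prefer the DSPy path when available, else the TypeScript fallback.
      return pythonDspyAvailable ? "python" : "typescript";
  }
}
```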

Install

Install from npm

From pi:

pi install npm:pk-pi-hermes-evolve

Or project-local:

pi install -l npm:pk-pi-hermes-evolve

Direct extension loading for testing

pi -e npm:pk-pi-hermes-evolve

Python DSPy backend

The npm package includes an optional Python sidecar under python_backend/.

Install it manually if you want the hybrid DSPy/GEPA path:

cd python_backend
pip install -e .

The extension looks for Python in this order:

  1. PI_HERMES_EVOLVE_PYTHON
  2. python3
  3. python

If DSPy is installed, backend: auto will use the Python backend. Otherwise it falls back to TypeScript.
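The interpreter lookup order could be implemented as below. This is a sketch under stated assumptions: `resolvePythonCommand` and its parameters are hypothetical, with the environment and PATH probe injected so the order is easy to verify.

```typescript
// Hedged sketch of the documented Python lookup order:
// 1. PI_HERMES_EVOLVE_PYTHON  2. python3  3. python
function resolvePythonCommand(
  env: Record<string, string | undefined>,
  isOnPath: (cmd: string) => boolean,
): string | null {
  // 1. An explicit override always wins.
  const override = env["PI_HERMES_EVOLVE_PYTHON"];
  if (override) return override;
  // 2–3. Otherwise take the first interpreter found on PATH.
  for (const cmd of ["python3", "python"]) {
    if (isOnPath(cmd)) return cmd;
  }
  return null; // no Python found → TypeScript backend only
}
```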

Usage

Interactive command

/evolve
/evolve .pi/skills/my-skill/SKILL.md
/evolve AGENTS.md

The command will ask for:

  • evolution objective
  • evaluation source:
    • mixed
    • synthetic
    • session

The tool also accepts an optional backend override:

Use self_evolve_artifact on AGENTS.md with backend python.

Tool-driven usage

Example prompt to pi:

Use self_evolve_artifact on .pi/skills/review/SKILL.md to improve trigger clarity and output quality.

Output layout

Every run writes to a timestamped directory:

.pi/hermes-self-evolution/runs/<timestamp>-<artifact>/
├── original.md
├── best-candidate.md
├── report.md
├── manifest.json
├── dataset.json
└── candidates/
    ├── candidate-1.md
    ├── candidate-1.json
    └── ...
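The `<timestamp>-<artifact>` directory name can be built along these lines. This is a hypothetical sketch: `slug` and `runDir` are illustrative names, and the real naming scheme may differ in detail.

```typescript
// Hedged sketch: constructing the documented run-directory path.
function slug(artifactPath: string): string {
  // "…/skills/review/SKILL.md" → "SKILL" (basename, extension stripped,
  // unsafe characters replaced).
  const base = artifactPath.split("/").pop() ?? artifactPath;
  return base.replace(/\.[^.]+$/, "").replace(/[^A-Za-z0-9_-]+/g, "-");
}

function runDir(artifactPath: string, when: Date): string {
  // Colons and dots are not filename-safe everywhere; replace them.
  const timestamp = when.toISOString().replace(/[:.]/g, "-");
  return `.pi/hermes-self-evolution/runs/${timestamp}-${slug(artifactPath)}`;
}
```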

Guardrails

Current guardrails mirror Hermes' spirit, but stay lightweight and local:

  • original file is preserved
  • candidates are written separately
  • frontmatter is preserved when present
  • existing {{placeholders}} must survive candidate generation
  • candidates over the size budget are rejected
  • human review is always required before applying changes
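Two of these guardrails, placeholder survival and the size budget, are simple enough to sketch as pure checks. The function names and the 2× size ratio are assumptions for illustration, not the package's actual rules:

```typescript
// Hedged sketch of two guardrails from the list above.
function placeholdersSurvive(original: string, candidate: string): boolean {
  // Every {{placeholder}} in the original must still appear verbatim.
  const placeholders = original.match(/\{\{[^}]+\}\}/g) ?? [];
  return placeholders.every((p) => candidate.includes(p));
}

function withinSizeBudget(
  original: string,
  candidate: string,
  ratio = 2, // assumed budget: candidate may grow at most 2× (illustrative)
): boolean {
  return candidate.length <= original.length * ratio;
}

function acceptCandidate(original: string, candidate: string): boolean {
  return placeholdersSurvive(original, candidate) && withinSizeBudget(original, candidate);
}
```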

Important limitations

This is still not a full Hermes reproduction.

What the Python upgrade adds:

  • a real Python backend bundled with the npm package
  • DSPy-based dataset generation, judging, and candidate synthesis
  • a GEPA path when the installed DSPy build exposes dspy.GEPA
  • automatic fallback to MIPROv2 or plain Chain-of-Thought if GEPA is unavailable

What is still missing versus the full Nous vision:

  • no benchmark runner integration yet
  • no automatic pytest / external benchmark gate yet
  • no code-organism evolution
  • still optimized mainly for prompt/instruction artifacts, not general source code

So treat this as a practical hybrid phase-1 self-evolution package for pi.

Development

From a local checkout of the repository, install dev dependencies and type-check:
npm install
npm run typecheck
npm run python:check

Next useful upgrades

  • add optional validation hooks (testCommand) before recommending a candidate
  • add real execution-based evaluation via subagent runs
  • add prompt-template / skill-specific rubric presets
  • add diff rendering in the final report
  • add apply/approve workflows behind explicit confirmation
  • add benchmark/test gates to the Python backend so GEPA mutations are filtered by real task outcomes