@aittalam/pi-llamafile

Pi extension that supervises local llamafile-served model processes — start, stop, adopt, with progress visible on quit

Install @aittalam/pi-llamafile from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@aittalam/pi-llamafile

Package: @aittalam/pi-llamafile
Version: 1.0.0
Published: May 21, 2026
Downloads: not available
Author: aittalam
License: MIT
Types: extension
Size: 76.1 KB
Dependencies: 0 dependencies · 1 peer

Pi manifest (JSON):
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

Llamafiles provider extension for pi

A pi extension that supervises locally-run llamafile-style model servers (or any OpenAI-compatible server you can launch from a binary). It registers a llamafiles provider with pi, starts the configured binary when you pick one of its models, and stops it when you switch away or quit.

Status

Implementation complete; covered by 55 automated tests (42 unit + 13 integration via the SDK driver). npm test runs in ~15s and exits 0.

See SPECS.md for the behavioral contract, PLAN.md for the implementation plan, and NOTES.md for the pi API findings that informed the design.

Features

  • Process supervision: starts the configured binary on /model, waits for /v1/models to respond, then reports ready. One process per pi session.
  • Per-model port: each model declares its own port, and pi sends requests there. Default 8080.
  • {{port}} substitution in args: port is the single source of truth; reference it in your binary's argument list as {{port}}.
  • Adoption: if a compatible server is already running on the port, pi adopts it instead of spawning a duplicate.
  • Foreign-port safety: if a different server holds the port, pi surfaces the conflict and asks you to free the port manually. It never kills processes it did not start.
  • Visible quit progress: when you exit pi while it owns a running process, the extension prints "Stopping llamafile ..." and "Stopped llamafile ..." to stderr, so you can see what pi is waiting on.
  • Transparent reload: /reload does not prompt; the process keeps running and the freshly loaded extension instance re-adopts it.
  • Cleanup by design: adopted processes are never stopped without your consent.
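The spawn, adopt, or conflict rule from the list above can be sketched as a small decision function. This is a minimal sketch, not the extension's code: the probe shape and the names PortProbe and decideStartAction are hypothetical, and treating a server as "compatible" when its /v1/models list contains the configured model id is an assumption.

```typescript
// Hypothetical types and names; not the extension's actual API.

/** What answered on http://127.0.0.1:<port>/v1/models, if anything. */
type PortProbe =
  | { kind: "closed" }                     // nothing accepted the connection
  | { kind: "openai"; modelIds: string[] } // answered with an OpenAI-style model list
  | { kind: "other" };                     // something else holds the port

type StartAction = "spawn" | "adopt" | "conflict";

/**
 * Assumption: a server is "compatible" when its model list contains the
 * configured model id. The real extension may apply a different check.
 */
function decideStartAction(probe: PortProbe, expectedId: string): StartAction {
  if (probe.kind === "closed") {
    return "spawn"; // port is free: start our own process
  }
  if (probe.kind === "openai" && probe.modelIds.includes(expectedId)) {
    return "adopt"; // reuse the running server instead of spawning a duplicate
  }
  // Anything else is a foreign server: report the conflict, never kill it.
  return "conflict";
}
```

Keeping the decision pure (probe in, action out) makes the foreign-port rule easy to unit-test without binding real ports.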

Installation

Three options, in order of recommendation:

From npm (most convenient, gets gallery indexing):

pi install npm:@aittalam/pi-llamafile

From git (pins to a tag, no npm account needed by you or by pi):

pi install git:github.com/aittalam/pi-llamafile@v1.0.0

From source (for hacking on the extension):

git clone https://github.com/aittalam/pi-llamafile ~/.pi/agent/extensions/pi-llamafile
cd ~/.pi/agent/extensions/pi-llamafile
npm install

In all cases, use /reload from a running pi session, or restart pi, to pick up changes.

Configuration

Define your llamafile models in ~/.pi/agent/models.json under the llamafiles provider:

{
  "providers": {
    "llamafiles": {
      "models": [
        {
          "id": "qwen3-9b",
          "name": "Qwen3 9B",
          "command": "sh",
          "args": [
            "/path/to/qwen3-9b.llamafile",
            "--server",
            "--port",
            "{{port}}",
            "--jinja"
          ],
          "port": 8080,
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 32768,
          "maxTokens": 4096
        }
      ]
    }
  }
}

Per-model fields this extension adds beyond pi's standard set:

Field    Description
command  Executable to spawn. Required.
args     Argument list. {{port}} is substituted at spawn time.
port     TCP port the server listens on. Default 8080.
env      Extra environment variables for the spawned process.
cwd      Working directory for the spawned process.

See SPECS.md §3.2 for the complete schema and defaults.
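A minimal sketch of how the fields above could be turned into a spawn specification, applying the 8080 default and the {{port}} substitution. The names LlamafileModelConfig, SpawnSpec, and resolveSpawnSpec are hypothetical; src/config.ts and src/template.ts may be organized differently. This sketch also replaces {{port}} inside larger arguments such as --port={{port}}, which is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch; field names follow models.json, everything else is illustrative.

interface LlamafileModelConfig {
  id: string;
  command: string;              // required
  args?: string[];              // may contain the {{port}} placeholder
  port?: number;                // defaults to 8080
  env?: Record<string, string>; // extra variables for the child process
  cwd?: string;                 // working directory for the child process
}

interface SpawnSpec {
  command: string;
  args: string[];
  port: number;
  env: Record<string, string>;
  cwd?: string;
}

const DEFAULT_PORT = 8080;

function resolveSpawnSpec(model: LlamafileModelConfig): SpawnSpec {
  if (!model.command) {
    throw new Error(`model "${model.id}" is missing required field "command"`);
  }
  const port = model.port ?? DEFAULT_PORT;
  // Replace every {{port}} occurrence with the resolved port number.
  const args = (model.args ?? []).map((a) => a.split("{{port}}").join(String(port)));
  return { command: model.command, args, port, env: { ...(model.env ?? {}) }, cwd: model.cwd };
}
```

Because port is read once and substituted into args, the port the server binds and the port pi sends requests to cannot drift apart.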

Usage

  1. Run pi --list-models; it should show your llamafile models under the llamafiles provider.
  2. Use /model to pick one. The extension spawns the binary, waits for readiness, and reports "<name> is ready".
  3. Switch with /model again. The current process is stopped and the new one started.
  4. Switch to any non-llamafile model: the running llamafile is stopped.
  5. /quit (or Ctrl+D): the running llamafile is stopped. Two lines appear on the terminal (stderr): Stopping llamafile "<name>" ... then Stopped llamafile "<name>". Adopted servers (started outside pi) are left running silently.
  6. /llamafiles: prints the current state.
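The readiness wait in step 2 can be sketched as a polling loop against /v1/models. This assumes Node 18 or later for the global fetch; waitForReady and its default timeout and polling interval are hypothetical, not the extension's actual settings. The probe is injectable so the loop can be exercised without a real server.

```typescript
// Hypothetical sketch: poll GET <baseUrl>/v1/models until it returns a 2xx
// status or the deadline passes. Requires a runtime with global fetch.

async function waitForReady(
  baseUrl: string,
  timeoutMs = 60_000,
  intervalMs = 500,
  probe: (url: string) => Promise<boolean> = async (url) => {
    try {
      const res = await fetch(url);
      return res.ok;
    } catch {
      return false; // connection refused while the model is still loading
    }
  },
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  const url = `${baseUrl}/v1/models`;
  while (Date.now() < deadline) {
    if (await probe(url)) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`server at ${baseUrl} not ready after ${timeoutMs} ms`);
}
```

For example, after spawning the configured binary, `await waitForReady("http://127.0.0.1:8080")` resolves once the server lists its models, and rejects if it never does.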

Development

npm install        # install dev deps
npm test           # 55 tests, unit + integration
npm run test:unit  # 42 unit tests, sub-second
npm run dev        # pi -e . — smoke-load against your real HOME

Logs from spawned binaries are appended to ~/.pi/llamafile_logs/<modelId>.log.
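A sketch of the log-file convention using Node's child_process and fs modules. logPathFor and spawnWithLog are hypothetical helpers, not exports of src/log.ts; they only illustrate appending a child's stdout and stderr to ~/.pi/llamafile_logs/<modelId>.log. Layering env over pi's own environment is an assumption based on the "extra environment variables" wording.

```typescript
import { spawn, ChildProcess } from "node:child_process";
import { closeSync, mkdirSync, openSync } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helpers illustrating the log location described above.

/** ~/.pi/llamafile_logs/<modelId>.log (home directory is overridable for tests). */
function logPathFor(modelId: string, home: string = homedir()): string {
  return join(home, ".pi", "llamafile_logs", `${modelId}.log`);
}

/** Spawn a model server with stdout and stderr appended to its log file. */
function spawnWithLog(
  modelId: string,
  command: string,
  args: string[],
  opts: { env?: Record<string, string>; cwd?: string } = {},
): ChildProcess {
  const logPath = logPathFor(modelId);
  mkdirSync(dirname(logPath), { recursive: true });
  const fd = openSync(logPath, "a"); // "a": append, creating the file if missing
  const child = spawn(command, args, {
    cwd: opts.cwd,
    env: { ...process.env, ...opts.env }, // extra vars layered over pi's environment
    stdio: ["ignore", fd, fd],           // both output streams go to the same log
  });
  closeSync(fd); // the child holds its own copy of the descriptor
  return child;
}
```

For example, `spawnWithLog("qwen3-9b", "sh", ["/path/to/qwen3-9b.llamafile", "--server", "--port", "8080"])` would append the server's output to qwen3-9b.log across restarts instead of overwriting it.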

Layout

SPECS.md          # behavioral contract (single source of truth)
PLAN.md           # implementation plan
NOTES.md          # pi API findings
index.ts          # thin wiring: events, commands
src/
  config.ts       # models.json loader
  template.ts     # {{port}} substitution
  process.ts      # LlamafileSupervisor
  log.ts          # file log streams
  notify.ts       # notification text
  types.ts        # shared types
tests/
  unit/           # 3 files, 42 tests
  integration/    # 13 files, all SDK-driven
  helpers/        # fake-server, harness, port allocator
  MANUAL.md       # checklist for things automation cannot reach

License

MIT.