@aittalam/pi-llamafile

Pi extension that supervises local llamafile-served model processes — start, stop, adopt, with progress visible on quit

Install @aittalam/pi-llamafile from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@aittalam/pi-llamafile

Package: @aittalam/pi-llamafile
Version: 1.0.0
Published: May 21, 2026
Downloads: not available
Author: aittalam
License: MIT
Types: extension
Size: 76.1 KB
Dependencies: 0 dependencies · 1 peer

Pi manifest JSON

{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

Llamafiles provider extension for pi

A pi extension that supervises locally-run llamafile-style model servers (or any OpenAI-compatible server you can launch from a binary). It registers a llamafiles provider with pi, starts the configured binary when you pick one of its models, and stops it when you switch away or quit.

Status

Implementation complete; covered by 55 automated tests (42 unit + 13 integration via the SDK driver). npm test runs in ~15s and exits 0.

See SPECS.md for the behavioral contract, PLAN.md for the implementation plan, and NOTES.md for the pi API findings that informed the design.

Features

Process supervision — starts the configured binary on /model, waits for /v1/models to respond, then reports ready. One process per pi session.
Per-model port — each model declares its own port; pi sends requests there. Default 8080.
{{port}} substitution in args — port is the single source of truth; reference it in your binary's arg list as {{port}}.
Adoption — if a compatible server is already running on the port, pi adopts it instead of spawning a duplicate.
Foreign-port safety — if a different server holds the port, pi surfaces the conflict and tells you to free it manually. It never kills processes it did not start.
Visible quit progress — when you exit pi while it owns a running process, the extension prints "Stopping llamafile ..." and "Stopped llamafile ..." to stderr so you can see what pi is waiting for.
Transparent reload — /reload does not prompt; the process keeps running and the freshly loaded extension instance re-adopts it.
Cleanup-by-design — adopted processes are never stopped without your consent.
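
The spawn, adopt, or conflict decision described in the list above can be sketched roughly as follows. This is an illustration only, not the extension's actual code: the names `probePort`, `classifyPort`, and `decide` are hypothetical, and the rule used here for "compatible" (an HTTP 200 from /v1/models whose JSON body has a `data` array) is an assumption. SPECS.md is the authority on the real behavior.

```typescript
// Hypothetical sketch; names and the compatibility rule are assumptions.
type PortState = "free" | "compatible" | "foreign";
type Action = "spawn" | "adopt" | "report-conflict";

interface ProbeResult {
  reachable: boolean;   // did anything answer the HTTP request?
  status?: number;      // HTTP status code, when reachable
  modelIds?: string[];  // ids parsed from the /v1/models body, if it parsed
}

// Classify whatever is (or is not) answering on the port.
function classifyPort(probe: ProbeResult): PortState {
  if (!probe.reachable) return "free";
  const looksOpenAI = probe.status === 200 && Array.isArray(probe.modelIds);
  return looksOpenAI ? "compatible" : "foreign";
}

// Map the classification to an action. A foreign server is never killed.
function decide(state: PortState): Action {
  switch (state) {
    case "free":
      return "spawn";
    case "compatible":
      return "adopt";
    case "foreign":
      return "report-conflict";
  }
}

// Probe with the global fetch (Node 18+). A connection error is treated as
// "nothing there"; a real supervisor would probably also check whether a
// non-HTTP process has bound the port.
async function probePort(port: number, timeoutMs = 1000): Promise<ProbeResult> {
  try {
    const res = await fetch(`http://127.0.0.1:${port}/v1/models`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    let modelIds: string[] | undefined;
    try {
      const body = (await res.json()) as { data?: { id: string }[] };
      if (Array.isArray(body.data)) modelIds = body.data.map((m) => m.id);
    } catch {
      // Non-JSON body: leave modelIds undefined so the port counts as foreign.
    }
    return { reachable: true, status: res.status, modelIds };
  } catch {
    return { reachable: false };
  }
}
```

Keeping the classification separate from the network probe makes the decision table testable without a live server; the same `probePort` call could also be polled in a loop to wait for readiness after a spawn.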

Installation

Three options, in order of recommendation:

From npm (most convenient, gets gallery indexing):

pi install npm:@aittalam/pi-llamafile

From git (pins to a tag, no npm account needed by you or by pi):

pi install git:github.com/aittalam/pi-llamafile@v1.0.0

From source (for hacking on the extension):

git clone https://github.com/aittalam/pi-llamafile ~/.pi/agent/extensions/pi-llamafile
cd ~/.pi/agent/extensions/pi-llamafile
npm install

In all cases, use /reload from a running pi session, or restart pi, to pick up changes.

Configuration

Define your llamafile models in ~/.pi/agent/models.json under the llamafiles provider:

{
  "providers": {
    "llamafiles": {
      "models": [
        {
          "id": "qwen3-9b",
          "name": "Qwen3 9B",
          "command": "sh",
          "args": [
            "/path/to/qwen3-9b.llamafile",
            "--server",
            "--port",
            "{{port}}",
            "--jinja"
          ],
          "port": 8080,
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 32768,
          "maxTokens": 4096
        }
      ]
    }
  }
}

Per-model fields beyond pi's standard set (only command is required):

Field	Description
`command`	Executable to spawn. Required.
`args`	Argument list. `{{port}}` is substituted at spawn.
`port`	TCP port the server listens on. Default `8080`.
`env`	Extra environment variables for the spawned process.
`cwd`	Working directory for the spawned process.

See SPECS.md §3.2 for the complete schema and defaults.
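
As a rough illustration of how the `{{port}}` placeholder in `args` might be resolved at spawn time, here is a minimal sketch. The function name `substitutePort` and the `LlamafileModelExtras` interface are hypothetical, and replacing the placeholder inside a longer argument (such as `--port={{port}}`) is an assumption rather than documented behavior.

```typescript
// Hypothetical shape of the extra fields from the table above.
interface LlamafileModelExtras {
  command: string;               // executable to spawn (required)
  args?: string[];               // may contain the {{port}} placeholder
  port?: number;                 // defaults to 8080
  env?: Record<string, string>;  // extra environment variables
  cwd?: string;                  // working directory
}

const DEFAULT_PORT = 8080;
const PORT_PLACEHOLDER = "{{port}}";

// Replace every occurrence of {{port}} in every argument with the port number.
function substitutePort(args: string[], port: number = DEFAULT_PORT): string[] {
  return args.map((arg) => arg.split(PORT_PLACEHOLDER).join(String(port)));
}

// Resolve the final argument list for a model entry.
function resolveArgs(model: LlamafileModelExtras): string[] {
  return substitutePort(model.args ?? [], model.port ?? DEFAULT_PORT);
}
```

With the example configuration above, `resolveArgs` would turn `["/path/to/qwen3-9b.llamafile", "--server", "--port", "{{port}}", "--jinja"]` into the same list with `"8080"` in place of the placeholder, so the port is written once in the model entry and never repeated in the arguments.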

Usage

pi --list-models should show your llamafile models under the llamafiles provider.
Use /model to pick one. The extension spawns the binary, waits for readiness, and reports <name> is ready.
Switch with /model again. The current process is stopped and the new one started.
Switch to any non-llamafile model: the running llamafile is stopped.
/quit (or Ctrl+D): the running llamafile is stopped. Two lines appear on the terminal (stderr): Stopping llamafile "<name>" ... then Stopped llamafile "<name>". Adopted servers (started outside pi) are left running silently.
/llamafiles: prints the current state.
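
The switching rules in the list above can be summarized as a small planning function. This is a sketch under assumptions, not the extension's real logic: `planSwitch`, `Running`, and the `release` step are hypothetical names, and treating a re-selection of the already running model as a no-op is an assumption the README does not state.

```typescript
// Hypothetical sketch of the model-switch rules; all names are illustrative.
interface Running {
  modelId: string;
  owned: boolean; // true if this pi session spawned the process, false if adopted
}

type Step =
  | { kind: "stop"; modelId: string }     // terminate a process pi started
  | { kind: "release"; modelId: string }  // forget an adopted server, leave it running
  | { kind: "start"; modelId: string };   // spawn or adopt the newly selected model

// nextLlamafileId is null when the newly selected model is not a llamafile model.
function planSwitch(current: Running | null, nextLlamafileId: string | null): Step[] {
  const steps: Step[] = [];
  // Assumption: selecting the model that is already running changes nothing.
  if (current && current.modelId === nextLlamafileId) return steps;
  if (current) {
    steps.push({ kind: current.owned ? "stop" : "release", modelId: current.modelId });
  }
  if (nextLlamafileId !== null) {
    steps.push({ kind: "start", modelId: nextLlamafileId });
  }
  return steps;
}
```

Quitting pi corresponds to `planSwitch(current, null)`: an owned process yields a single stop step, while an adopted server is only released and keeps running.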

Development

npm install        # install dev deps
npm test           # 55 tests, unit + integration
npm run test:unit  # 42 unit tests, sub-second
npm run dev        # pi -e . — smoke-load against your real HOME

Logs from spawned binaries are appended to ~/.pi/llamafile_logs/<modelId>.log.
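
A per-model append-only log of this kind could be opened along the following lines. This is only a sketch: `logPathFor` and `openLog` are hypothetical helper names, and creating the directory on demand is an assumption; the path layout is the one stated above.

```typescript
import { createWriteStream, mkdirSync, type WriteStream } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

// Build ~/.pi/llamafile_logs/<modelId>.log. The home directory is a parameter
// so the function can be exercised without touching the real HOME.
function logPathFor(modelId: string, home: string = homedir()): string {
  return join(home, ".pi", "llamafile_logs", `${modelId}.log`);
}

// Open the log in append mode so earlier runs are preserved.
function openLog(modelId: string, home?: string): WriteStream {
  const file = logPathFor(modelId, home);
  mkdirSync(dirname(file), { recursive: true }); // assumption: create on demand
  return createWriteStream(file, { flags: "a" });
}
```

The returned stream can then receive the child's output, for example with `child.stdout.pipe(log, { end: false })` on a process created by `child_process.spawn`, so that stdout and stderr can share one file.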

Layout

SPECS.md          # behavioral contract (single source of truth)
PLAN.md           # implementation plan
NOTES.md          # pi API findings
index.ts          # thin wiring: events, commands
src/
  config.ts       # models.json loader
  template.ts     # {{port}} substitution
  process.ts      # LlamafileSupervisor
  log.ts          # file log streams
  notify.ts       # notification text
  types.ts        # shared types
tests/
  unit/           # 3 files, 42 tests
  integration/    # 13 files, all SDK-driven
  helpers/        # fake-server, harness, port allocator
  MANUAL.md       # checklist for things automation cannot reach

License

MIT.