pi-llama-server

Pi extension for llama-server router — live model listing, load/unload, per-project config

Package details

Install pi-llama-server from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-llama-server
Package: pi-llama-server
Version: 1.0.1
Published: Apr 24, 2026
Downloads: 56/mo · 56/wk
Author: am17an
License: unknown
Types: extension
Size: 7.2 KB
Dependencies: 0 dependencies · 1 peer
Pi manifest JSON
{
  "extensions": [
    "./extensions/llama-server.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-llama-server

Pi extension that integrates a running llama-server instance with the Pi Coding Agent. Provides live model listing and the ability to load and unload models via the llama-server API.

Prerequisites

  • A running llama-server instance (from llama.cpp) in router mode (the default when you start it without -m; see the example after this list)
  • Pi Coding Agent installed (@mariozechner/pi-coding-agent)
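
For reference, a minimal way to start llama-server in router mode is to launch it without -m; the host and port flags below are standard llama-server options, but check your llama.cpp build for the exact set:

llama-server --host 127.0.0.1 --port 8080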

Install

pi install npm:pi-llama-server

Or from git:

pi install git:github.com/user/pi-llama-server

Pi auto-discovers the extension via pi.extensions in package.json. No additional setup needed.
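
For reference, the manifest shown above corresponds to the pi field of the package's package.json; a minimal sketch with other fields omitted:

{
  "name": "pi-llama-server",
  "pi": {
    "extensions": [
      "./extensions/llama-server.ts"
    ]
  }
}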

Configuration

The llama-server URL is resolved in this order:

  1. Per-project config — create .pi/llama-server.json in your project root:
    { "url": "http://10.0.0.5:9090" }
    
  2. Environment variable — set globally:
    export LLAMA_SERVER_URL=http://10.0.0.5:9090
    
  3. Default — falls back to http://127.0.0.1:8080
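
A minimal TypeScript sketch of that resolution order; the function name and file handling are illustrative, not the extension's actual code:

import * as fs from "node:fs";
import * as path from "node:path";

// Resolve the llama-server base URL:
// per-project config > LLAMA_SERVER_URL env var > built-in default.
function resolveLlamaServerUrl(projectRoot: string): string {
  const configPath = path.join(projectRoot, ".pi", "llama-server.json");
  if (fs.existsSync(configPath)) {
    const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
    if (typeof config.url === "string") return config.url;
  }
  return process.env.LLAMA_SERVER_URL ?? "http://127.0.0.1:8080";
}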

Usage

Browse and manage models

Run the /models slash command inside Pi to see all models on the llama-server with live status:

Status      Meaning
🟢 loaded   Model is loaded and ready
🟡 loading  Model is being loaded
🔴 failed   Model failed to load
⚪ other    Unknown state

Select a model to load, unload, or switch to it.
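
Behind the command, the status view comes from GET /models on the llama-server. A hedged sketch, assuming a response shaped roughly like { models: [{ id, status }] }; the actual field names may differ between llama.cpp versions:

// Print each model reported by the llama-server router with its load status.
async function listModels(baseUrl: string): Promise<void> {
  const res = await fetch(`${baseUrl}/models`);
  const body = (await res.json()) as { models?: { id: string; status: string }[] };
  for (const model of body.models ?? []) {
    // status is expected to be loaded, loading, failed, or some other state
    console.log(`${model.status.padEnd(8)} ${model.id}`);
  }
}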

Switch models

Use Ctrl+P (or /model) in Pi to select any llama-server model for inference. The extension will automatically tell llama-server to load the chosen model.

How it works

When Pi starts, the extension:

  1. Resolves the llama-server URL from config/env/default
  2. Queries GET /models to discover available GGUF models
  3. Registers each model as an OpenAI-compatible provider under {url}/v1
  4. Listens for model switch events and calls POST /models/load on the server
  5. Provides the /models interactive command for managing models
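
The model-switch half of that flow reduces to a single POST. A sketch, assuming the request body carries the model name under a model field; the real extension may use a different payload:

// Ask the llama-server router to load a model when the user switches to it.
async function loadModel(baseUrl: string, modelName: string): Promise<void> {
  const res = await fetch(`${baseUrl}/models/load`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: modelName }),
  });
  if (!res.ok) {
    throw new Error(`Failed to load ${modelName}: ${res.status} ${res.statusText}`);
  }
}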

llama-server endpoints used

Endpoint        Method  Purpose
/models         GET     List all models
/models/load    POST    Load a model
/models/unload  POST    Unload a model
/v1/...         POST    OpenAI-compatible completions (via Pi provider)
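
Unloading and inference follow the same pattern. A sketch, assuming the same { model: ... } payload for unload and the standard OpenAI-compatible chat completions route under /v1:

// Unload a model from the router (payload shape assumed, as above).
async function unloadModel(baseUrl: string, modelName: string): Promise<void> {
  await fetch(`${baseUrl}/models/unload`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: modelName }),
  });
}

// Run a chat completion against the OpenAI-compatible endpoint under {url}/v1.
async function chat(baseUrl: string, modelName: string, prompt: string): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: modelName,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const body = (await res.json()) as { choices: { message: { content: string } }[] };
  return body.choices[0].message.content;
}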