pi-llama-server
Pi extension for llama-server router — live model listing, load/unload, per-project config
Package details
Install pi-llama-server from npm and Pi will load the resources declared by the package manifest.
```
$ pi install npm:pi-llama-server
```

| Field | Value |
|---|---|
| Package | pi-llama-server |
| Version | 1.0.1 |
| Published | Apr 24, 2026 |
| Downloads | 56/mo · 56/wk |
| Author | am17an |
| License | unknown |
| Types | extension |
| Size | 7.2 KB |
| Dependencies | 0 dependencies · 1 peer |
Pi manifest JSON
```json
{
  "extensions": [
    "./extensions/llama-server.ts"
  ]
}
```

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-llama-server
Pi extension that integrates a running llama-server instance with the Pi Coding Agent. It provides live model listing and the ability to load and unload models via the llama-server API.
Prerequisites
- A running llama-server instance (from llama.cpp) in router mode (the default when you don't pass `-m`)
- Pi Coding Agent installed (`@mariozechner/pi-coding-agent`)
Install
```
pi install npm:pi-llama-server
```

Or from git:

```
pi install git:github.com/user/pi-llama-server
```
Pi auto-discovers the extension via `pi.extensions` in `package.json`. No additional setup is needed.
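For reference, the discovery hook in the package's own `package.json` plausibly looks like this (a minimal sketch inferred from the manifest shown above; only the `extensions` array is confirmed, the surrounding `pi` key shape is an assumption):

```json
{
  "name": "pi-llama-server",
  "version": "1.0.1",
  "pi": {
    "extensions": [
      "./extensions/llama-server.ts"
    ]
  }
}
```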
Configuration
The llama-server URL is resolved in this order (a resolution sketch follows the list):

- Per-project config: create `.pi/llama-server.json` in your project root:

  ```json
  { "url": "http://10.0.0.5:9090" }
  ```

- Environment variable: set it globally:

  ```
  export LLAMA_SERVER_URL=http://10.0.0.5:9090
  ```

- Default: falls back to `http://127.0.0.1:8080`
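In TypeScript terms, the precedence works roughly like this (an illustrative sketch, not the extension's actual code; `resolveLlamaServerUrl` is a hypothetical name):

```ts
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper mirroring the documented precedence:
// project config > environment variable > built-in default.
function resolveLlamaServerUrl(projectRoot: string): string {
  const configPath = join(projectRoot, ".pi", "llama-server.json");
  if (existsSync(configPath)) {
    const config = JSON.parse(readFileSync(configPath, "utf8"));
    if (typeof config.url === "string") return config.url;
  }
  if (process.env.LLAMA_SERVER_URL) return process.env.LLAMA_SERVER_URL;
  return "http://127.0.0.1:8080";
}
```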
Usage
Browse and manage models
Run the `/models` slash command inside Pi to see all models on the llama-server with live status:

| Status | Meaning |
|---|---|
| 🟢 loaded | Model is loaded and ready |
| 🟡 loading | Model is being loaded |
| 🔴 failed | Model failed to load |
| ⚪ other | Unknown state |
Select a model to load, unload, or switch to it.
Switch models
Use Ctrl+P (or `/model`) in Pi to select any llama-server model for inference. The extension automatically tells llama-server to load the chosen model.
How it works
When Pi starts, the extension (see the sketch after this list):

- Resolves the llama-server URL from config/env/default
- Queries `GET /models` to discover available GGUF models
- Registers each model as an OpenAI-compatible provider under `{url}/v1`
- Listens for model switch events and calls `POST /models/load` on the server
- Provides the `/models` interactive command for managing models
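A rough sketch of the two router calls the extension makes, assuming the endpoints behave as the table below describes (the response shape, a `data` array of `{ id }` objects in the OpenAI listing style, is an assumption, not confirmed API):

```ts
const baseUrl = "http://127.0.0.1:8080"; // resolved as shown above

// Discover available models via the router's listing endpoint.
async function listModels(): Promise<string[]> {
  const res = await fetch(`${baseUrl}/models`);
  if (!res.ok) throw new Error(`GET /models failed: ${res.status}`);
  const body = await res.json();
  return body.data.map((m: { id: string }) => m.id);
}

// Ask the router to load a model, as the extension does on model switch.
async function loadModel(model: string): Promise<void> {
  const res = await fetch(`${baseUrl}/models/load`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  if (!res.ok) throw new Error(`POST /models/load failed: ${res.status}`);
}
```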
llama-server endpoints used
| Endpoint | Method | Purpose |
|---|---|---|
| `/models` | GET | List all models |
| `/models/load` | POST | Load a model |
| `/models/unload` | POST | Unload a model |
| `/v1/...` | POST | OpenAI-compatible completions (via Pi provider) |
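Pi's registered provider drives the `/v1` endpoint for you, but when debugging it can help to hit it directly. An illustrative call against the OpenAI-compatible chat completions route that llama-server exposes (the model id is a placeholder; use one returned by `GET /models`):

```ts
// Direct call to the OpenAI-compatible endpoint Pi's provider targets.
// The payload follows the standard OpenAI chat completions shape.
const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "my-model.gguf", // placeholder: an id from GET /models
    messages: [{ role: "user", content: "Say hello." }],
  }),
});
const completion = await res.json();
console.log(completion.choices[0].message.content);
```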