pi-llama-server

Pi extension for llama-server router — live model listing, load/unload, per-project config

Package details

Install pi-llama-server from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-llama-server
Package: pi-llama-server
Version: 1.0.1
Published: Apr 24, 2026
Downloads: 56/mo · 56/wk
Author: am17an
License: unknown
Types: extension
Size: 7.2 KB
Dependencies: 0 dependencies · 1 peer
Pi manifest JSON
{
  "extensions": [
    "./extensions/llama-server.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-llama-server

Pi extension that integrates a running llama-server instance with the Pi Coding Agent. Provides live model listing and the ability to load and unload models via the llama-server API.

Prerequisites

  • A running llama-server instance (from llama.cpp) in router mode (the default when you start it without -m; see the example after this list)
  • Pi Coding Agent installed (@mariozechner/pi-coding-agent)
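
For reference, a minimal way to start llama-server in router mode is to launch it without -m; the host and port flags below are standard llama-server options, but check your llama.cpp build for the exact set:

llama-server --host 127.0.0.1 --port 8080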

Install

pi install npm:pi-llama-server

Or from git:

pi install git:github.com/user/pi-llama-server

Pi auto-discovers the extension via pi.extensions in package.json. No additional setup needed.
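
For reference, the manifest shown above corresponds to the pi field of the package's package.json; a minimal sketch with other fields omitted:

{
  "name": "pi-llama-server",
  "pi": {
    "extensions": [
      "./extensions/llama-server.ts"
    ]
  }
}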

Configuration

The llama-server URL is resolved in this order:

  1. Per-project config — create .pi/llama-server.json in your project root:
    { "url": "http://10.0.0.5:9090" }
    
  2. Environment variable — set globally:
    export LLAMA_SERVER_URL=http://10.0.0.5:9090
    
  3. Default — falls back to http://127.0.0.1:8080
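
A minimal TypeScript sketch of that resolution order; the function name and file handling are illustrative, not the extension's actual code:

import * as fs from "node:fs";
import * as path from "node:path";

// Resolve the llama-server base URL:
// per-project config > LLAMA_SERVER_URL env var > built-in default.
function resolveLlamaServerUrl(projectRoot: string): string {
  const configPath = path.join(projectRoot, ".pi", "llama-server.json");
  if (fs.existsSync(configPath)) {
    const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
    if (typeof config.url === "string") return config.url;
  }
  return process.env.LLAMA_SERVER_URL ?? "http://127.0.0.1:8080";
}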

Usage

Browse and manage models

Run the /models slash command inside Pi to see all models on the llama-server with live status:

Status      Meaning
🟢 loaded   Model is loaded and ready
🟡 loading  Model is being loaded
🔴 failed   Model failed to load
⚪ other    Unknown state

Select a model to load, unload, or switch to it.
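
Behind the command, the status view comes from GET /models on the llama-server. A hedged sketch, assuming a response shaped roughly like { models: [{ id, status }] }; the actual field names may differ between llama.cpp versions:

// Print each model reported by the llama-server router with its load status.
async function listModels(baseUrl: string): Promise<void> {
  const res = await fetch(`${baseUrl}/models`);
  const body = (await res.json()) as { models?: { id: string; status: string }[] };
  for (const model of body.models ?? []) {
    // status is expected to be loaded, loading, failed, or some other state
    console.log(`${model.status.padEnd(8)} ${model.id}`);
  }
}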

Switch models

Use Ctrl+P (or /model) in Pi to select any llama-server model for inference. The extension will automatically tell llama-server to load the chosen model.

How it works

When Pi starts, the extension:

  1. Resolves the llama-server URL from config/env/default
  2. Queries GET /models to discover available GGUF models
  3. Registers each model as an OpenAI-compatible provider under {url}/v1
  4. Listens for model switch events and calls POST /models/load on the server
  5. Provides the /models interactive command for managing models
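
The model-switch half of that flow reduces to a single POST. A sketch, assuming the request body carries the model name under a model field; the real extension may use a different payload:

// Ask the llama-server router to load a model when the user switches to it.
async function loadModel(baseUrl: string, modelName: string): Promise<void> {
  const res = await fetch(`${baseUrl}/models/load`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: modelName }),
  });
  if (!res.ok) {
    throw new Error(`Failed to load ${modelName}: ${res.status} ${res.statusText}`);
  }
}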

llama-server endpoints used

Endpoint        Method  Purpose
/models         GET     List all models
/models/load    POST    Load a model
/models/unload  POST    Unload a model
/v1/...         POST    OpenAI-compatible completions (via Pi provider)
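
Unloading and inference follow the same pattern. A sketch, assuming the same { model: ... } payload for unload and the standard OpenAI-compatible chat completions route under /v1:

// Unload a model from the router (payload shape assumed, as above).
async function unloadModel(baseUrl: string, modelName: string): Promise<void> {
  await fetch(`${baseUrl}/models/unload`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: modelName }),
  });
}

// Run a chat completion against the OpenAI-compatible endpoint under {url}/v1.
async function chat(baseUrl: string, modelName: string, prompt: string): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: modelName,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const body = (await res.json()) as { choices: { message: { content: string } }[] };
  return body.choices[0].message.content;
}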