@danielmeneses/pi-llama-swap

Pi extension: llama-swap provider with dynamic model discovery

Install @danielmeneses/pi-llama-swap from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@danielmeneses/pi-llama-swap

Package: @danielmeneses/pi-llama-swap
Version: 0.1.0
Published: Jun 5, 2026
Downloads: 209/mo · 37/wk
Author: danielmeneses
License: Apache-2.0
Types: extension
Size: 39.7 KB
Dependencies: 1 dependency · 0 peers

Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-llama-swap

Pi coding agent extension that registers a llama-swap provider and discovers models from a running llama-swap instance.

What it does

  • Registers the provider llama-swap with the models returned by GET /v1/models
  • Resolves each model's context window from llama-swap APIs (/v1/models, /running), defaulting to 256K (see Context window)
  • Uses the OpenAI Chat Completions API (openai-completions) for streaming
  • Reads an optional config file, ~/.pi/agent/pi-llama-swap.json, to override defaults

Requirements

  • pi coding agent (@earendil-works/pi-coding-agent)
  • llama-swap running and reachable
  • Node.js 18+ (for fetch)

Quick start

# Terminal 1: start llama-swap (example)
llama-swap --config ~/llama-swap/config.yaml --listen 127.0.0.1:8080

# Terminal 2: load extension
cd /path/to/pi-llama-swap
pi -e .

In pi: /model → pick llama-swap/your-model-id.

Verify from CLI:

pi -e . --list-models | grep llama-swap
curl http://127.0.0.1:8080/v1/models

Configuration

Defaults

| Setting | Default |
| --- | --- |
| Origin | http://127.0.0.1 |
| Port | 8080 |
| Base URL | http://127.0.0.1:8080/v1 |

No config file needed when llama-swap runs on the defaults above.

Config file

Create ~/.pi/agent/pi-llama-swap.json to override defaults:

{
  "origin": "http://127.0.0.1",
  "port": 8080,
  "apiKey": "optional-key"
}

| Field | Description |
| --- | --- |
| origin | Scheme + host (e.g. http://192.168.1.10) |
| port | TCP port (1–65535) |
| basePath | API path prefix (default /v1; normalized to end with /v1) |
| apiKey | Bearer token when llama-swap uses apiKeys |
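
As a rough illustration of how these fields might combine into the base URL, here is a minimal sketch. The names normalizeBasePath and buildBaseUrl are hypothetical, and the normalization rule (append /v1 unless the prefix already ends with it) is an assumption read off the table above; the actual logic in lib/url.ts may differ.

```typescript
// Hypothetical helpers for illustration; not the extension's actual code.

/** Normalize a path prefix so it starts with "/" and ends with "/v1". */
function normalizeBasePath(basePath: string = "/v1"): string {
  // Drop surrounding whitespace and trailing slashes: "/api/" -> "/api"
  let path = basePath.trim().replace(/\/+$/, "");
  if (path !== "" && !path.startsWith("/")) path = "/" + path;
  // Assumption: a prefix that does not already end with /v1 gets it appended.
  return path.endsWith("/v1") ? path : path + "/v1";
}

/** Join origin, port and path prefix into the base URL used for API calls. */
function buildBaseUrl(origin: string, port: number, basePath?: string): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`port must be an integer in 1-65535, got ${port}`);
  }
  // Strip trailing slashes from the origin before appending ":port".
  const host = origin.trim().replace(/\/+$/, "");
  return `${host}:${port}${normalizeBasePath(basePath)}`;
}
```

With the defaults, buildBaseUrl("http://127.0.0.1", 8080) yields http://127.0.0.1:8080/v1, which matches the Base URL row in the Defaults table.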

Load order: defaults → ~/.pi/agent/pi-llama-swap.json → environment variables.
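
The load order can be pictured as a layered merge in which each later source overrides the earlier ones. The sketch below is an assumption about the shape of that merge, not the code in lib/config.ts: resolveConfig and the LlamaSwapConfig interface are hypothetical names, and LLAMA_SWAP_URL is treated as a plain origin for simplicity.

```typescript
// Hypothetical config shape and merge; the real loader may differ.
interface LlamaSwapConfig {
  origin: string;
  port: number;
  apiKey?: string;
}

const DEFAULTS: LlamaSwapConfig = { origin: "http://127.0.0.1", port: 8080 };

function resolveConfig(
  fileConfig: Partial<LlamaSwapConfig>,
  env: Record<string, string | undefined>,
): LlamaSwapConfig {
  // Later spreads win, so values from the config file override the defaults.
  const merged: LlamaSwapConfig = { ...DEFAULTS, ...fileConfig };

  // Environment variables are applied last and therefore take precedence.
  if (env.LLAMA_SWAP_URL) merged.origin = env.LLAMA_SWAP_URL;
  if (env.LLAMA_SWAP_PORT) {
    const port = Number(env.LLAMA_SWAP_PORT);
    // Ignore values that are not whole numbers instead of storing NaN.
    if (Number.isInteger(port)) merged.port = port;
  }
  if (env.LLAMA_SWAP_API_KEY) merged.apiKey = env.LLAMA_SWAP_API_KEY;
  return merged;
}
```

For example, a config file that sets port 9090 combined with LLAMA_SWAP_PORT=9292 resolves to port 9292, while the origin stays at the default. In Node.js, process.env can be passed as the second argument.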

Context window (per model)

Context size applies only to llama-swap/* models registered by this extension. Other pi providers are unchanged.

Resolution runs once at extension startup (pi -e .) in lib/context.ts → buildModelLimits():

  1. List models: GET {baseUrl}/models (OpenAI-compatible /v1/models)
  2. For each model id, set contextWindow using the first match below
  3. Register models with pi via registerProvider("llama-swap", …)
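
Step 1 amounts to a single HTTP request. The following is a minimal sketch of such a call, assuming the usual OpenAI-compatible response shape ({ "data": [{ "id": … }] }). The function listModels and the FetchLike type are hypothetical; the fetch implementation is passed in so the sketch does not depend on any particular runtime typings.

```typescript
// Minimal shape of an OpenAI-compatible model list response.
interface ModelEntry {
  id: string;
  [key: string]: unknown;
}

// Just enough of the fetch API for this sketch.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/** Fetch the model list from {baseUrl}/models (hypothetical helper). */
async function listModels(
  baseUrl: string,
  fetchImpl: FetchLike,
  apiKey?: string,
): Promise<ModelEntry[]> {
  const headers: Record<string, string> = {};
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;

  const res = await fetchImpl(`${baseUrl}/models`, { headers });
  if (!res.ok) {
    throw new Error(`GET ${baseUrl}/models failed: HTTP ${res.status}`);
  }

  // Tolerate a missing or malformed "data" field by returning no models.
  const body = (await res.json()) as { data?: unknown };
  return Array.isArray(body.data) ? (body.data as ModelEntry[]) : [];
}
```

On Node.js 18+ the global fetch can be passed as fetchImpl, for example listModels("http://127.0.0.1:8080/v1", fetch).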

Resolution order (first match wins)

| Priority | Source | How |
| --- | --- | --- |
| 1 | GET /v1/models entry | Top-level: context_length, max_context_length, context_window |
| 2 | GET /v1/models metadata | meta.llamaswap.context_length, .context, .max_context, .max_context_length; or meta.n_ctx; or metadata.context_length / metadata.context |
| 3 | GET /running (loaded models only) | For each running process: upstream llama-server GET {proxy}/props → default_generation_settings.n_ctx; else parse -c / --ctx-size from cmd |
| 4 | Default | 256K (262144 tokens) when nothing above reports a value |
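
The field lookup for priorities 1 and 2 can be sketched as a first-match scan over the candidate fields of a /v1/models entry. This is only an illustration under assumptions: contextFromModelEntry and its helpers are hypothetical names, only numeric values are accepted, and the /running lookup (priority 3) is left out.

```typescript
const DEFAULT_CONTEXT_WINDOW = 262144; // 256K, used when nothing reports a value

type Json = Record<string, unknown>;

/** Return the first argument that is a positive finite number, if any. */
function firstPositiveNumber(...values: unknown[]): number | undefined {
  for (const v of values) {
    if (typeof v === "number" && Number.isFinite(v) && v > 0) return v;
  }
  return undefined;
}

/** Treat anything that is not an object as an empty object. */
function asObject(value: unknown): Json {
  return typeof value === "object" && value !== null ? (value as Json) : {};
}

/** Context size reported by a single /v1/models entry (hypothetical helper). */
function contextFromModelEntry(entry: Json): number | undefined {
  const meta = asObject(entry.meta);
  const llamaswap = asObject(meta.llamaswap);
  const metadata = asObject(entry.metadata);
  return firstPositiveNumber(
    // Priority 1: top-level fields
    entry.context_length,
    entry.max_context_length,
    entry.context_window,
    // Priority 2: metadata fields
    llamaswap.context_length,
    llamaswap.context,
    llamaswap.max_context,
    llamaswap.max_context_length,
    meta.n_ctx,
    metadata.context_length,
    metadata.context,
  );
}
```

A caller would then fall back to the default when the entry reports nothing, for example contextFromModelEntry(entry) ?? DEFAULT_CONTEXT_WINDOW.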

Exception: when both /running and /v1/models report a context size for the same model id, the /running value wins.
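
For the cmd fallback in priority 3, the context size has to be pulled out of the llama-server command line. The sketch below shows one way to do that with a regular expression, plus a helper that applies the override rule just stated. Both function names are hypothetical, and the parser only handles the space-separated forms -c N and --ctx-size N; the extension's own parsing may cover more cases.

```typescript
/** Extract the context size from a llama-server command line, if present. */
function parseCtxSizeFromCmd(cmd: string): number | undefined {
  // Matches "-c 8192" and "--ctx-size 8192" as standalone arguments.
  // Flags that merely start with "-c" (for example "--cache-type-k") do not match.
  const match = cmd.match(/(?:^|\s)(?:-c|--ctx-size)\s+(\d+)(?=\s|$)/);
  if (!match) return undefined;
  const value = Number(match[1]);
  // Treat 0 as "not reported" so the caller can fall through to other sources.
  return value > 0 ? value : undefined;
}

/** Combine both sources: a /running value wins over a /v1/models value. */
function pickContextWindow(fromRunning?: number, fromModels?: number): number {
  return fromRunning ?? fromModels ?? 262144; // 256K default
}
```

For a cmd such as llama-server -m model.gguf -c 262144 --port 9001, parseCtxSizeFromCmd returns 262144.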

Max output tokens (maxTokens)

| Source | Fallback |
| --- | --- |
| output_length or max_tokens on /v1/models (or meta.llamaswap.*) | min(8192, floor(contextWindow / 4)) |
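
The fallback formula is simple enough to show directly. In this sketch, resolveMaxTokens is a hypothetical name and reported stands for whatever output_length or max_tokens value was found, if any.

```typescript
/** Pick maxTokens: the reported value if present, else the fallback formula. */
function resolveMaxTokens(contextWindow: number, reported?: number): number {
  if (reported !== undefined && reported > 0) return reported;
  // Fallback: a quarter of the context window, capped at 8192 tokens.
  return Math.min(8192, Math.floor(contextWindow / 4));
}
```

With the 256K default context, 262144 / 4 = 65536, so the cap applies and maxTokens is 8192. A model with a 16384-token context gets 4096.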

Typical behavior

  • Idle models (not loaded in llama-swap yet): usually 256K unless llama-swap adds context fields to /v1/models
  • Running model at startup: real ctx from /running + upstream /props (e.g. 262144 from -c in cmd)
  • After load: context does not auto-refresh — restart pi -e . to pick up new /running values

To expose ctx for all models without restart, configure llama-swap so /v1/models includes context_length (or metadata.context_length) per model.

Example

# See what pi registered
pi -e . --list-models | grep llama-swap

# Raw model list from llama-swap
curl -s http://127.0.0.1:8080/v1/models | jq '.data[] | {id, context_length, meta}'

# Running models + cmds (context for loaded upstream)
curl -s http://127.0.0.1:8080/running | jq

Restrict permissions when storing API keys:

chmod 600 ~/.pi/agent/pi-llama-swap.json

Environment variables

Optional runtime overrides (highest precedence):

| Variable | Purpose |
| --- | --- |
| LLAMA_SWAP_URL | Origin URL (scheme + host, optional port/path) |
| LLAMA_SWAP_PORT | Port override |
| LLAMA_SWAP_API_KEY | API key override |

Auth

If llama-swap uses apiKeys in its config, set "apiKey" in pi-llama-swap.json or LLAMA_SWAP_API_KEY.

Without a key, the extension uses the placeholder local-no-auth so that models still appear in /model. Pi may send Authorization: Bearer local-no-auth; most unsecured local installs ignore it.
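
The placeholder behavior can be sketched as a small helper that always produces a bearer header. The function name authHeaders is hypothetical, and treating a blank key the same as a missing one is an assumption.

```typescript
const PLACEHOLDER_API_KEY = "local-no-auth";

/** Build the Authorization header, using the placeholder when no key is set. */
function authHeaders(apiKey?: string): Record<string, string> {
  // Assumption: an empty or whitespace-only key counts as "no key".
  const key = apiKey && apiKey.trim() !== "" ? apiKey : PLACEHOLDER_API_KEY;
  return { Authorization: `Bearer ${key}` };
}
```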

Troubleshooting

| Symptom | What to try |
| --- | --- |
| Cannot reach llama-swap | Start llama-swap; check origin/port in the config file |
| HTTP 401 | Set apiKey in the config file or LLAMA_SWAP_API_KEY |
| 0 models | Ensure models are defined in the llama-swap config; run curl http://127.0.0.1:8080/v1/models |
| Extension loads but chat fails | Confirm the model id; the first request may be slow while llama-swap loads the model |
| Config ignored | File must be ~/.pi/agent/pi-llama-swap.json; restart pi after edits |

Project layout

pi-llama-swap/
├── index.ts          # Extension entry
├── lib/
│   ├── config.ts     # Load ~/.pi/agent/pi-llama-swap.json
│   ├── url.ts        # URL building
│   ├── client.ts     # GET /v1/models
│   ├── context.ts    # Context window from llama-swap APIs
│   ├── provider.ts   # registerProvider
│   └── types.ts
├── package.json
└── README.md

License

MIT (see repository if published).