@localaicat/pi
pi extension that warms/cools the Local AI Cat resident model over the Local API
Package details
Install @localaicat/pi from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@localaicat/pi
- Package: @localaicat/pi
- Version: 0.1.3
- Published: Jun 16, 2026
- Downloads: 370/mo · 287/wk
- Author: atlascodes
- License: MIT
- Types: extension
- Size: 18.8 KB
- Dependencies: 0 dependencies · 0 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
@localaicat/pi
A pi extension that keeps the Local AI Cat resident model warm for the duration of a coding session and publishes local generation stats to pi's status line. It also exposes resource-control slash commands so you can load, unload, or stop the local server without a custom pi binary.
The Local API loads a model lazily on the first chat request, so the first turn
of a session is slow (weights load + KV warm-up). This extension proactively
loads the model when pi selects one from the localaicat provider, and
unloads it on session shutdown to return the RAM to the budget.
Install
# from npm (recommended)
pi install npm:@localaicat/pi
# or install a local checkout persistently
pi install ./tools/pi
# or one-off for a single run
pi -e ./tools/pi/index.ts
Quick start
pi install npm:@localaicat/pi
export LOCALAICAT_BASE_URL=http://127.0.0.1:11434
export LOCALAICAT_API_KEY=local # if the Local API requires a token
export LOCALAICAT_MODEL=mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
pi --provider localaicat
The extension self-configures the localaicat provider on first load by
writing ~/.pi/agent/models.json from the LOCALAICAT_* environment variables above
(see "Provider self-configuration"), so the install command plus these four lines are all you need.
Provider self-configuration
On load the extension ensures pi knows about the localaicat provider. If
~/.pi/agent/models.json has no localaicat entry, it writes one:
{
"providers": {
"localaicat": {
"baseUrl": "http://127.0.0.1:11434/v1",
"api": "openai-completions",
"apiKey": "local",
"models": [{ "id": "mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit" }]
}
}
}
baseUrl comes from LOCALAICAT_BASE_URL (with /v1 appended), apiKey from
LOCALAICAT_API_KEY (default local), and the seeded model from
LOCALAICAT_MODEL (default Qwen3-Coder-30B-A3B-Instruct-4bit). An existing
localaicat entry is never overwritten — hand-edit it freely. If the write
fails for any reason, paste the block above manually.
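The seeding rule described above can be sketched as a pure function. This is a minimal illustration, not the extension's actual code: the names `buildProviderEntry` and `ensureLocalaicatProvider`, the type shapes, and the trailing-slash trimming are all assumptions, and the default model id is taken from the JSON block above.

```typescript
// Hypothetical types; the README does not show the extension's internals.
type ProviderEntry = {
  baseUrl: string;
  api: string;
  apiKey: string;
  models: { id: string }[];
};
type ModelsFile = { providers?: Record<string, ProviderEntry> };
type Env = Record<string, string | undefined>;

const DEFAULT_BASE_URL = "http://127.0.0.1:11434";
// Assumption: the full id shown in the README's JSON block is the seeded default.
const DEFAULT_MODEL = "mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit";

// Build the provider entry from LOCALAICAT_* variables, falling back to defaults.
function buildProviderEntry(env: Env): ProviderEntry {
  // Assumption: trailing slashes are trimmed before "/v1" is appended.
  const base = (env.LOCALAICAT_BASE_URL ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  return {
    baseUrl: `${base}/v1`,
    api: "openai-completions",
    apiKey: env.LOCALAICAT_API_KEY ?? "local",
    models: [{ id: env.LOCALAICAT_MODEL ?? DEFAULT_MODEL }],
  };
}

// Return the file contents with a localaicat entry; an existing entry is kept untouched.
function ensureLocalaicatProvider(file: ModelsFile, env: Env): ModelsFile {
  if (file.providers?.localaicat) return file;
  return {
    ...file,
    providers: { ...file.providers, localaicat: buildProviderEntry(env) },
  };
}
```

Keeping the merge separate from file I/O means a failed write leaves the computed block available to print for manual pasting.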
Environment
| Variable | Default | Purpose |
|---|---|---|
| LOCALAICAT_BASE_URL | http://127.0.0.1:11434 | Base of the Local API |
| LOCALAICAT_API_KEY | – | Local API token when the app requires one |
| LOCALAICAT_PROVIDER | localaicat | Comma-separated pi provider names to match on model_select |
| LOCALAICAT_MODEL | – | Model to warm on session_start if none selected yet |
| LOCALAICAT_LOG | – | Append one line per load/unload action (used by tests) |
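As an illustration of the comma-separated LOCALAICAT_PROVIDER value, the matching on model_select could look roughly like the sketch below. The helper names `parseProviders` and `matchesProvider` are hypothetical, and the whitespace trimming is an assumption; only the default `localaicat` comes from the table above.

```typescript
// Split a comma-separated provider list, ignoring blanks and surrounding whitespace.
// Falls back to the documented default when the variable is unset or empty.
function parseProviders(raw: string | undefined): string[] {
  const names = (raw ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return names.length > 0 ? names : ["localaicat"];
}

// True when the provider of the selected model is one the extension should act on.
function matchesProvider(provider: string, raw: string | undefined): boolean {
  return parseProviders(raw).indexOf(provider) !== -1;
}
```

For example, `matchesProvider("localaicat", undefined)` is true, while `matchesProvider("openai", "localaicat, lmstudio")` is false.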
Commands
All under a single /localaicat command:
- /localaicat (or /localaicat status): show loaded state plus the local RAM budget and headroom (/v1/local/resources).
- /localaicat load [model-id]: load a specific model, the selected pi model, or LOCALAICAT_MODEL.
- /localaicat unload [model-id]: unload a specific model, the selected pi model, or LOCALAICAT_MODEL.
- /localaicat stop: stop the Local API server.
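The subcommand handling and the model fallback order (explicit argument, then the selected pi model, then LOCALAICAT_MODEL) might be sketched as follows. The function names and the `ParsedCommand` shape are hypothetical; how pi actually passes arguments to a slash command is not shown here, so a raw argument string is assumed.

```typescript
type Subcommand = "status" | "load" | "unload" | "stop";
interface ParsedCommand {
  sub: Subcommand;
  modelId?: string;
}

// Parse the text after "/localaicat". An empty string means "status".
// Returns undefined for an unrecognised subcommand.
function parseLocalaicatArgs(args: string): ParsedCommand | undefined {
  const parts = args.trim().split(/\s+/).filter((p) => p.length > 0);
  const sub = parts[0] ?? "status";
  if (sub === "status" || sub === "stop") return { sub };
  if (sub === "load" || sub === "unload") {
    return parts[1] !== undefined ? { sub, modelId: parts[1] } : { sub };
  }
  return undefined;
}

// Pick the first non-empty candidate: explicit id, selected pi model, env fallback.
function resolveModelId(
  explicit: string | undefined,
  selected: string | undefined,
  envModel: string | undefined,
): string | undefined {
  for (const candidate of [explicit, selected, envModel]) {
    if (candidate !== undefined && candidate.length > 0) return candidate;
  }
  return undefined;
}
```

When all three candidates are missing, `resolveModelId` returns `undefined`, which a caller could treat as "nothing to load".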
Status line
The extension updates pi's footer with stats from the last assistant turn when pi exposes usage data:
localaicat: loaded Qwen3-Coder-30B-A3B-Instruct-4bit · 28.4 tok/s · 1240/310/1550 tok · 12.8% ctx
It reads loaded/unloaded state from /v1/local/resources, OpenAI-style usage
fields (prompt_tokens, completion_tokens, total_tokens,
tokens_per_second), and pi's built-in context estimate via
ctx.getContextUsage(). If a value is missing, it is omitted rather than
blocking the session.
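A rough sketch of how such a line could be assembled while skipping missing values is shown below. The `formatStatus` function and the `StatusInput` shape are hypothetical, and `contextPercent` is assumed to be a plain number already derived from pi's context estimate; the actual return shape of `ctx.getContextUsage()` is not documented here.

```typescript
// OpenAI-style usage fields named in the README; all optional.
interface Usage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
  tokens_per_second?: number;
}

// Hypothetical input shape for the formatter.
interface StatusInput {
  loaded?: boolean;
  model?: string;
  usage?: Usage;
  contextPercent?: number;
}

const isNum = (v: unknown): v is number => typeof v === "number" && isFinite(v);

// Build the footer text, omitting any segment whose data is unavailable.
function formatStatus(input: StatusInput): string {
  const parts: string[] = [];
  if (input.loaded !== undefined) {
    const state = input.loaded ? "loaded" : "unloaded";
    parts.push(input.model ? `${state} ${input.model}` : state);
  }
  const u = input.usage ?? {};
  if (isNum(u.tokens_per_second)) {
    parts.push(`${u.tokens_per_second.toFixed(1)} tok/s`);
  }
  if (isNum(u.prompt_tokens) && isNum(u.completion_tokens) && isNum(u.total_tokens)) {
    parts.push(`${u.prompt_tokens}/${u.completion_tokens}/${u.total_tokens} tok`);
  }
  if (isNum(input.contextPercent)) {
    parts.push(`${input.contextPercent.toFixed(1)}% ctx`);
  }
  return `localaicat: ${parts.join(" · ")}`;
}
```

Because each segment is pushed only when its data is present, a turn without usage data still produces a valid, shorter line.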
How it works
| pi event | action |
|---|---|
| model_select (provider = localaicat) | POST /v1/local/models/{id}/load |
| message_update / message_end | update pi footer resource + stats via setStatus |
| session_start (with LOCALAICAT_MODEL) | warm fallback load |
| session_shutdown | POST /v1/local/models/{id}/unload |
| /localaicat stop | POST /v1/local/server/stop |
If the app isn't running, actions degrade quietly — the next chat request still loads the model lazily.
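The endpoints in the table, together with the quiet-degradation behaviour, can be illustrated with the sketch below. Everything beyond the three paths is an assumption: the `Action` type, the helper names, URL-encoding of the model id (the real API may expect the raw id), and Bearer-token authorization. The HTTP client is passed in as a minimal `FetchLike` function so the sketch does not depend on a particular runtime.

```typescript
type Action =
  | { kind: "load"; modelId: string }
  | { kind: "unload"; modelId: string }
  | { kind: "stop" };

// Minimal fetch-compatible signature (hypothetical), so any client can be injected.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean }>;

// Map an action to the Local API path listed in the table.
function endpointFor(baseUrl: string, action: Action): string {
  const base = baseUrl.replace(/\/+$/, "");
  switch (action.kind) {
    case "load":
    case "unload":
      // Assumption: ids containing "/" are URL-encoded into a single path segment.
      return `${base}/v1/local/models/${encodeURIComponent(action.modelId)}/${action.kind}`;
    case "stop":
      return `${base}/v1/local/server/stop`;
  }
}

// POST the action; resolve to false instead of throwing when the app is unreachable.
async function runAction(
  fetchImpl: FetchLike,
  baseUrl: string,
  action: Action,
  apiKey?: string,
): Promise<boolean> {
  const headers: Record<string, string> = {};
  // Assumption: the token, when required, is sent as a Bearer header.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  try {
    const res = await fetchImpl(endpointFor(baseUrl, action), { method: "POST", headers });
    return res.ok;
  } catch {
    return false; // Degrade quietly; the next chat request still loads lazily.
  }
}
```

In a runtime with a global `fetch`, a call might look like `runAction(fetch, "http://127.0.0.1:11434", { kind: "stop" })`.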