@localaicat/pi

pi extension that warms/cools the Local AI Cat resident model over the Local API

Install @localaicat/pi from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@localaicat/pi

Package: @localaicat/pi
Version: 0.1.0
Published: Jun 4, 2026
Downloads: not available
Author: atlascodes
License: MIT
Types: extension
Size: 14.9 KB
Dependencies: 0 dependencies · 0 peers

Pi manifest JSON

{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

@localaicat/pi

A pi extension that keeps the Local AI Cat resident model warm for the duration of a coding session and publishes local generation stats to pi's status line. It also exposes resource-control slash commands so you can load, unload, or stop the local server without a custom pi binary.

The Local API loads a model lazily on the first chat request, so the first turn of a session is slow (weights load + KV warm-up). This extension proactively loads the model when pi selects one from the localaicat provider, and unloads it on session shutdown to return the RAM to the budget.

Install

# one-off for a single run
pi -e ./tools/pi/index.ts

# or install it persistently
pi install ./tools/pi

# or from npm
pi install npm:@localaicat/pi

Configure pi to use Local AI Cat

~/.pi/agent/models.json:

{
  "providers": {
    "localaicat": {
      "baseUrl": "http://127.0.0.1:11434/v1",
      "api": "openai-completions",
      "apiKey": "local",
      "models": [{ "id": "mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit" }]
    }
  }
}

Then: pi --provider localaicat --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit

The extension self-configures this localaicat provider on first load if it's missing, so a single pi install … is usually all you need.
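
The README does not show how the self-configuration is done. As a rough sketch only, an "add the provider if it is missing" step could look like the following. The helper name `ensureLocalaicatProvider`, the `ModelsFile` type, and the choice to pass the model id in as a parameter are assumptions for illustration; only the field values are taken from the models.json example above.

```typescript
// Shape of ~/.pi/agent/models.json, limited to the fields shown above.
interface ProviderConfig {
  baseUrl: string;
  api: string;
  apiKey: string;
  models: { id: string }[];
}

interface ModelsFile {
  providers?: Record<string, ProviderConfig>;
}

// Hypothetical helper (not the extension's actual code): returns a copy of
// the config with a "localaicat" provider added. An existing entry is left
// untouched so user edits are never overwritten.
function ensureLocalaicatProvider(file: ModelsFile, modelId: string): ModelsFile {
  const providers: Record<string, ProviderConfig> = { ...(file.providers ?? {}) };
  if (!("localaicat" in providers)) {
    providers["localaicat"] = {
      baseUrl: "http://127.0.0.1:11434/v1",
      api: "openai-completions",
      apiKey: "local",
      models: [{ id: modelId }],
    };
  }
  return { ...file, providers };
}
```

Returning a copy instead of mutating the input keeps the step easy to test and makes it safe to call on every load.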

Environment

Variable	Default	Purpose
`LOCALAICAT_BASE_URL`	`http://127.0.0.1:11434`	Base of the Local API
`LOCALAICAT_API_KEY`	–	Local API token when the app requires one
`LOCALAICAT_PROVIDER`	`localaicat`	Comma-separated pi provider names to match on `model_select`
`LOCALAICAT_MODEL`	–	Model to warm on `session_start` if none selected yet
`LOCALAICAT_LOG`	–	Append one line per load/unload action (used by tests)
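
To make the defaults in the table concrete, here is a minimal sketch of resolving these variables into a settings object. The names `resolveSettings` and `Settings` are hypothetical, and two details are assumptions rather than documented behavior: the trailing slash is trimmed from the base URL, and `LOCALAICAT_LOG` is treated as a file path.

```typescript
type Env = Record<string, string | undefined>;

interface Settings {
  baseUrl: string;
  apiKey: string | undefined;
  providers: string[];
  model: string | undefined;
  logPath: string | undefined; // assumption: the variable holds a file path
}

// Hypothetical helper: applies the defaults from the table above.
// Empty strings are treated the same as unset variables.
function resolveSettings(env: Env): Settings {
  const baseUrl = env["LOCALAICAT_BASE_URL"] || "http://127.0.0.1:11434";
  const providerList = env["LOCALAICAT_PROVIDER"] || "localaicat";
  return {
    baseUrl: baseUrl.replace(/\/+$/, ""), // assumption: tolerate a trailing slash
    apiKey: env["LOCALAICAT_API_KEY"] || undefined,
    providers: providerList
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
    model: env["LOCALAICAT_MODEL"] || undefined,
    logPath: env["LOCALAICAT_LOG"] || undefined,
  };
}
```

In a Node-based extension this would typically be called as `resolveSettings(process.env)`; taking the environment as a parameter keeps the sketch free of Node typings.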

Commands

All under a single /localaicat command:

/localaicat (or /localaicat status) — show loaded state + the local RAM budget / headroom (/v1/local/resources).
/localaicat load [model-id] — load a specific model, the selected pi model, or LOCALAICAT_MODEL.
/localaicat unload [model-id] — unload a specific model, the selected pi model, or LOCALAICAT_MODEL.
/localaicat stop — stop the Local API server.
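
The subcommands above can be sketched as a small parser. This is illustrative only: `parseLocalaicatArgs` is a hypothetical name, the argument text is assumed to arrive as a plain string, and the fallback precedence (explicit id, then the selected pi model, then `LOCALAICAT_MODEL`) is inferred from the order in which the README lists them.

```typescript
type Action = "status" | "load" | "unload" | "stop";

interface ParsedCommand {
  action: Action;
  modelId: string | undefined;
}

const ACTIONS: readonly Action[] = ["status", "load", "unload", "stop"];

// Hypothetical parser for the text that follows "/localaicat".
// Returns undefined for an unknown subcommand.
function parseLocalaicatArgs(
  args: string,
  selectedModel?: string,
  envModel?: string,
): ParsedCommand | undefined {
  const words = args.trim().split(/\s+/).filter((w) => w.length > 0);
  const verb = words[0] ?? "status"; // bare "/localaicat" means status
  const explicitId = words[1];

  const action = ACTIONS.find((a) => a === verb);
  if (action === undefined) return undefined;

  const needsModel = action === "load" || action === "unload";
  // Assumed precedence: explicit id, then selected model, then env fallback.
  const modelId = needsModel ? (explicitId ?? selectedModel ?? envModel) : undefined;
  return { action, modelId };
}
```

For example, `parseLocalaicatArgs("load", "org/model-a")` resolves to a load of `org/model-a`, while `parseLocalaicatArgs("")` resolves to a status request.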

Status line

When pi exposes usage data, the extension updates pi's footer with stats from the last assistant turn:

localaicat: loaded Qwen3-Coder-30B-A3B-Instruct-4bit · 28.4 tok/s · 1240/310/1550 tok · 12.8% ctx

It reads loaded/unloaded state from /v1/local/resources, OpenAI-style usage fields (prompt_tokens, completion_tokens, total_tokens, tokens_per_second), and pi's built-in context estimate via ctx.getContextUsage(). The three slash-separated token counts are prompt, completion, and total. If a value is missing, it is omitted rather than blocking the session.
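
A minimal sketch of how such a line could be assembled, omitting any segment whose data is missing. The function name `formatStatus`, the "unloaded" wording, and passing the context estimate in as a plain percentage are assumptions; the return shape of `ctx.getContextUsage()` is not described in this README.

```typescript
// OpenAI-style usage fields named in the README; all optional here.
interface Usage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
  tokens_per_second?: number;
}

// Hypothetical formatter: builds the footer text and skips missing values.
function formatStatus(
  loadedModel: string | undefined,
  usage: Usage = {},
  contextPercent?: number, // assumption: context usage as a percentage
): string {
  const parts: string[] = [];

  if (loadedModel) {
    // Drop the "org/" prefix, as in the example line above.
    const shortName = loadedModel.slice(loadedModel.lastIndexOf("/") + 1);
    parts.push(`loaded ${shortName}`);
  } else {
    parts.push("unloaded");
  }

  if (usage.tokens_per_second !== undefined) {
    parts.push(`${usage.tokens_per_second.toFixed(1)} tok/s`);
  }

  const { prompt_tokens: p, completion_tokens: c, total_tokens: t } = usage;
  if (p !== undefined && c !== undefined && t !== undefined) {
    parts.push(`${p}/${c}/${t} tok`);
  }

  if (contextPercent !== undefined) {
    parts.push(`${contextPercent.toFixed(1)}% ctx`);
  }

  return `localaicat: ${parts.join(" · ")}`;
}
```

Building the line from an array of optional segments means a missing field only shortens the output and never throws.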

How it works

pi event	Action
`model_select` (provider = `localaicat`)	`POST /v1/local/models/{id}/load`
`message_update` / `message_end`	update pi footer resource + stats via `setStatus`
`session_start` (with `LOCALAICAT_MODEL`)	warm fallback `load`
`session_shutdown`	`POST /v1/local/models/{id}/unload`
`/localaicat stop`	`POST /v1/local/server/stop`

If the app isn't running, actions degrade quietly — the next chat request still loads the model lazily.
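
The endpoint mapping and the quiet degradation can be sketched as below. This is not the extension's source: `endpointFor`, `postQuietly`, and `PostFn` are hypothetical names, sending the token as a Bearer `Authorization` header is an assumption, and the model id is placed in the path as-is because the README does not say whether the slash in an id must be percent-encoded.

```typescript
type LocalAction = "load" | "unload" | "stop";

// Builds the Local API URL for an action, following the table above.
function endpointFor(baseUrl: string, action: LocalAction, modelId?: string): string {
  const root = baseUrl.replace(/\/+$/, "");
  if (action === "stop") return `${root}/v1/local/server/stop`;
  if (!modelId) throw new Error(`${action} needs a model id`);
  // Assumption: the id is inserted without percent-encoding.
  return `${root}/v1/local/models/${modelId}/${action}`;
}

// Minimal structural type so the sketch does not depend on DOM or Node typings.
type PostFn = (
  url: string,
  init: { method: "POST"; headers: Record<string, string> },
) => Promise<{ ok: boolean }>;

// Sends the POST and reports success as a boolean. Network errors
// (for example, the app not running) are swallowed on purpose.
async function postQuietly(post: PostFn, url: string, apiKey?: string): Promise<boolean> {
  const headers: Record<string, string> = {};
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`; // assumed auth scheme
  try {
    const res = await post(url, { method: "POST", headers });
    return res.ok;
  } catch {
    return false; // degrade quietly; the next chat request loads lazily
  }
}
```

In practice the global `fetch` could be passed as `post`. Injecting it keeps the request logic testable without a running server.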