pi-high-availability

High Availability extension for pi - automatic failover when quota or capacity is exhausted

Package details

extension

Install pi-high-availability from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-high-availability
Package: pi-high-availability
Version: 2.3.0
Published: Mar 19, 2026
Downloads: 56/mo · 18/wk
Author: burggraf
License: MIT
Types: extension
Size: 52.2 KB
Dependencies: 0 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "extensions/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-high-availability 🔄

pi-high-availability automatically switches to fallback LLM providers when your primary provider hits quota limits or capacity constraints. Never get stuck waiting for quota resets again.

🆕 What's New in v2.3.0


Password-Store Integration: You can now use !pass references for API keys stored in password-store. The /ha UI automatically discovers matching pass entries and shows them as one-click options. API key input is now masked with a hint for !pass syntax.

Usage

# In /ha UI, you'll see matching pass entries like:
#   🔑 Use: api/anthropic/key
# Click to add as a credential

# Or manually enter:
!pass show api/anthropic/key

The !pass reference is stored in auth.json and resolved by pi at API call time.
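To illustrate what "resolved at API call time" implies, here is a minimal sketch of how a stored value could be told apart from a literal key. It assumes that a leading "!" marks a command whose output is the key; the parseStoredKey helper, the StoredKey type, and the "sk-example" placeholder are hypothetical and are not taken from pi or this extension.

```typescript
// Hypothetical helper: classify a stored credential value.
// Assumption: a leading "!" marks a command whose output is the key.
type StoredKey =
  | { kind: "command"; command: string }
  | { kind: "literal"; value: string };

function parseStoredKey(raw: string): StoredKey {
  const trimmed = raw.trim();
  if (trimmed.startsWith("!")) {
    // Keep only the command text; running it is deferred until the key is needed.
    return { kind: "command", command: trimmed.slice(1).trim() };
  }
  return { kind: "literal", value: trimmed };
}

console.log(parseStoredKey("!pass show api/anthropic/key"));
console.log(parseStoredKey("sk-example")); // placeholder literal key
```

Because only the reference is stored, the secret itself would stay in password-store rather than in auth.json.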


v2.2.0

Per-Session Group Selection: You can now specify which HA group to use via the --ha-group CLI flag. This is useful when running multiple pi instances with different failover chains.

Usage

# Use default group (from ha.json defaultGroup)
pi -e .pi/extensions/gastown-hooks.js

# Use specific group via --ha-group
pi -e .pi/extensions/gastown-hooks.js --ha-group paid --model openai-codex/gpt-5.3-codex

This allows you to configure different gastown workers to use different HA groups based on their role.

How It Works

When pi starts with --ha-group paid:

  1. Extension reads the flag value (paid)
  2. Validates the group exists in ha.json
  3. Sets state.activeGroup = "paid"
  4. All failover events use models from the "paid" group

When pi starts without --ha-group:

  1. Falls back to defaultGroup from ha.json
  2. All failover events use models from that group
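The selection steps above can be sketched as a small function. This is an illustration under assumptions, not the extension's actual code: the GroupConfig shape, the resolveActiveGroup name, and the choice to throw on an unknown group are all hypothetical.

```typescript
// Hypothetical shape: only the fields needed for group selection.
interface GroupConfig {
  groups: Record<string, { name: string }>;
  defaultGroup: string;
}

// Returns the group to use for this session.
// Assumption: an unknown group is treated as an error.
function resolveActiveGroup(config: GroupConfig, haGroupFlag?: string): string {
  const requested = haGroupFlag ?? config.defaultGroup;
  if (!Object.prototype.hasOwnProperty.call(config.groups, requested)) {
    throw new Error(`HA group "${requested}" is not defined in ha.json`);
  }
  return requested;
}

const exampleConfig: GroupConfig = {
  groups: { free: { name: "Free Tier" }, paid: { name: "Paid Tier" } },
  defaultGroup: "free",
};

console.log(resolveActiveGroup(exampleConfig, "paid")); // flag given
console.log(resolveActiveGroup(exampleConfig)); // falls back to defaultGroup
```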

Configurable Error Handling: You can now control how the extension responds to different types of errors:

  • Capacity Errors (e.g., "out of capacity", "engine overloaded"): These affect all accounts for a provider equally, so switching accounts doesn't help. Now you can choose to stop, retry after a timeout, or jump to next_provider.

  • Quota Errors (e.g., "rate limit exceeded", "insufficient quota"): These are per-account, so switching to another OAuth key or API key may solve the problem. Choose from stop, retry, next_provider, or next_key_then_provider (default).

Configure these in /ha under ⚙️ Settings or directly in ha.json (see Error Handling Configuration).

✨ Features

  • Unified HA Manager: A beautiful interactive TUI (/ha) with accordion-style navigation to manage all your groups and credentials in one place.
  • Automatic Multi-Tier Failover:
    1. Account Failover: Seamlessly switches between multiple accounts for the same provider.
    2. Provider Failover: Automatically jumps to the next provider in your group if all accounts for the current provider are exhausted.
  • Exhaustion Tracking: Intelligent cooldown management marks specific accounts or providers as "exhausted" on 429/capacity errors, preventing retries until they recover.
  • Dynamic Provider Discovery: Automatically detects all supported Pi providers (Anthropic, OpenAI, Gemini, Moonshot, Zai, etc.) without configuration.
  • Group Management: Create custom failover chains (e.g., "Fast Tier" → "Backup Tier") and rearrange model priority with simple keybindings.
  • Credential Sync & Storage: Automatically capture OAuth logins or manually add API keys for backup accounts.
  • Smart Error Detection: Distinguishes between quota errors and transient capacity issues, including full support for Google Gemini's internal retry patterns.
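The exhaustion tracking mentioned in the list above can be pictured as a map from a credential or provider key to the time it becomes usable again. The following is a minimal sketch, assuming keys such as "anthropic/primary" and explicit timestamps; the ExhaustionTracker class is hypothetical and does not reflect the extension's internals.

```typescript
class ExhaustionTracker {
  // Maps a credential or provider key to the timestamp (ms) when it becomes usable again.
  private readonly recoverAt = new Map<string, number>();
  private readonly defaultCooldownMs: number;

  constructor(defaultCooldownMs: number) {
    this.defaultCooldownMs = defaultCooldownMs;
  }

  // Record a failure; the key stays unavailable until its cooldown elapses.
  markExhausted(key: string, nowMs: number, cooldownMs?: number): void {
    this.recoverAt.set(key, nowMs + (cooldownMs ?? this.defaultCooldownMs));
  }

  // A key is available if it was never marked or its cooldown has expired.
  isAvailable(key: string, nowMs: number): boolean {
    const until = this.recoverAt.get(key);
    return until === undefined || nowMs >= until;
  }
}

const tracker = new ExhaustionTracker(3600000); // default cooldown: 1 hour
tracker.markExhausted("anthropic/primary", Date.now());
console.log(tracker.isAvailable("anthropic/primary", Date.now()));
```

Passing the current time in as a parameter keeps the sketch easy to test without waiting for a real cooldown.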

🚀 Quick Start

1. Install the Extension

pi install npm:pi-high-availability

2. Open the Manager

Run the High Availability manager to initialize your configuration:

/ha

3. Configure Your First Group

  1. Select 📂 Groups.
  2. Add or select a group (e.g., default).
  3. Add Model IDs (e.g., anthropic/claude-3-5-sonnet) to the group.
  4. Use u and d keys to rearrange the priority.

🎮 The HA Manager (/ha)

The interactive manager is your control center for high availability.

Keyboard Navigation

Key           Action
↑ / ↓         Navigate items
Space / →     Expand/collapse section or toggle item
Enter         Select/activate item
x / Delete    Delete currently selected item (with confirmation)
u             Move item up (reorder)
d             Move item down (reorder)
Esc           Cancel / Exit

📂 Group Management

  • Add/Rename/Delete groups.
  • Rearrange Priority: Use u (up) and d (down) keys to set the failover order of models within a group.
  • Per-Entry Cooldown: Set custom recovery times for specific models.
  • Delete Models: Navigate to any model entry and press x to remove it from the group.
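Reordering with u and d amounts to swapping an entry with its neighbor. A minimal sketch of that operation follows; the moveEntry helper is hypothetical and only illustrates the effect on the failover order.

```typescript
// Returns a copy of `items` with the entry at `index` moved one slot
// up (direction -1) or down (direction +1). Out-of-range moves are ignored.
function moveEntry<T>(items: readonly T[], index: number, direction: -1 | 1): T[] {
  const result = [...items];
  const target = index + direction;
  if (index < 0 || index >= result.length || target < 0 || target >= result.length) {
    return result;
  }
  [result[index], result[target]] = [result[target], result[index]];
  return result;
}

const models = ["anthropic/claude-3-5-sonnet", "google-gemini-cli/gemini-1.5-pro"];
console.log(moveEntry(models, 1, -1)); // like pressing "u" on the second entry
```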

🔑 Credential Management

  • Auto-Sync: Credentials from /login are automatically synced when you open /ha.
  • Add API Providers: Use "+ Add API Provider" to manually add providers that use API keys.
  • Add API Keys: For non-OAuth providers, add additional API keys as backups.
  • Account Priority: Use u and d keys to decide which account is primary and which are backup-1, backup-2, etc.
  • Delete Keys: Navigate to any key entry and press x to delete it.
  • Delete Providers: Navigate to a provider header (e.g., 🔌 google-gemini-cli) and press x to delete the entire provider and all its keys.

โฑ๏ธ Settings

  • Default Cooldown: Set the default recovery time (e.g., 3600000ms for 1 hour) for exhausted providers.
  • Default Group: Choose which failover chain Pi uses when it starts up.
  • Error Handling: Configure how different error types are handled:
    • Capacity Error Action: What to do when a provider reports "out of capacity" (doesn't help to switch accounts for the same provider)
    • Quota Error Action: What to do when a provider reports quota/rate limit exceeded (switching accounts may help)
    • Retry Timeout: How long to wait before retrying when using "retry" action (default: 300000ms = 5 minutes)

How Failover Works

The Failover Chain

When a quota or capacity error is detected:

  1. Try Next Account: The extension looks for another credential for the same provider (e.g., your second Google account).
  2. Mark Exhausted: The current account is marked as exhausted and won't be used again until its cooldown expires.
  3. Switch Provider: If all accounts for that provider are exhausted, the extension looks at the Active Group and switches to the next provider/model in the list.
  4. Automatic Retry: Pi automatically resends your last message using the new provider and primary account, making the transition transparent.
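The chain above can be sketched as a single selection function. This is a simplified illustration under assumptions: it works on provider names rather than full model IDs, it only looks at the current provider and the ones after it in the group, and the Candidate type and nextCandidate name are hypothetical.

```typescript
interface Candidate {
  provider: string;
  account: string;
}

// groupProviders: providers of the active group, in priority order.
// accounts: account names per provider, in priority order.
// exhausted: "provider/account" keys currently cooling down (mutated by this function).
function nextCandidate(
  groupProviders: string[],
  accounts: Record<string, string[]>,
  exhausted: Set<string>,
  current: Candidate,
): Candidate | undefined {
  // Mark the failing credential so it is skipped below.
  exhausted.add(`${current.provider}/${current.account}`);

  const start = Math.max(groupProviders.indexOf(current.provider), 0);
  // Remaining accounts of the current provider come first, then later providers.
  for (const provider of groupProviders.slice(start)) {
    for (const account of accounts[provider] ?? []) {
      if (!exhausted.has(`${provider}/${account}`)) {
        return { provider, account };
      }
    }
  }
  return undefined; // everything from the current provider onward is exhausted
}

const exhaustedKeys = new Set<string>();
const accountsByProvider = { anthropic: ["primary", "backup-1"], google: ["primary"] };
console.log(
  nextCandidate(["anthropic", "google"], accountsByProvider, exhaustedKeys, {
    provider: "anthropic",
    account: "primary",
  }),
);
```

A caller would resend the last message with the returned candidate, or surface the error when the result is undefined.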

Error Detection

The extension detects:

  • Quota Errors: HTTP 429, "rate limit", "insufficient quota", etc.
  • Capacity Errors: "No capacity available", "Engine Overloaded", etc.
  • Gemini Awareness: Correctly waits for Google's internal retry attempts before triggering a failover.
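A rough sketch of this kind of classification is shown below. The patterns are the example phrases from this README, not the extension's real matching rules; checking capacity phrases before the 429 status is an assumption, and Gemini's internal retry handling is not modeled. The classifyError name and ErrorKind type are hypothetical.

```typescript
type ErrorKind = "quota" | "capacity" | "other";

// Example phrases from this README; real providers may word errors differently.
const CAPACITY_PATTERNS = [
  /no capacity available/i,
  /out of capacity/i,
  /engine overloaded/i,
  /temporarily unavailable/i,
];
const QUOTA_PATTERNS = [/rate limit/i, /insufficient quota/i, /daily limit/i];

function classifyError(message: string, httpStatus?: number): ErrorKind {
  // Assumption: a capacity phrase wins even if the status code is 429.
  if (CAPACITY_PATTERNS.some((pattern) => pattern.test(message))) return "capacity";
  if (httpStatus === 429 || QUOTA_PATTERNS.some((pattern) => pattern.test(message))) return "quota";
  return "other";
}

console.log(classifyError("Rate limit exceeded (429)"));
console.log(classifyError("Engine Overloaded"));
```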

⚙️ Configuration Guide (ha.json)

The /ha UI is the recommended way to manage this configuration, but you can also edit ~/.pi/agent/ha.json by hand:

{
  "groups": {
    "pro": {
      "name": "Professional Tier",
      "entries": [
        { "id": "anthropic/claude-3-5-sonnet" },
        { "id": "google-gemini-cli/gemini-1.5-pro", "cooldownMs": 1800000 }
      ]
    }
  },
  "defaultGroup": "pro",
  "defaultCooldownMs": 3600000,
  "errorHandling": {
    "capacityErrorAction": "next_provider",
    "quotaErrorAction": "next_key_then_provider",
    "retryTimeoutMs": 300000
  },
  "credentials": {
    "anthropic": {
      "primary": { "type": "oauth", "refresh": "...", "access": "..." },
      "backup-1": { "type": "api_key", "key": "..." }
    }
  }
}
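To show how the per-entry cooldownMs relates to defaultCooldownMs, here is a minimal sketch that types part of the file above and looks up the cooldown for one model. It assumes an entry without cooldownMs falls back to the default; the HaSettings, HaGroup, and GroupEntry types and the cooldownFor helper are hypothetical names, and the credentials and errorHandling sections are left out.

```typescript
interface GroupEntry {
  id: string;
  cooldownMs?: number;
}

interface HaGroup {
  name: string;
  entries: GroupEntry[];
}

interface HaSettings {
  groups: Record<string, HaGroup>;
  defaultGroup: string;
  defaultCooldownMs: number;
}

// Assumption: entries without their own cooldownMs use defaultCooldownMs.
function cooldownFor(settings: HaSettings, groupName: string, modelId: string): number {
  const entry = settings.groups[groupName]?.entries.find((e) => e.id === modelId);
  return entry?.cooldownMs ?? settings.defaultCooldownMs;
}

const sampleSettings: HaSettings = {
  groups: {
    pro: {
      name: "Professional Tier",
      entries: [
        { id: "anthropic/claude-3-5-sonnet" },
        { id: "google-gemini-cli/gemini-1.5-pro", cooldownMs: 1800000 },
      ],
    },
  },
  defaultGroup: "pro",
  defaultCooldownMs: 3600000,
};

console.log(cooldownFor(sampleSettings, "pro", "anthropic/claude-3-5-sonnet"));
console.log(cooldownFor(sampleSettings, "pro", "google-gemini-cli/gemini-1.5-pro"));
```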

Error Handling Configuration

The errorHandling section in ha.json lets you customize how the extension responds to different error types:

Setting              Description                                                          Default
capacityErrorAction  Action when provider has no capacity (affects all accounts)          next_key_then_provider
quotaErrorAction     Action when account hits rate limit (may not affect other accounts)  next_key_then_provider
retryTimeoutMs       How long to wait before retrying (in milliseconds)                   300000 (5 minutes)

Understanding the Error Types

Capacity Errors occur when a provider's servers are overloaded. Examples:

  • "No capacity available for this model"
  • "Engine overloaded"
  • "Service temporarily unavailable"

These errors affect the provider's infrastructure, so switching to a different account for the same provider typically won't help. Recommended action: next_provider or retry.

Quota Errors occur when an account exceeds its limits. Examples:

  • "Rate limit exceeded (429)"
  • "Insufficient quota"
  • "Daily limit reached"

These errors are per-account, so switching to another OAuth entry or API key for the same provider may solve the problem. Recommended action: next_key_then_provider (default) or next_provider if you don't have backup accounts.

Available Actions

The following actions can be configured for both capacityErrorAction and quotaErrorAction:

Action                  Description
stop                    Stop the process and display the error (default if pi-high-availability is not installed)
retry                   Wait for retryTimeoutMs milliseconds, then retry the same request
next_provider           Immediately switch to the next provider in the current group
next_key_then_provider  Try the next account/key for the current provider, then move to next provider if all exhausted (default)

Note: For capacity errors, next_key_then_provider is often not helpful since all accounts for the same provider typically share the same capacity pool. Use next_provider or retry for capacity errors instead.
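As a sketch of how these settings combine, the function below picks the configured action for an error kind and attaches the retry delay when the action is retry. The defaults mirror the tables above; the planFor name and the ErrorHandling, HaAction, and Plan types are hypothetical and not the extension's API.

```typescript
type HaAction = "stop" | "retry" | "next_provider" | "next_key_then_provider";

interface ErrorHandling {
  capacityErrorAction?: HaAction;
  quotaErrorAction?: HaAction;
  retryTimeoutMs?: number;
}

interface Plan {
  action: HaAction;
  waitMs: number; // only non-zero for "retry"
}

// Defaults as listed in the tables above.
const DEFAULT_ACTION: HaAction = "next_key_then_provider";
const DEFAULT_RETRY_TIMEOUT_MS = 300000;

function planFor(kind: "capacity" | "quota", handling: ErrorHandling = {}): Plan {
  const action =
    kind === "capacity"
      ? (handling.capacityErrorAction ?? DEFAULT_ACTION)
      : (handling.quotaErrorAction ?? DEFAULT_ACTION);
  const waitMs = action === "retry" ? (handling.retryTimeoutMs ?? DEFAULT_RETRY_TIMEOUT_MS) : 0;
  return { action, waitMs };
}

console.log(planFor("capacity", { capacityErrorAction: "next_provider" }));
console.log(planFor("quota"));
```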

📄 License

MIT

🤝 Contributing

Contributions welcome! Please open an issue or PR on GitHub.

🙏 Credits

Built for the pi coding agent community.