pi-high-availability

High Availability extension for pi - automatic failover when quota or capacity is exhausted

Packages

Package details

extension

Install pi-high-availability from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-high-availability
Package
pi-high-availability
Version
2.3.0
Published
Mar 19, 2026
Downloads
84/mo · 19/wk
Author
burggraf
License
MIT
Types
extension
Size
52.2 KB
Dependencies
0 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "extensions/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-high-availability 🔄

pi-high-availability automatically switches to fallback LLM providers when your primary provider hits quota limits or capacity constraints. Never get stuck waiting for quota resets again.

🆕 What's New in v2.3.0


Password-Store Integration — You can now use !pass references for API keys stored in password-store. The /ha UI automatically discovers matching pass entries and shows them as one-click options. API key input is now masked with a hint for !pass syntax.

Usage

# In /ha UI, you'll see matching pass entries like:
#   🔑 Use: api/anthropic/key
# Click to add as a credential

# Or manually enter:
!pass show api/anthropic/key

The !pass reference is stored in auth.json and resolved by pi at API call time.


v2.2.0

Per-Session Group Selection — You can now specify which HA group to use via the --ha-group CLI flag. This is useful when running multiple pi instances with different failover chains.

Usage

# Use default group (from ha.json defaultGroup)
pi -e .pi/extensions/gastown-hooks.js

# Use specific group via --ha-group
pi -e .pi/extensions/gastown-hooks.js --ha-group paid --model openai-codex/gpt-5.3-codex

This allows you to configure different gastown workers to use different HA groups based on their role.

How It works

When pi starts with --ha-group paid:

  1. Extension reads the flag value (paid)
  2. Validates the group exists in ha.json
  3. Sets state.activeGroup = "paid"
  4. All failover events use models from the "paid" group

When pi starts without --ha-group:

  1. Falls back to defaultGroup from ha.json
  2. All failover events use models from that group

Configurable Error Handling — You can now control how the extension responds to different types of errors:

  • Capacity Errors (e.g., "out of capacity", "engine overloaded"): These affect all accounts for a provider equally, so switching accounts doesn't help. Now you can choose to stop, retry after a timeout, or jump to next_provider.

  • Quota Errors (e.g., "rate limit exceeded", "insufficient quota"): These are per-account, so switching to another OAuth key or API key may solve the problem. Choose from stop, retry, next_provider, or next_key_then_provider (default).

Configure these in /ha under ⚙️ Settings or directly in ha.json (see Error Handling Configuration).

✨ Features

  • Unified HA Manager: A beautiful interactive TUI (/ha) with accordion-style navigation to manage all your groups and credentials in one place.
  • Automatic Multi-Tier Failover:
    1. Account Failover: Seamlessly switches between multiple accounts for the same provider.
    2. Provider Failover: Automatically jumps to the next provider in your group if all accounts for the current provider are exhausted.
  • Exhaustion Tracking: Intelligent cooldown management marks specific accounts or providers as "exhausted" on 429/capacity errors, preventing retries until they recover.
  • Dynamic Provider Discovery: Automatically detects all supported Pi providers (Anthropic, OpenAI, Gemini, Moonshot, Zai, etc.) without configuration.
  • Group Management: Create custom failover chains (e.g., "Fast Tier" → "Backup Tier") and rearrange model priority with simple keybindings.
  • Credential Sync & Storage: Automatically capture OAuth logins or manually add API keys for backup accounts.
  • Smart Error Detection: Distinguishes between quota errors and transient capacity issues, including full support for Google Gemini's internal retry patterns.

🚀 Quick Start

1. Install the Extension

pi install npm:pi-high-availability

2. Open the Manager

Run the High Availability manager to initialize your configuration:

/ha

3. Configure Your First Group

  1. Select 📂 Groups.
  2. Add or select a group (e.g., default).
  3. Add Model IDs (e.g., anthropic/claude-3-5-sonnet) to the group.
  4. Use u and d keys to rearrange the priority.

🎮 The HA Manager (/ha)

The interactive manager is your control center for high availability.

Keyboard Navigation

Key Action
/ Navigate items
Space / Expand/collapse section or toggle item
Enter Select/activate item
x / d / Delete Delete currently selected item (with confirmation)
u Move item up (reorder)
d Move item down (reorder)
Esc Cancel / Exit

📂 Group Management

  • Add/Rename/Delete groups.
  • Rearrange Priority: Use u (up) and d (down) keys to set the failover order of models within a group.
  • Per-Entry Cooldown: Set custom recovery times for specific models.
  • Delete Models: Navigate to any model entry and press x to remove it from the group.

🔑 Credential Management

  • Auto-Sync: Credentials from /login are automatically synced when you open /ha.
  • Add API Providers: Use "+ Add API Provider" to manually add providers that use API keys.
  • Add API Keys: For non-OAuth providers, add additional API keys as backups.
  • Account Priority: Use u and d keys to decide which account is primary and which are backup-1, backup-2, etc.
  • Delete Keys: Navigate to any key entry and press x to delete it.
  • Delete Providers: Navigate to a provider header (e.g., 🔌 google-gemini-cli) and press x to delete the entire provider and all its keys.

⏱️ Settings

  • Default Cooldown: Set the default recovery time (e.g., 3600000ms for 1 hour) for exhausted providers.
  • Default Group: Choose which failover chain Pi uses when it starts up.
  • Error Handling: Configure how different error types are handled:
    • Capacity Error Action: What to do when a provider reports "out of capacity" (doesn't help to switch accounts for the same provider)
    • Quota Error Action: What to do when a provider reports quota/rate limit exceeded (switching accounts may help)
    • Retry Timeout: How long to wait before retrying when using "retry" action (default: 300000ms = 5 minutes)

🔍 How Failover Works

The Failover Chain

When a quota or capacity error is detected:

  1. Try Next Account: The extension looks for another credential for the same provider (e.g., your second Google account).
  2. Mark Exhausted: The current account is marked as exhausted and won't be used again until its cooldown expires.
  3. Switch Provider: If all accounts for that provider are exhausted, the extension looks at the Active Group and switches to the next provider/model in the list.
  4. Automatic Retry: Pi automatically resends your last message using the new provider and primary account, making the transition transparent.

Error Detection

The extension detects:

  • Quota Errors: HTTP 429, "rate limit", "insufficient quota", etc.
  • Capacity Errors: "No capacity available", "Engine Overloaded", etc.
  • Gemini Awareness: Correctly waits for Google's internal retry attempts before triggering a failover.

⚙️ Configuration Guide (ha.json)

While you should use the /ha UI, you can also manually edit ~/.pi/agent/ha.json:

{
  "groups": {
    "pro": {
      "name": "Professional Tier",
      "entries": [
        { "id": "anthropic/claude-3-5-sonnet" },
        { "id": "google-gemini-cli/gemini-1.5-pro", "cooldownMs": 1800000 }
      ]
    }
  },
  "defaultGroup": "pro",
  "defaultCooldownMs": 3600000,
  "errorHandling": {
    "capacityErrorAction": "next_provider",
    "quotaErrorAction": "next_key_then_provider",
    "retryTimeoutMs": 300000
  },
  "credentials": {
    "anthropic": {
      "primary": { "type": "oauth", "refresh": "...", "access": "..." },
      "backup-1": { "type": "api_key", "key": "..." }
    }
  }
}

Error Handling Configuration

The errorHandling section in ha.json lets you customize how the extension responds to different error types:

Setting Description Default
capacityErrorAction Action when provider has no capacity (affects all accounts) next_key_then_provider
quotaErrorAction Action when account hits rate limit (may not affect other accounts) next_key_then_provider
retryTimeoutMs How long to wait before retrying (in milliseconds) 300000 (5 minutes)

Understanding the Error Types

Capacity Errors occur when a provider's servers are overloaded. Examples:

  • "No capacity available for this model"
  • "Engine overloaded"
  • "Service temporarily unavailable"

These errors affect the provider's infrastructure, so switching to a different account for the same provider typically won't help. Recommended action: next_provider or retry.

Quota Errors occur when an account exceeds its limits. Examples:

  • "Rate limit exceeded (429)"
  • "Insufficient quota"
  • "Daily limit reached"

These errors are per-account, so switching to another OAuth entry or API key for the same provider may solve the problem. Recommended action: next_key_then_provider (default) or next_provider if you don't have backup accounts.

Available Actions

The following actions can be configured for both capacityErrorAction and quotaErrorAction:

Action Description
stop Stop the process and display the error (default if pi-high-availability is not installed)
retry Wait for retryTimeoutMs milliseconds, then retry the same request
next_provider Immediately switch to the next provider in the current group
next_key_then_provider Try the next account/key for the current provider, then move to next provider if all exhausted (default)

Note: For capacity errors, next_key_then_provider is often not helpful since all accounts for the same provider typically share the same capacity pool. Use next_provider or retry for capacity errors instead.

📄 License

MIT

🤝 Contributing

Contributions welcome! Please open an issue or PR on GitHub.

🙏 Credits

Built for the pi coding agent community.