pi-high-availability
High Availability extension for pi - automatic failover when quota or capacity is exhausted
Package details
Install pi-high-availability from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-high-availability
- Package: pi-high-availability
- Version: 2.3.0
- Published: Mar 19, 2026
- Downloads: 56/mo · 18/wk
- Author: burggraf
- License: MIT
- Types: extension
- Size: 52.2 KB
- Dependencies: 0 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "extensions/index.ts"
  ]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-high-availability
pi-high-availability automatically switches to fallback LLM providers when your primary provider hits quota limits or capacity constraints. Never get stuck waiting for quota resets again.
What's New in v2.3.0
Password-Store Integration: You can now use !pass references for API keys stored in password-store. The /ha UI automatically discovers matching pass entries and shows them as one-click options. API key input is now masked, with a hint for the !pass syntax.
Usage
# In /ha UI, you'll see matching pass entries like:
# Use: api/anthropic/key
# Click to add as a credential
# Or manually enter:
!pass show api/anthropic/key
The !pass reference is stored in auth.json and resolved by pi at API call time.
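As a rough illustration of how a consumer of auth.json might tell a literal key from a !-prefixed reference, here is a minimal TypeScript sketch. It assumes the text after the ! is treated as a command to run; parseSecretRef and its return shape are hypothetical names for illustration, not pi's actual resolver.

```typescript
// Hypothetical helper for illustration only; not part of pi or this extension.
type SecretRef =
  | { kind: "literal"; value: string }
  | { kind: "command"; command: string };

// Classifies a stored credential value. Assumption: a leading "!" marks a
// command reference (e.g. "!pass show api/anthropic/key"); anything else is
// treated as the key itself.
function parseSecretRef(stored: string): SecretRef {
  const trimmed = stored.trim();
  if (trimmed.startsWith("!") && trimmed.length > 1) {
    return { kind: "command", command: trimmed.slice(1).trim() };
  }
  return { kind: "literal", value: stored };
}
```

The sketch deliberately stops short of executing anything; the source only states that resolution happens at API call time.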
v2.2.0
Per-Session Group Selection: You can now specify which HA group to use via the --ha-group CLI flag. This is useful when running multiple pi instances with different failover chains.
Usage
# Use default group (from ha.json defaultGroup)
pi -e .pi/extensions/gastown-hooks.js
# Use specific group via --ha-group
pi -e .pi/extensions/gastown-hooks.js --ha-group paid --model openai-codex/gpt-5.3-codex
This allows you to configure different gastown workers to use different HA groups based on their role.
How It Works
When pi starts with --ha-group paid:
- Extension reads the flag value (paid)
- Validates the group exists in ha.json
- Sets state.activeGroup = "paid"
- All failover events use models from the "paid" group
When pi starts without --ha-group:
- Falls back to defaultGroup from ha.json
- All failover events use models from that group
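The selection logic above can be sketched in a few lines of TypeScript. This is an assumption-laden illustration rather than the extension's code: resolveActiveGroup and HaConfigLike are hypothetical names, and throwing on an unknown group is a guess at what "validates" means here.

```typescript
// Minimal view of ha.json needed for group selection (hypothetical type).
interface HaConfigLike {
  groups: Record<string, unknown>;
  defaultGroup?: string;
}

// Returns the group key to use for this session.
// flagValue stands in for the value passed via --ha-group, if any.
function resolveActiveGroup(config: HaConfigLike, flagValue?: string): string {
  const requested = flagValue ?? config.defaultGroup;
  if (requested === undefined) {
    throw new Error("No --ha-group given and no defaultGroup in ha.json");
  }
  // Assumption: an unknown group is an error rather than a silent fallback.
  if (!Object.prototype.hasOwnProperty.call(config.groups, requested)) {
    throw new Error(`Unknown HA group: ${requested}`);
  }
  return requested;
}
```

With a config containing groups "paid" and "free" and defaultGroup "free", passing "paid" selects "paid", and omitting the flag selects "free".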
Configurable Error Handling: You can now control how the extension responds to different types of errors:
Capacity Errors (e.g., "out of capacity", "engine overloaded"): These affect all accounts for a provider equally, so switching accounts doesn't help. You can now choose to stop, retry after a timeout, or jump to next_provider.
Quota Errors (e.g., "rate limit exceeded", "insufficient quota"): These are per-account, so switching to another OAuth account or API key may solve the problem. Choose from stop, retry, next_provider, or next_key_then_provider (default).
Configure these in /ha under Settings or directly in ha.json (see Error Handling Configuration).
Features
- Unified HA Manager: A beautiful interactive TUI (/ha) with accordion-style navigation to manage all your groups and credentials in one place.
- Automatic Multi-Tier Failover:
  - Account Failover: Seamlessly switches between multiple accounts for the same provider.
  - Provider Failover: Automatically jumps to the next provider in your group if all accounts for the current provider are exhausted.
- Exhaustion Tracking: Intelligent cooldown management marks specific accounts or providers as "exhausted" on 429/capacity errors, preventing retries until they recover.
- Dynamic Provider Discovery: Automatically detects all supported Pi providers (Anthropic, OpenAI, Gemini, Moonshot, Zai, etc.) without configuration.
- Group Management: Create custom failover chains (e.g., "Fast Tier" → "Backup Tier") and rearrange model priority with simple keybindings.
- Credential Sync & Storage: Automatically capture OAuth logins or manually add API keys for backup accounts.
- Smart Error Detection: Distinguishes between quota errors and transient capacity issues, including full support for Google Gemini's internal retry patterns.
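To make the exhaustion-tracking idea concrete, here is a minimal TypeScript sketch of a cooldown tracker. It is an illustration under assumptions, not the extension's implementation: the ExhaustionTracker class and the "provider:account" key format are hypothetical.

```typescript
// Hypothetical cooldown tracker: remembers until when each key is unusable.
class ExhaustionTracker {
  private readonly until = new Map<string, number>();

  // Marks a key (e.g. "anthropic:primary") as exhausted for cooldownMs.
  // `now` is injectable so the behavior can be checked deterministically.
  markExhausted(key: string, cooldownMs: number, now: number = Date.now()): void {
    this.until.set(key, now + cooldownMs);
  }

  // True while the cooldown is still running; expired entries are dropped.
  isExhausted(key: string, now: number = Date.now()): boolean {
    const expiry = this.until.get(key);
    if (expiry === undefined) return false;
    if (now >= expiry) {
      this.until.delete(key);
      return false;
    }
    return true;
  }
}
```

Storing an expiry timestamp rather than a flag means recovery needs no timer: an entry simply becomes usable again the first time it is checked after its cooldown has passed.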
Quick Start
1. Install the Extension
pi install npm:pi-high-availability
2. Open the Manager
Run the High Availability manager to initialize your configuration:
/ha
3. Configure Your First Group
- Select Groups.
- Add or select a group (e.g., default).
- Add Model IDs (e.g., anthropic/claude-3-5-sonnet) to the group.
- Use u and d keys to rearrange the priority.
The HA Manager (/ha)
The interactive manager is your control center for high availability.
Keyboard Navigation
| Key | Action |
|---|---|
| ↑ / ↓ | Navigate items |
| Space / → | Expand/collapse section or toggle item |
| Enter | Select/activate item |
| x / d / Delete | Delete currently selected item (with confirmation) |
| u | Move item up (reorder) |
| d | Move item down (reorder) |
| Esc | Cancel / Exit |
Group Management
- Add/Rename/Delete groups.
- Rearrange Priority: Use u (up) and d (down) keys to set the failover order of models within a group.
- Per-Entry Cooldown: Set custom recovery times for specific models.
- Delete Models: Navigate to any model entry and press x to remove it from the group.
Credential Management
- Auto-Sync: Credentials from /login are automatically synced when you open /ha.
- Add API Providers: Use "+ Add API Provider" to manually add providers that use API keys.
- Add API Keys: For non-OAuth providers, add additional API keys as backups.
- Account Priority: Use u and d keys to decide which account is primary and which are backup-1, backup-2, etc.
- Delete Keys: Navigate to any key entry and press x to delete it.
- Delete Providers: Navigate to a provider header (e.g., google-gemini-cli) and press x to delete the entire provider and all its keys.
Settings
- Default Cooldown: Set the default recovery time (e.g., 3600000ms for 1 hour) for exhausted providers.
- Default Group: Choose which failover chain Pi uses when it starts up.
- Error Handling: Configure how different error types are handled:
- Capacity Error Action: What to do when a provider reports "out of capacity" (doesn't help to switch accounts for the same provider)
- Quota Error Action: What to do when a provider reports quota/rate limit exceeded (switching accounts may help)
- Retry Timeout: How long to wait before retrying when using "retry" action (default: 300000ms = 5 minutes)
How Failover Works
The Failover Chain
When a quota or capacity error is detected:
- Try Next Account: The extension looks for another credential for the same provider (e.g., your second Google account).
- Mark Exhausted: The current account is marked as exhausted and won't be used again until its cooldown expires.
- Switch Provider: If all accounts for that provider are exhausted, the extension looks at the Active Group and switches to the next provider/model in the list.
- Automatic Retry: Pi automatically resends your last message using the new provider and primary account, making the transition transparent.
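The steps above can be sketched as a single selection function. This TypeScript sketch is a simplified illustration, not the extension's code: nextTarget, Target, and the "provider:account" key format are hypothetical, cooldown expiry is left out (a plain Set stands in for the exhausted list), and the search is assumed not to wrap around to earlier entries in the group.

```typescript
// Hypothetical shape: which model and which account a request is sent with.
interface Target {
  modelId: string; // e.g. "anthropic/claude-3-5-sonnet"
  account: string; // e.g. "primary", "backup-1"
}

// Provider is assumed to be the part of the model ID before the first "/".
function providerOf(modelId: string): string {
  const slash = modelId.indexOf("/");
  return slash === -1 ? modelId : modelId.slice(0, slash);
}

// Picks where to retry after `failed` hit a quota or capacity error.
// entries: model IDs of the active group, in priority order.
// accounts: account names per provider, in priority order.
// exhausted: "provider:account" keys that must not be used (mutated here).
function nextTarget(
  failed: Target,
  entries: string[],
  accounts: Record<string, string[]>,
  exhausted: Set<string>,
): Target | null {
  const provider = providerOf(failed.modelId);
  exhausted.add(`${provider}:${failed.account}`);

  // First choice: another account for the same provider.
  const sibling = (accounts[provider] ?? []).find(
    (a) => !exhausted.has(`${provider}:${a}`),
  );
  if (sibling !== undefined) return { modelId: failed.modelId, account: sibling };

  // Otherwise: the next group entry whose provider still has a usable account.
  const start = entries.indexOf(failed.modelId) + 1;
  for (const modelId of entries.slice(start)) {
    const p = providerOf(modelId);
    const account = (accounts[p] ?? []).find((a) => !exhausted.has(`${p}:${a}`));
    if (account !== undefined) return { modelId, account };
  }
  return null; // Nothing left to fail over to.
}
```

A null result corresponds to the whole chain being exhausted; what the extension does in that case is not specified in this README.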
Error Detection
The extension detects:
- Quota Errors: HTTP 429, "rate limit", "insufficient quota", etc.
- Capacity Errors: "No capacity available", "Engine Overloaded", etc.
- Gemini Awareness: Correctly waits for Google's internal retry attempts before triggering a failover.
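A classifier along these lines might look like the following TypeScript sketch. The patterns are taken from the example messages in this README; the function name, the pattern lists, and the decision to check capacity before quota are assumptions, and the Gemini retry handling is not modeled.

```typescript
type ErrorKind = "quota" | "capacity" | "other";

// Patterns drawn from the example messages in this README (not exhaustive).
const CAPACITY_PATTERNS: RegExp[] = [
  /no capacity available/i,
  /out of capacity/i,
  /engine overloaded/i,
];
const QUOTA_PATTERNS: RegExp[] = [/rate limit/i, /insufficient quota/i];

// Classifies a provider error from its message and optional HTTP status.
// Assumption: capacity wording wins over a 429 status when both are present.
function classifyError(message: string, status?: number): ErrorKind {
  if (CAPACITY_PATTERNS.some((p) => p.test(message))) return "capacity";
  if (status === 429 || QUOTA_PATTERNS.some((p) => p.test(message))) return "quota";
  return "other";
}
```

Keeping the two pattern lists separate matters because the two kinds lead to different recovery actions: a quota error may be fixed by another account, while a capacity error usually is not.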
Configuration Guide (ha.json)
The /ha UI is the recommended way to change settings, but you can also edit ~/.pi/agent/ha.json by hand:
{
  "groups": {
    "pro": {
      "name": "Professional Tier",
      "entries": [
        { "id": "anthropic/claude-3-5-sonnet" },
        { "id": "google-gemini-cli/gemini-1.5-pro", "cooldownMs": 1800000 }
      ]
    }
  },
  "defaultGroup": "pro",
  "defaultCooldownMs": 3600000,
  "errorHandling": {
    "capacityErrorAction": "next_provider",
    "quotaErrorAction": "next_key_then_provider",
    "retryTimeoutMs": 300000
  },
  "credentials": {
    "anthropic": {
      "primary": { "type": "oauth", "refresh": "...", "access": "..." },
      "backup-1": { "type": "api_key", "key": "..." }
    }
  }
}
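For readers who script against this file, the shape above can be described with a few TypeScript types. This is a sketch inferred from the example only: the type names are hypothetical, the credentials and errorHandling sections are omitted, and the rule that a per-entry cooldownMs overrides defaultCooldownMs is an assumption based on the "Per-Entry Cooldown" description.

```typescript
// Types inferred from the example ha.json above (hypothetical names).
interface HaEntry {
  id: string;          // "provider/model"
  cooldownMs?: number; // optional per-entry override
}
interface HaGroup {
  name: string;
  entries: HaEntry[];
}
interface HaConfig {
  groups: Record<string, HaGroup>;
  defaultGroup: string;
  defaultCooldownMs: number;
}

// Cooldown to apply when a given model in a given group is exhausted.
// Assumption: the entry's own cooldownMs, if set, takes precedence.
function effectiveCooldownMs(config: HaConfig, groupKey: string, modelId: string): number {
  const entry = config.groups[groupKey]?.entries.find((e) => e.id === modelId);
  return entry?.cooldownMs ?? config.defaultCooldownMs;
}
```

Under that assumption, the example config gives the Gemini entry a 30-minute cooldown (1800000 ms), while the Anthropic entry falls back to the 1-hour default (3600000 ms).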
Error Handling Configuration
The errorHandling section in ha.json lets you customize how the extension responds to different error types:
| Setting | Description | Default |
|---|---|---|
| capacityErrorAction | Action when provider has no capacity (affects all accounts) | next_key_then_provider |
| quotaErrorAction | Action when account hits rate limit (may not affect other accounts) | next_key_then_provider |
| retryTimeoutMs | How long to wait before retrying (in milliseconds) | 300000 (5 minutes) |
Understanding the Error Types
Capacity Errors occur when a provider's servers are overloaded. Examples:
- "No capacity available for this model"
- "Engine overloaded"
- "Service temporarily unavailable"
These errors affect the provider's infrastructure, so switching to a different account for the same provider typically won't help. Recommended action: next_provider or retry.
Quota Errors occur when an account exceeds its limits. Examples:
- "Rate limit exceeded (429)"
- "Insufficient quota"
- "Daily limit reached"
These errors are per-account, so switching to another OAuth entry or API key for the same provider may solve the problem. Recommended action: next_key_then_provider (default) or next_provider if you don't have backup accounts.
Available Actions
The following actions can be configured for both capacityErrorAction and quotaErrorAction:
| Action | Description |
|---|---|
| stop | Stop the process and display the error (default if pi-high-availability is not installed) |
| retry | Wait for retryTimeoutMs milliseconds, then retry the same request |
| next_provider | Immediately switch to the next provider in the current group |
| next_key_then_provider | Try the next account/key for the current provider, then move to next provider if all exhausted (default) |
Note: For capacity errors, next_key_then_provider is often not helpful since all accounts for the same provider typically share the same capacity pool. Use next_provider or retry for capacity errors instead.
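One way to picture how the configured action drives behavior is a small dispatcher. This TypeScript sketch is illustrative only: the setting names and action values come from the tables above, but planStep, RecoveryStep, and the step labels are hypothetical.

```typescript
type ErrorAction = "stop" | "retry" | "next_provider" | "next_key_then_provider";

interface ErrorHandling {
  capacityErrorAction: ErrorAction;
  quotaErrorAction: ErrorAction;
  retryTimeoutMs: number;
}

// Hypothetical description of what to do next after an error.
type RecoveryStep =
  | { step: "stop" }
  | { step: "wait_and_retry"; delayMs: number }
  | { step: "switch_provider" }
  | { step: "switch_key_then_provider" };

// Maps the error kind to the configured action, then to a concrete step.
function planStep(kind: "capacity" | "quota", cfg: ErrorHandling): RecoveryStep {
  const action = kind === "capacity" ? cfg.capacityErrorAction : cfg.quotaErrorAction;
  switch (action) {
    case "stop":
      return { step: "stop" };
    case "retry":
      return { step: "wait_and_retry", delayMs: cfg.retryTimeoutMs };
    case "next_provider":
      return { step: "switch_provider" };
    case "next_key_then_provider":
      return { step: "switch_key_then_provider" };
  }
}
```

With the example configuration from the Configuration Guide, a capacity error maps to switch_provider and a quota error to switch_key_then_provider; retryTimeoutMs only matters when an action is set to retry.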
License
MIT
Contributing
Contributions welcome! Please open an issue or PR on GitHub.
Credits
Built for the pi coding agent community.