pi-ollama-cloud
Ollama Cloud provider plugin for [Pi](https://github.com/badlogic/pi-mono) coding agent.
Package details
Install pi-ollama-cloud from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-ollama-cloud
- Package: pi-ollama-cloud
- Version: 0.3.1
- Published: May 5, 2026
- Downloads: 636/mo · 426/wk
- Author: fgrehm
- License: unknown
- Types: extension
- Size: 31.8 KB
- Dependencies: 0 dependencies · 3 peers
Pi manifest JSON
{
"extensions": [
"./index.ts"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-ollama-cloud
Ollama Cloud provider plugin for Pi coding agent.
Registers Ollama Cloud as a model provider with dynamically fetched models, and provides ollama_web_search and ollama_web_fetch tools that use the Ollama Cloud web search API — no local Ollama server required.
Features
- Dynamic model discovery - Fetches the full model list from ollama.com/v1/models, then fetches per-model details via /api/show to determine capabilities, context length, and tool support.
- Persistent cache - Raw API responses are cached at ~/.pi/agent/cache/ollama-cloud-models.json so models are available immediately on startup without hitting the network.
- Cold cache fallback - When no cache exists, a small set of hardcoded models is used until /ollama-cloud-refresh is run.
- /ollama-cloud-refresh command - Re-fetches the model list from the API and updates the cache and provider registration live (no restart needed).
- ollama_web_search tool - Search the web for real-time information using Ollama Cloud's /api/web_search endpoint. Returns titles, URLs, and content snippets.
- ollama_web_fetch tool - Fetch and extract text content from a web page URL using Ollama Cloud's /api/web_fetch endpoint. Returns page title, content, and links.
- Zero cost tracking - All models are registered with zero costs since Ollama Cloud uses a flat subscription model (Free, Pro, Max) rather than per-token billing. Per-request costs don't apply, so Pi's cost tracker always shows zero. See ollama.com/pricing for plan details.
Prerequisites
- The Pi coding agent installed
- An ollama.com account and API key (see Setup below)
Installation
Option 1: from npm (recommended)
pi install npm:pi-ollama-cloud
This installs the latest published version from npm. Run pi update to get new versions.
Option 2: from git
pi install git:github.com/fgrehm/pi-ollama-cloud
This clones the repo to ~/.pi/agent/git/ and adds it to your settings.
For project-local install (stored in .pi/git/):
pi install git:github.com/fgrehm/pi-ollama-cloud --local
Option 3: -e flag (try without installing)
pi -e npm:pi-ollama-cloud
Option 4: Clone manually (if you want to make changes and "try it live")
Pi auto-discovers subdirectories under ~/.pi/agent/extensions/:
git clone git@github.com:fgrehm/pi-ollama-cloud.git ~/.pi/agent/extensions/pi-ollama-cloud
Setup
1. Get an API key
Sign up at ollama.com and generate an API key.
2. Configure the API key
Either set the OLLAMA_API_KEY environment variable:
export OLLAMA_API_KEY="your-key"
Or add it to ~/.pi/agent/auth.json:
{
"ollama-cloud": {
"type": "api_key",
"key": "your-key"
}
}
3. Disable web tools (optional)
To use a different web search or fetch tool (e.g. Brave) by default and avoid conflicts with the built-in Ollama Cloud tools, set the PI_OLLAMA_WEB_TOOLS environment variable to a falsy value:
export PI_OLLAMA_WEB_TOOLS=0
Accepted disabling values are 0, false, no, off, or an empty string. When disabled, ollama_web_search and ollama_web_fetch are not registered. The model provider and /ollama-cloud-refresh command remain active regardless.
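A minimal sketch of how such a flag check can be implemented, based on the accepted values listed above (the function name is illustrative, not the extension's actual code):

// Returns true when the Ollama Cloud web tools should be registered.
const FALSY = new Set(["0", "false", "no", "off", ""]);
function webToolsEnabled(raw = process.env.PI_OLLAMA_WEB_TOOLS): boolean {
  if (raw === undefined) return true; // unset: tools stay enabled
  return !FALSY.has(raw.trim().toLowerCase());
}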
4. Fetch models
On first launch the plugin will use a small set of fallback models. Run:
/ollama-cloud-refresh
This fetches the full model list from the Ollama Cloud API and caches it locally.
5. Select a model
Use /model or Ctrl+L to switch to an Ollama Cloud model. Models appear under the ollama-cloud provider.
How it works
The plugin uses two Ollama Cloud API endpoints to build the model list:
- GET https://ollama.com/v1/models - Returns a list of all available model IDs.
- POST https://ollama.com/api/show - For each model, fetches details including capabilities (tools, thinking, vision) and context length.
Only models with the tools capability are registered - these are the ones Pi can use for tool-calling.
The raw /api/show responses are cached at ~/.pi/agent/cache/ollama-cloud-models.json. This cache never expires - run /ollama-cloud-refresh to update it.
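As a rough sketch of that two-step flow, assuming the standard fetch API and an OLLAMA_API_KEY in the environment (the response field names are assumptions, not the plugin's actual code):

const BASE = "https://ollama.com";
const auth = { Authorization: `Bearer ${process.env.OLLAMA_API_KEY}` };

// Fetch the model list, then per-model details, returning the raw responses
// in roughly the shape that gets written to the cache file.
async function discoverModels(): Promise<Record<string, unknown>> {
  const list = await fetch(`${BASE}/v1/models`, { headers: auth }).then((r) => r.json());
  const details: Record<string, unknown> = {};
  for (const model of list.data ?? []) {
    const res = await fetch(`${BASE}/api/show`, {
      method: "POST",
      headers: { ...auth, "Content-Type": "application/json" },
      body: JSON.stringify({ model: model.id }),
    });
    details[model.id] = await res.json();
  }
  return details;
}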
Model metadata is derived from the cached data:
| Field | Source |
|---|---|
| reasoning | capabilities includes "thinking" |
| input | ["text", "image"] if capabilities includes "vision", else ["text"] |
| contextWindow | model_info.*.context_length (falls back to 128000) |
| maxTokens | Fixed at 32768 |
| cost | All zeros (Ollama Cloud uses subscription plans, not per-token billing - see pricing) |
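A hedged illustration of that mapping, following the table above (the response shape and the exact fields Pi expects are assumptions for the sketch):

interface ShowResponse {
  capabilities?: string[];                      // e.g. ["tools", "thinking", "vision"]
  model_info?: Record<string, number | string>; // e.g. { "llama.context_length": 131072 }
}

function toModelMeta(show: ShowResponse) {
  const caps = show.capabilities ?? [];
  const info = show.model_info ?? {};
  const ctxKey = Object.keys(info).find((k) => k.endsWith(".context_length"));
  return {
    reasoning: caps.includes("thinking"),
    input: caps.includes("vision") ? ["text", "image"] : ["text"],
    contextWindow: ctxKey ? Number(info[ctxKey]) : 128000,
    maxTokens: 32768,
    cost: { input: 0, output: 0 }, // flat subscription, so all costs are zero
  };
}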
Tools
| Tool | Description |
|---|---|
| ollama_web_search | Search the web via Ollama Cloud's /api/web_search |
| ollama_web_fetch | Fetch a web page via Ollama Cloud's /api/web_fetch |
Both tools use the same Ollama Cloud API key configured for the provider. No local Ollama server is needed.
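A minimal sketch of the kind of call the search tool wraps, assuming the endpoint takes a JSON body with a query field and the same Bearer key:

// Search the web via Ollama Cloud; throws on non-2xx responses.
async function ollamaWebSearch(query: string, apiKey = process.env.OLLAMA_API_KEY) {
  const res = await fetch("https://ollama.com/api/web_search", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`web_search failed: ${res.status} ${res.statusText}`);
  return res.json(); // results carry titles, URLs, and content snippets
}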
Commands
| Command | Description |
|---|---|
| /ollama-cloud-refresh | Fetch models from the Ollama Cloud API, update cache, and re-register the provider |
Development
npm install # install devDependencies (biome)
npm run check # lint + format with auto-fix
npm run lint # lint only (no fixes)
npm run format # format only
The project uses Biome for linting and formatting (2-space indent, line width 120).
How is this different from ollama launch pi?
ollama launch pi is Ollama's built-in one-command setup that configures Pi to talk to your local Ollama server. Both local and cloud models work - cloud models (e.g. qwen3.5:cloud) are proxied through your local server to ollama.com. This extension takes a different approach: it connects Pi directly to Ollama's hosted API at ollama.com, bypassing the local server entirely.
| | ollama launch pi | pi-ollama-cloud |
|---|---|---|
| Provider name | ollama | ollama-cloud |
| Endpoint | Local Ollama server (http://localhost:11434/v1) | Ollama Cloud (https://ollama.com/v1) |
| Local models | ✅ Run on your machine | ❌ Not available |
| Cloud models | ✅ Proxied through local server (e.g. qwen3.5:cloud) | ✅ Connected directly |
| Local Ollama required? | Yes - must be installed and running | No - works without any local server |
| Authentication | Handled by the local server (sign-in flow via ollama) | Ollama Cloud API key (set via OLLAMA_API_KEY or auth.json) |
| Model discovery | Interactive picker with curated recommendations + pulled models | Dynamic - fetches all available cloud models with tool support from the API |
| Web tools | Auto-installed (@ollama/pi-web-search) when cloud is enabled | ✅ Built-in: ollama_web_search and ollama_web_fetch use the Ollama Cloud web search API directly (same API key, no local server needed) |
| Setup effort | One command: ollama launch pi | Install extension + API key + /ollama-cloud-refresh |
| Use when | You're already running Ollama locally and want the default experience | You don't want to run a local server, or want a standalone cloud-only provider alongside your local setup |
You can use both at the same time. The providers live under different names (ollama vs ollama-cloud), so you can switch between them with /model or Ctrl+L. For example, use your local ollama provider for low-latency work on smaller models, and ollama-cloud for direct access to the full catalog of cloud models without needing a local server.
Note: The @ollama/pi-web-search package (installed automatically by ollama launch pi) calls the local Ollama server's /api/experimental/web_search and /api/experimental/web_fetch endpoints and authenticates via ollama signin. This extension's ollama_web_search and ollama_web_fetch tools use the cloud API at ollama.com/api/web_search and ollama.com/api/web_fetch instead - same API key, no local server required. Both can coexist: the local tools register as web_search/web_fetch and these register as ollama_web_search/ollama_web_fetch to avoid name conflicts.
Releasing
Publishing a new version to npm is a two-command process:
# 1. Bump version and create a git tag in one step
npm version minor # or patch, or major
# 2. Push the tag to trigger the GitHub Actions publish workflow
git push --tags
The tag version must match the version in package.json — npm version handles this automatically. The workflow at .github/workflows/publish.yml verifies the match before publishing to npm.
The workflow uses npm's trusted publishing (OIDC) — no tokens stored as secrets. To set it up:
- Go to npmjs.com → your avatar → Packages → pi-ollama-cloud → Settings → Trusted publishing
- Click GitHub Actions and enter:
  - Workflow filename: publish.yml
- Save
Each publish also gets automatic provenance attestation.
Notes
- Some Ollama Cloud models may reject the developer message role, causing a 400 error. If you encounter this, the model may need compat: { supportsDeveloperRole: false }. You can edit index.ts to add this for specific models, or open an issue to track it.
- The fetch timeout is 10 seconds per request. On slow connections, some model detail fetches may time out - the plugin reports how many succeeded vs failed.