pi-ollama-api
Ollama Cloud provider extension for Pi — connect to Ollama Cloud models via the OpenAI-compatible API
Package details
Install pi-ollama-api from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-ollama-api- Package
pi-ollama-api- Version
1.3.0- Published
- May 30, 2026
- Downloads
- not available
- Author
- mercuriusdream
- License
- MIT
- Types
- extension
- Size
- 66.6 KB
- Dependencies
- 0 dependencies · 0 peers
Pi manifest JSON
{
"extensions": [
"./index.ts"
]
}Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-ollama-api
Ollama Cloud provider extension for Pi — connect your terminal coding agent to 200+ models on Ollama Cloud via the OpenAI-compatible API.
Features
- Native Ollama API discovery — Queries
/api/tagsand/api/showfor real model metadata (context windows, capabilities, parameter sizes, quantization) - Actual context windows — No hardcoded defaults. Every model reports its real context length from Ollama's API (e.g., 1M for DeepSeek V4, 262K for Kimi K2, 128K for GPT-OSS)
- Capability detection — Vision, reasoning, tools detected from Ollama's
capabilitiesarray - OpenAI-compatible API — Uses
openai-completionsstreaming (works with all Pi features) - Embeddings tool — Generate embeddings via
/v1/embeddingsfor RAG and similarity search - Direct chat tool — Send one-off completions for model comparison or testing
Supported Model Families
| Family | Models | Highlights |
|---|---|---|
| Llama | 3.3, 3.2, 3.1, 3, 2 | 70B frontier, Vision variants, 405B |
| Qwen | 3, 2.5, 2, VL, Coder, Math | 128K context, Vision, Code, Math variants |
| DeepSeek | R1, V3, V2, Coder V2 | Reasoning (R1), 671B total |
| Mistral | Codestral, Mistral, Nemo, Large, Mixtral | 256K context Codestral |
| Gemma | 3, 2, CodeGemma, ShieldGemma | Vision support, 128K context |
| Phi | 4, 3.5, 3 | Microsoft models, 128K context |
| IBM | Granite 3.x, Granite Code | MoE variants, 128K context |
| Cohere | Command R, Aya, Aya Expanse | Multilingual, 128K context |
| GPT-OSS | 120B, 20B (Cloud) | Cloud-hosted OSS models |
| + 30+ more | Yi, Falcon, GLM, InternLM, SOLAR, etc. | See full list in source |
Installation
# Install via pi
pi install npm:pi-ollama-api
# Or install locally
pi install npm:pi-ollama-api -l
Setup
- Get an API key from ollama.com/settings
- Start Pi and run:
Paste your API key when prompted. It is stored in Pi's/ollama-cloud-login~/.pi/agent/auth.json(same place as/logincredentials). - Select a model with
/model→ pick anyollama-cloud/*model
Authentication
| Method | How | Where stored |
|---|---|---|
| Pi /login (recommended) | Run /login in Pi → select "Use an API key" |
~/.pi/agent/auth.json |
| Environment variable | export OLLAMA_API_KEY=... |
Shell env |
Pi's AuthStorage is used natively — API keys are checked in auth.json first, then the env var is used as a fallback.
Environment Variables
| Variable | Default | Description |
|---|---|---|
OLLAMA_API_KEY |
— | Fallback API key (used if auth.json has no key) |
OLLAMA_CLOUD_BASE_URL |
https://ollama.com/v1 |
Override endpoint (for proxies or self-hosted) |
OLLAMA_CLOUD_MODELS |
— | Comma-separated list to skip discovery and use static models |
OLLAMA_CLOUD_TIMEOUT |
30000 |
Model discovery timeout in ms |
Usage
Select a model
/model
Then pick any ollama-cloud/* model. Examples:
ollama-cloud/llama3.3— Llama 3.3 70Bollama-cloud/qwen3— Qwen 3 with visionollama-cloud/deepseek-r1— DeepSeek R1 with reasoningollama-cloud/gemma3:27b— Gemma 3 27B with vision
Commands
| Command | Description |
|---|---|
/ollama-cloud-status |
Check API key status and model count |
/ollama-cloud-refresh |
Re-fetch live model list from Ollama Cloud API |
/ollama-cloud-list |
Pretty-print all models with 🧠/🖼️/💬 badges |
/ollama-cloud-pull <id> |
Show the ollama pull command for a model |
Tools (LLM-callable)
| Tool | Purpose |
|---|---|
ollama_list_models |
Filter models by family, vision, or reasoning |
ollama_embeddings |
Generate embeddings via /v1/embeddings |
ollama_chat |
Direct chat completion via /v1/chat/completions |
ollama_model_info |
Get detailed metadata for a specific model |
Quick Examples
# Check what models are available
Use ollama_list_models to show all available models
# Get embeddings for a document
Use ollama_embeddings with model "nomic-embed-text" and input "The quick brown fox"
# Compare model outputs
Use ollama_chat with model "llama3.3" and messages [{role: "user", content: "Hello"}]
Use ollama_chat with model "qwen3" and messages [{role: "user", content: "Hello"}]
API Compatibility
This extension uses Ollama's OpenAI-compatible API (/v1/chat/completions), which supports:
- Chat completions with streaming
- Vision (multimodal) inputs
- Tool calling
- JSON mode
- Reasoning/thinking control
- Embeddings (
/v1/embeddings)
License
MIT