pi-ollama-api

Ollama Cloud provider extension for Pi — connect to Ollama Cloud models via the OpenAI-compatible API

Package details

Install pi-ollama-api from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-ollama-api

Package: pi-ollama-api
Version: 1.3.0
Published: May 30, 2026
Downloads: 95/mo · 14/wk
Author: mercuriusdream
License: MIT
Types: extension
Size: 66.6 KB
Dependencies: 0 dependencies · 0 peers

Pi manifest JSON

{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-ollama-api

Ollama Cloud provider extension for Pi — connect your terminal coding agent to 200+ models on Ollama Cloud via the OpenAI-compatible API.

Features

Native Ollama API discovery — Queries /api/tags and /api/show for real model metadata (context windows, capabilities, parameter sizes, quantization)
Actual context windows — No hardcoded defaults. Every model reports its real context length from Ollama's API (e.g., 1M for DeepSeek V4, 262K for Kimi K2, 128K for GPT-OSS)
Capability detection — Vision, reasoning, tools detected from Ollama's capabilities array
OpenAI-compatible API — Uses openai-completions streaming (works with all Pi features)
Embeddings tool — Generate embeddings via /v1/embeddings for RAG and similarity search
Direct chat tool — Send one-off completions for model comparison or testing
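
As a rough illustration of the discovery and capability-detection features above, the sketch below turns one `/api/show` payload into the metadata Pi needs. The field names (`capabilities`, `details.parameter_size`, `details.quantization_level`, and an architecture-prefixed `*.context_length` key inside `model_info`) follow Ollama's public API documentation as far as I know. The `ShowResponse` and `ModelMeta` types and the `toModelMeta` helper are hypothetical names, not the extension's actual code, and mapping the `thinking` capability to "reasoning" is an assumption.

```typescript
// Only the fields this sketch reads from an /api/show response.
// Assumption: names follow Ollama's public API docs; the extension's own
// types may differ.
interface ShowResponse {
  capabilities?: string[];
  details?: { parameter_size?: string; quantization_level?: string };
  model_info?: Record<string, unknown>;
}

interface ModelMeta {
  contextWindow: number | undefined;
  vision: boolean;
  reasoning: boolean;
  tools: boolean;
  parameterSize: string | undefined;
  quantization: string | undefined;
}

// Hypothetical helper: derive model metadata from one /api/show payload.
function toModelMeta(show: ShowResponse): ModelMeta {
  const info = show.model_info ?? {};
  // The context length lives under an architecture-prefixed key such as
  // "llama.context_length", so match on the suffix instead of a fixed name.
  const ctxKey = Object.keys(info).find((k) => k.endsWith(".context_length"));
  const ctx = ctxKey !== undefined ? info[ctxKey] : undefined;
  const caps = new Set(show.capabilities ?? []);
  return {
    // Left undefined when the API does not report a value (no default here).
    contextWindow: typeof ctx === "number" ? ctx : undefined,
    vision: caps.has("vision"),
    reasoning: caps.has("thinking"), // assumed capability name for reasoning
    tools: caps.has("tools"),
    parameterSize: show.details?.parameter_size,
    quantization: show.details?.quantization_level,
  };
}
```

Leaving `contextWindow` undefined when the key is missing keeps the sketch consistent with the "no hardcoded defaults" claim. How the extension itself handles a missing value is not stated here.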

Supported Model Families

| Family | Models | Highlights |
| --- | --- | --- |
| Llama | 3.3, 3.2, 3.1, 3, 2 | 70B frontier, Vision variants, 405B |
| Qwen | 3, 2.5, 2, VL, Coder, Math | 128K context, Vision, Code, Math variants |
| DeepSeek | R1, V3, V2, Coder V2 | Reasoning (R1), 671B total |
| Mistral | Codestral, Mistral, Nemo, Large, Mixtral | 256K context Codestral |
| Gemma | 3, 2, CodeGemma, ShieldGemma | Vision support, 128K context |
| Phi | 4, 3.5, 3 | Microsoft models, 128K context |
| IBM | Granite 3.x, Granite Code | MoE variants, 128K context |
| Cohere | Command R, Aya, Aya Expanse | Multilingual, 128K context |
| GPT-OSS | 120B, 20B (Cloud) | Cloud-hosted OSS models |
| + 30+ more | Yi, Falcon, GLM, InternLM, SOLAR, etc. | See full list in source |

Installation

# Install via pi
pi install npm:pi-ollama-api

# Or install locally
pi install npm:pi-ollama-api -l

Setup

1. Get an API key from ollama.com/settings.
2. Start Pi and run:
```
/ollama-cloud-login
```
3. Paste your API key when prompted. It is stored in Pi's ~/.pi/agent/auth.json (same place as /login credentials).
4. Select a model with /model → pick any ollama-cloud/* model.

Authentication

| Method | How | Where stored |
| --- | --- | --- |
| Pi /login (recommended) | Run `/login` in Pi → select "Use an API key" | `~/.pi/agent/auth.json` |
| Environment variable | `export OLLAMA_API_KEY=...` | Shell env |

Pi's AuthStorage is used natively — API keys are checked in auth.json first, then the env var is used as a fallback.
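
The lookup order described above (auth.json first, environment variable second) can be sketched as a small pure function. The schema of `auth.json` is not documented in this README, so the shape used here (a provider id mapping to an object with a `key` field) is an assumption, as are the `AuthFile` type, the `resolveApiKey` name, and the `"ollama-cloud"` provider id. Pi's real `AuthStorage` API may look quite different.

```typescript
// Assumption: auth.json maps a provider id to an entry holding the key.
// The real schema of ~/.pi/agent/auth.json is not documented here.
type AuthFile = Record<string, { key?: string } | undefined>;

// Hypothetical helper mirroring the documented precedence:
// 1. a key stored in auth.json, 2. the OLLAMA_API_KEY environment variable.
function resolveApiKey(
  auth: AuthFile,
  env: Record<string, string | undefined>,
  providerId: string = "ollama-cloud", // assumed provider id
): string | undefined {
  const stored = auth[providerId]?.key?.trim();
  if (stored) return stored; // a non-empty stored key wins
  const fromEnv = env["OLLAMA_API_KEY"]?.trim();
  return fromEnv ? fromEnv : undefined; // empty strings count as "not set"
}
```

In a Node.js process the second argument would typically be `process.env`. Passing it in explicitly keeps the function easy to test.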

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `OLLAMA_API_KEY` | (none) | Fallback API key (used if auth.json has no key) |
| `OLLAMA_CLOUD_BASE_URL` | `https://ollama.com/v1` | Override endpoint (for proxies or self-hosted) |
| `OLLAMA_CLOUD_MODELS` | (none) | Comma-separated list to skip discovery and use static models |
| `OLLAMA_CLOUD_TIMEOUT` | `30000` | Model discovery timeout in ms |
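
One way to read these variables with the defaults listed above is sketched below. The `CloudConfig` type and `loadCloudConfig` function are hypothetical names. Treating an empty or non-numeric timeout as "use the default" and stripping a trailing slash from the base URL are assumptions about sensible behavior, not documented behavior of the extension.

```typescript
interface CloudConfig {
  baseUrl: string;
  staticModels: string[] | null; // null means "run live discovery"
  timeoutMs: number;
}

// Hypothetical helper: apply the documented defaults to an env-like object.
function loadCloudConfig(env: Record<string, string | undefined>): CloudConfig {
  // Default endpoint from the table above; trailing slashes are stripped so
  // paths such as "/embeddings" can be appended safely (an assumption).
  const baseUrl = (env["OLLAMA_CLOUD_BASE_URL"]?.trim() || "https://ollama.com/v1")
    .replace(/\/+$/, "");

  // Comma-separated model ids; blank entries are ignored.
  const raw = env["OLLAMA_CLOUD_MODELS"] ?? "";
  const models = raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  // Fall back to 30000 ms when the value is missing, non-numeric, or <= 0.
  const parsed = Number(env["OLLAMA_CLOUD_TIMEOUT"]);
  const timeoutMs = Number.isFinite(parsed) && parsed > 0 ? parsed : 30000;

  return { baseUrl, staticModels: models.length > 0 ? models : null, timeoutMs };
}
```

Called as `loadCloudConfig(process.env)` in Node.js, this yields a single object that the discovery and request code can share.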

Usage

Select a model

/model

Then pick any ollama-cloud/* model. Examples:

ollama-cloud/llama3.3 — Llama 3.3 70B
ollama-cloud/qwen3 — Qwen 3 with vision
ollama-cloud/deepseek-r1 — DeepSeek R1 with reasoning
ollama-cloud/gemma3:27b — Gemma 3 27B with vision
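
The identifiers above appear to follow a `provider/model[:tag]` pattern. A minimal sketch of splitting one apart is shown below. The `ModelRef` type and `parseModelRef` function are hypothetical, and reading the part after the colon as an Ollama tag such as `27b` is an assumption based on the examples.

```typescript
interface ModelRef {
  provider: string;
  model: string;
  tag: string | undefined; // e.g. "27b"; undefined when no tag is given
}

// Hypothetical helper: split "ollama-cloud/gemma3:27b" into its parts.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model[:tag]", got "${ref}"`);
  }
  const provider = ref.slice(0, slash);
  const rest = ref.slice(slash + 1);
  const colon = rest.indexOf(":");
  if (colon < 0) {
    return { provider, model: rest, tag: undefined };
  }
  return { provider, model: rest.slice(0, colon), tag: rest.slice(colon + 1) };
}
```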

Commands

| Command | Description |
| --- | --- |
| `/ollama-cloud-status` | Check API key status and model count |
| `/ollama-cloud-refresh` | Re-fetch live model list from Ollama Cloud API |
| `/ollama-cloud-list` | Pretty-print all models with 🧠/🖼️/💬 badges |
| `/ollama-cloud-pull <id>` | Show the `ollama pull` command for a model |

Tools (LLM-callable)

| Tool | Purpose |
| --- | --- |
| `ollama_list_models` | Filter models by family, vision, or reasoning |
| `ollama_embeddings` | Generate embeddings via `/v1/embeddings` |
| `ollama_chat` | Direct chat completion via `/v1/chat/completions` |
| `ollama_model_info` | Get detailed metadata for a specific model |

Quick Examples

# Check what models are available
Use ollama_list_models to show all available models

# Get embeddings for a document
Use ollama_embeddings with model "nomic-embed-text" and input "The quick brown fox"

# Compare model outputs
Use ollama_chat with model "llama3.3" and messages [{role: "user", content: "Hello"}]
Use ollama_chat with model "qwen3" and messages [{role: "user", content: "Hello"}]
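
For readers who want to see roughly what the embeddings example amounts to at the HTTP level, here is a sketch that builds an OpenAI-style `/v1/embeddings` request and compares two vectors with cosine similarity. The request body (`model`, `input`) and the bearer-token header follow the OpenAI embeddings convention. The `HttpRequest` type and both function names are hypothetical, and this is not necessarily how the `ollama_embeddings` tool is implemented.

```typescript
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: describe the POST an OpenAI-compatible client would
// send. baseUrl is assumed to already end in "/v1" (the documented default).
function buildEmbeddingsRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  input: string | string[],
): HttpRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/embeddings`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input }),
  };
}

// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The request object can be handed to `fetch(req.url, req)`. In the OpenAI response format, each vector is found at `data[i].embedding`.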

API Compatibility

This extension uses Ollama's OpenAI-compatible API (/v1/chat/completions), which supports:

Chat completions with streaming
Vision (multimodal) inputs
Tool calling
JSON mode
Reasoning/thinking control
Embeddings (/v1/embeddings)
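
To make "chat completions with streaming" concrete, the sketch below extracts text from the server-sent-event lines that an OpenAI-compatible `/v1/chat/completions` endpoint emits when `stream: true` is set. The `data: {...}` framing and the `[DONE]` sentinel follow the OpenAI streaming format. The `extractDeltas` helper is hypothetical, and it assumes each `data:` line arrives whole, which a real client reading raw network chunks would need to guarantee by buffering.

```typescript
// Minimal shape of one streamed chunk; only the fields read here.
interface StreamChunk {
  choices?: { delta?: { content?: string | null } }[];
}

// Hypothetical helper: concatenate the content deltas found in a block of
// SSE text and report whether the "[DONE]" sentinel was seen.
function extractDeltas(sseText: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice("data:".length).trim();
    if (payload === "") continue;
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload) as StreamChunk;
    for (const choice of chunk.choices ?? []) {
      text += choice.delta?.content ?? "";
    }
  }
  return { text, done };
}
```

Pi's `openai-completions` streaming handles this for normal use. The sketch is only meant to show what the wire format looks like when debugging a proxy or a custom `OLLAMA_CLOUD_BASE_URL`.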

License

MIT