@0xkobold/pi-ollama

Ollama extension for pi-coding-agent. Unified local + cloud Ollama support with model management

Packages

Package details

extension

Install @0xkobold/pi-ollama from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@0xkobold/pi-ollama
Package
@0xkobold/pi-ollama
Version
0.5.0
Published
May 13, 2026
Downloads
1,167/mo · 267/wk
Author
moikapy
License
MIT
Types
extension
Size
81 KB
Dependencies
0 dependencies · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./dist/index.js"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

Pi Ollama Extension

Ollama integration for pi-coding-agent with accurate model details from /api/show.

Changelog

v0.5.0

  • Fix: DeepSeek models now return correct context lengths — deepseek-v4 → 1M tokens, other deepseek → 163,840 tokens (was 4,096). Closes #4.
  • Fix: Default context fallback raised from 4,096 → 131,072 (128k). Per Ollama docs, cloud models default to their maximum context; 128k is a conservative floor for unknowns.
  • Fix: hasReasoningCapability() no longer flags instruct, chat, or code/coder models as reasoning. Only models with actual thinking capability (DeepSeek, R1, QwQ, GPT-OSS, Phi) or a thinking/reason capability from /api/show are marked as reasoning. See Ollama thinking docs.
  • Fix: createModel() now sets maxTokens to min(contextWindow, 16384) instead of hardcoded 8192. chat()/chatStream() no longer send max_tokens: 4096 by default — omitted unless explicitly set.
  • Fix: Cloud model deduplication now properly strips :cloud suffix before comparing model names.
  • Fix: loadConfigFromSettingsFiles() now uses async dynamic import() instead of require(), fixing ESM compatibility.
  • New: hasReasoningCapability() accepts optional modelInfo parameter and checks capabilities array for thinking/reason.
  • New: Error classification — chat() and chatStream() now throw typed errors: OllamaAuthError (401/403), OllamaRateLimitError (429), OllamaModelError (400/404), OllamaServerError (500/502).
  • New: Request timeout — chat() and chatStream() now apply a 120s timeout via AbortController.
  • New: Exported constants DEFAULT_CONTEXT_LENGTH (131072), DEFAULT_MAX_TOKENS (8192), DEFAULT_REQUEST_TIMEOUT_MS (120000).
  • New: stripProviderPrefix() now exported and tested.
  • Docs: Comprehensive JSDoc comments referencing Ollama API docs.
  • Docs: README updated with all context length tables, error types, and API reference.

v0.4.1

  • Fix: Cloud models now correctly use /v1 endpoint. Previously, ollama-cloud was registered with baseUrl: "https://ollama.com", causing pi to hit https://ollama.com/chat/completions (HTML homepage) instead of https://ollama.com/v1/chat/completions.
  • Fix: Trailing slashes in cloudUrl config are now properly stripped before appending /v1.

Installation

# Via pi CLI
pi install npm:@0xkobold/pi-ollama

# Or in pi-config.ts
{
  extensions: [
    'npm:@0xkobold/pi-ollama'
  ]
}

# Or temporary (testing)
pi -e npm:@0xkobold/pi-ollama

Features

  • 🦙 Local Ollama — Connect to localhost:11434
  • ☁️ Ollama Cloud — Use ollama.com with API key
  • 📊 Accurate Details — Uses /api/show for real context length
  • 👁️ Vision Detection — Detects vision from capabilities array
  • 🧠 Reasoning Detection — Detects thinking models from capabilities and name patterns
  • 🔍 Model Info — Query specific model parameters
  • 🛡️ Error Classification — Typed errors for auth, rate limits, model errors, server errors
  • ⏱️ Request Timeouts — 120s default timeout on all HTTP calls

Commands

Command Description
/ollama-status Check connection status
/ollama-models List models with context length
/ollama-info MODEL Show model details from /api/show
/ollama status|info|models Shortcuts

How It Works

The extension uses Ollama's /api/show endpoint to get accurate model information:

curl http://localhost:11434/api/show -d '{
  "model": "gemma3",
  "verbose": true
}'

Response includes (per Ollama docs):

  • model_info.<arch>.context_length — Accurate context window
  • capabilities["completion", "vision", "thinking"]
  • details.parameter_size — "4.3B", "70B", etc.
  • details.family — "gemma3", "llama", etc.

Context Length Resolution

Context length is resolved in this order:

  1. model_info.*.context_length — From /api/show (most accurate)
  2. Top-level keyscontext_length, max_position_embeddings, max_sequence_length, n_ctx
  3. Parameter-size heuristic — Small models → smaller context
  4. Name-based lookup — For cloud models without /api/show
  5. Default fallback — 131,072 (128k tokens)

Name-Based Context Length Table

Model Family Context Length Source
deepseek-v4 1,048,576 (1M) Ollama library
kimi 262,144 (256k) Ollama library
qwen3 262,144 (256k) Ollama library
minimax 204,800 (200k) Ollama library
glm 202,752 (~198k) Ollama library
llama3.1/3.2/3.3 128,000 (128k) Ollama library
deepseek (non-v4) 163,840 (160k) Ollama library
gpt-oss 128,000 (128k) Ollama library
qwen/qwen2.5 32,768 (32k) Ollama library
mistral/mixtral 32,768 (32k) Ollama library
llama3 8,192 Ollama library
Unknown 131,072 (128k) Conservative default per Ollama context docs

Reasoning Capability Detection

Per Ollama thinking docs, reasoning/thinking is detected by:

  1. capabilities array from /api/show — if it includes "thinking" or "reason"
  2. Name-based heuristic (for cloud models):
    • ✅ DeepSeek models (have think mode)
    • r1 models (word boundary match)
    • ✅ QwQ, GPT-OSS, Phi
    • ✅ Models containing "reason"
    • instruct, chat, code/coder — these are format tags, NOT reasoning

Error Handling

All HTTP calls classify errors into typed classes:

Class Status Codes Meaning
OllamaAuthError 401, 403 Invalid API key
OllamaRateLimitError 429 Rate limit exceeded
OllamaModelError 400, 404 Bad request or model not found
OllamaServerError 500, 502 Server/gateway error
OllamaError Other Catch-all

Per Ollama error docs.

Configuration

Configuration is loaded with the following precedence (highest to lowest):

  1. Environment variables (override everything)
  2. pi.settings (runtime API, when available)
  3. .pi/settings.json (project-local settings)
  4. ~/.pi/agent/settings.json (global user settings)

Environment Variables

export OLLAMA_HOST="http://localhost:11434"       # Local base URL
export OLLAMA_HOST_CLOUD="https://ollama.com"     # Cloud base URL
export OLLAMA_API_KEY="your-api-key"              # Cloud API key

Settings File

Add to your global settings (~/.pi/agent/settings.json):

{
  "ollama": {
    "baseUrl": "http://localhost:11434",
    "cloudUrl": "https://ollama.com",
    "apiKey": "your-ollama-cloud-api-key"
  }
}

Per Ollama cloud docs.

API Reference

import {
  fetchModelDetails,
  getContextLength,
  hasVisionCapability,
  hasReasoningCapability,
  createClients,
  classifyHttpError,
  OllamaError,
  OllamaAuthError,
  OllamaRateLimitError,
  OllamaModelError,
  OllamaServerError,
  DEFAULT_CONTEXT_LENGTH,
  DEFAULT_MAX_TOKENS,
} from '@0xkobold/pi-ollama/shared';

// Get model details from local Ollama
const details = await fetchModelDetails(client, 'gemma3');

// Extract context length (with name-based fallback)
const ctx = getContextLength(details, 'gemma3');  // 131072

// Check capabilities
const hasVision = hasVisionCapability(details);           // true/false
const hasReasoning = hasReasoningCapability('deepseek-r1', details);  // true

// Classify HTTP errors
try {
  await chat(client, { model: 'gemma3', messages: [...] });
} catch (err) {
  if (err instanceof OllamaRateLimitError) {
    // Handle rate limit
  } else if (err instanceof OllamaAuthError) {
    // Handle auth failure
  }
}

License

MIT © 0xKobold