pi-minimodel-ocr

Pi extension: Local OCR via Ollama GLM-OCR (0.9B) — convert images and PDFs to Markdown / LaTeX with high math formula accuracy. Bridges the multimodal gap for non-vision LLMs like DeepSeek.

Packages

Package details

extension

Install pi-minimodel-ocr from npm and Pi will load the resources declared by the package manifest.

npm repo home report

$ pi install npm:pi-minimodel-ocr

Package: pi-minimodel-ocr
Version: 0.2.1
Published: May 26, 2026
Downloads: not available
Author: astronaut_jack
License: MIT
Types: extension
Size: 27.7 KB
Dependencies: 0 dependencies · 3 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-minimodel-ocr

Local OCR for Pi Coding Agent — extract text, LaTeX math formulas, and tables from images and PDFs using small vision models via Ollama.

Bridges the multimodal gap for non-vision LLMs like DeepSeek. When your model can't see images, minimodel_ocr acts as its eyes — with state-of-the-art formula recognition outputting LaTeX.

Features


🔤 Text	General text recognition → Markdown
🧮 Formulas	Math formulas → LaTeX with high accuracy
📊 Tables	Table structure → Markdown tables
🖼️ Figures	Diagrams and illustrations → descriptions
📄 PDF	Full PDF support with per-page conversion (macOS / Linux / WSL)
🎛️ Any model	Defaults to `glm-ocr` (0.9B) but works with any Ollama vision model
🔒 100% local	No API keys, no cloud, no data ever leaves your machine

Quickstart

1. Prerequisites

# Install Ollama
brew install ollama                     # macOS
curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Pull the default OCR model (~2.2 GB)
ollama pull glm-ocr

# Linux/WSL PDF support
sudo apt install poppler-utils

macOS uses built-in sips for single-page PDFs — zero extra deps.
Multi-page PDFs: brew install poppler (macOS) or apt install poppler-utils (Linux).

2. Install

pi install npm:pi-minimodel-ocr

Or try it without installing:

pi -e npm:pi-minimodel-ocr

Usage

LLM-invoked (automatic)

The extension registers a minimodel_ocr tool. The agent invokes it automatically when it needs to read an image or PDF. Just ask:

> What formula is written in this screenshot?

The model calls minimodel_ocr with task="formula" and gets back LaTeX. Works the same way for text, tables, figures, or full documents.

Command-line (manual)

/ocr <file> [task] [model]

Example	Result
`/ocr ./scan.png`	Auto-detect all content
`/ocr ./equation.jpg formula`	LaTeX formula output
`/ocr ./receipt.pdf text`	Text-only extraction
`/ocr ./table.png table`	Markdown table
`/ocr ./paper.pdf auto llama3.2-vision`	Use a different model

Tasks

Task	Description	Output format
`auto`	Full document OCR (default)	Markdown + LaTeX mixed
`text`	Plain text recognition	Markdown
`formula`	Math formula recognition	LaTeX
`table`	Table structure recognition	Markdown tables
`figure`	Figure / diagram description	Natural language

Supported Models

Defaults to glm-ocr (Zhipu AI, 0.9B, 94.62 OmniDocBench) — the best open-source small OCR model. Works with any Ollama vision model:

# Smaller quantized variant (~1.6 GB)
/ocr ./img.png auto glm-ocr:q8_0

# Or any vision model you have pulled
/ocr ./doc.pdf auto llama3.2-vision
/ocr ./chart.png figure minicpm-v

Set a custom default via environment variable:

export OCR_MODEL="glm-ocr:q8_0"

PDF Support

Platform	Single-page	Multi-page
macOS	`sips` (built-in, zero-deps)	`brew install poppler`
Linux / WSL	`pdftoppm` (poppler-utils)	`pdftoppm` (poppler-utils)

The extension auto-detects multi-page PDFs and shows install instructions if the required tools are missing — it won't silently drop pages.

Configuration

Environment variables

export OLLAMA_HOST="http://localhost:11434"   # default
export OCR_MODEL="glm-ocr"                    # default model

settings.json

{
  "minimodelOcr": {
    "ollamaHost": "http://localhost:11434",
    "model": "glm-ocr"
  }
}

How It Works

┌──────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│  pi (DeepSeek)   │────▶│  minimodel_ocr   │────▶│  Ollama Server      │
│  (no vision)     │     │  pi extension    │     │  (any vision model) │
└──────────────────┘     └──────────────────┘     └─────────────────────┘
        │                         │                           │
        │  "read this image"      │  POST /api/generate       │
        │────────────────────────▶│  base64 image + prompt    │
        │                         │──────────────────────────▶│
        │                         │  OCR text response        │
        │  LaTeX / Markdown       │◀──────────────────────────│
        │◀────────────────────────│                           │

For PDFs, the extension converts each page to PNG using sips (macOS) or pdftoppm (Linux) before sending to Ollama.

License

MIT