pi-glm-ocr

Pi extension: Local OCR via Ollama GLM-OCR (0.9B) — convert images and PDFs to Markdown / LaTeX with high math formula accuracy. Bridges the multimodal gap for non-vision LLMs like DeepSeek.

Install pi-glm-ocr from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-glm-ocr

Package: pi-glm-ocr
Version: 0.1.1
Published: May 24, 2026
Downloads: not available
Author: astronaut_jack
License: MIT
Types: extension
Size: 20.3 KB
Dependencies: 0 dependencies · 3 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-glm-ocr

Local OCR for Pi Coding Agent — extract text, LaTeX math formulas, and tables from images and PDFs using GLM-OCR (0.9B) via Ollama.

Bridges the multimodal gap for non-vision LLMs like DeepSeek. When your model can't see images or PDFs, pi-glm-ocr acts as its eyes: it reads the file locally and returns Markdown, with math formulas recognized at high accuracy and emitted as LaTeX.

Features

🔍 Text Recognition — Extracts text as Markdown from images and PDFs
🧮 Formula Recognition — Math formulas output in LaTeX with high accuracy
📊 Table Recognition — Tables extracted as Markdown tables
🖼️ Figure Description — Describes figures and diagrams
📄 PDF Support — Converts PDF pages to images automatically (macOS/Linux)
📦 Fully local — No API keys, no cloud, no data leaves your machine

Prerequisites

Install Ollama (if not already installed):

# macOS
brew install ollama
# or download from https://ollama.com/download

# Linux
curl -fsSL https://ollama.com/install.sh | sh

Pull the GLM-OCR model:
```
ollama pull glm-ocr
```
Model size: ~2.2 GB (bf16) or ~1.6 GB (q8_0 variant)

For PDF support on Linux, install poppler-utils (Debian/Ubuntu shown):
```
sudo apt install poppler-utils
```
macOS uses built-in CoreGraphics — no extra dependencies needed.
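
To make the Linux PDF path concrete, here is a minimal sketch that builds a `pdftoppm` command line (the page renderer shipped in poppler-utils). How pi-glm-ocr actually invokes poppler is not documented here, so the function names, the 200 DPI default, and the output prefix are assumptions for illustration only.

```python
import shutil


def build_pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 200) -> list[str]:
    """Return an argv list that renders every page of a PDF to PNG.

    pdftoppm writes one file per page, named <out_prefix>-<page>.png
    (the page number is zero-padded as needed).
    The 200 DPI default is an assumption, not a documented extension setting.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]


def poppler_available() -> bool:
    """True when the pdftoppm binary can be found on PATH."""
    return shutil.which("pdftoppm") is not None
```

Pass the returned list to `subprocess.run(..., check=True)` after confirming `poppler_available()`; each resulting PNG can then be sent for OCR one page at a time.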

Install

pi install npm:pi-glm-ocr

Or try it without installing:

pi -e npm:pi-glm-ocr

Usage

As a tool (LLM-invoked)

The extension registers a glm_ocr tool that the agent can call automatically. Just ask Pi:

> What's the formula in this screenshot?
(attach image or mention path)

# The model will call glm_ocr with task="formula" and read the LaTeX

> Extract all text from paper.pdf
# Model calls glm_ocr with task="auto" and gets back Markdown + LaTeX

As a command (user-invoked)

/glm-ocr ./screenshot.png formula
/glm-ocr ./screenshot.png auto glm-ocr:q8_0
/glm-ocr ./document.pdf auto
/glm-ocr ./table.png table
/glm-ocr ./diagram.png figure
/glm-ocr ./page.jpg text

Tasks

| Task | Prompt | Output |
|------|--------|--------|
| `text` | Text Recognition | Markdown |
| `formula` | Formula Recognition | LaTeX |
| `table` | Table Recognition | Markdown tables |
| `figure` | Figure Recognition | Description |
| `auto` | Full document OCR | Markdown + LaTeX (mixed) |
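
The task table amounts to a small lookup from task name to prompt. The sketch below shows one way to express it with validation. The prompt labels are copied from the table; whether the extension sends exactly these strings to the model, and whether `auto` is the default, are assumptions.

```python
# Prompt labels copied from the task table above. The exact strings the
# extension sends to the model are an assumption.
TASK_PROMPTS = {
    "text": "Text Recognition",
    "formula": "Formula Recognition",
    "table": "Table Recognition",
    "figure": "Figure Recognition",
    "auto": "Full document OCR",
}


def prompt_for_task(task: str = "auto") -> str:
    """Map a task name to its prompt label; "auto" as default is assumed."""
    key = task.strip().lower()
    if key not in TASK_PROMPTS:
        valid = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(f"unknown task {task!r}; expected one of: {valid}")
    return TASK_PROMPTS[key]
```

Rejecting unknown task names early gives a clearer error than letting an unrecognized prompt reach the model.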

Configuration

Model Selection

You can override the model per-call:

# Via command - 3rd argument is the model
/glm-ocr ./image.png formula glm-ocr:q8_0
/glm-ocr ./paper.pdf auto llama3.2-vision

# Via LLM - the agent can specify the model parameter
# Example: "use glm_ocr with model='minicpm-v' to read this image"
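
The command form takes up to three positional arguments: path, task, and model. As an illustration of that shape (not the extension's actual parser), the sketch below splits an argument string and applies defaults. The `OcrCommand` type, the `parse_glm_ocr_args` name, and the `auto` fallback when no task is given are all hypothetical.

```python
import shlex
from dataclasses import dataclass
from typing import Optional

VALID_TASKS = {"text", "formula", "table", "figure", "auto"}


@dataclass
class OcrCommand:
    path: str
    task: str
    model: Optional[str]  # None means "use the configured default model"


def parse_glm_ocr_args(arg_string: str) -> OcrCommand:
    """Parse the text that follows /glm-ocr into path, task, and model."""
    parts = shlex.split(arg_string)  # honors quotes around paths with spaces
    if not 1 <= len(parts) <= 3:
        raise ValueError("usage: /glm-ocr <path> [task] [model]")
    task = parts[1] if len(parts) > 1 else "auto"  # assumed default task
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task}")
    model = parts[2] if len(parts) > 2 else None
    return OcrCommand(path=parts[0], task=task, model=model)
```

For example, `parse_glm_ocr_args("./image.png formula glm-ocr:q8_0")` yields the path `./image.png`, the task `formula`, and the model `glm-ocr:q8_0`.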

Environment Variables

export OLLAMA_HOST="http://localhost:11434"  # default
export GLM_OCR_MODEL="glm-ocr"                # default model

Or in ~/.pi/agent/settings.json:

{
  "glmOcr": {
    "ollamaHost": "http://localhost:11434",
    "model": "glm-ocr"
  }
}
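
With three places to set the host and model (per-call override, environment variables, settings file), some precedence order has to apply. This README does not state it, so the sketch below simply assumes per-call override first, then environment, then `settings.json`, then the documented defaults. Treat the ordering and the `resolve_config` helper as assumptions, not the extension's confirmed behavior.

```python
from typing import Mapping, Optional

DEFAULT_HOST = "http://localhost:11434"
DEFAULT_MODEL = "glm-ocr"


def resolve_config(
    env: Mapping[str, str],
    settings: Mapping[str, dict],
    model_override: Optional[str] = None,
) -> dict:
    """Merge configuration sources.

    Assumed precedence: per-call override > environment > settings > defaults.
    """
    section = settings.get("glmOcr") or {}
    host = env.get("OLLAMA_HOST") or section.get("ollamaHost") or DEFAULT_HOST
    model = (
        model_override
        or env.get("GLM_OCR_MODEL")
        or section.get("model")
        or DEFAULT_MODEL
    )
    # Strip a trailing slash so that host + "/api/generate" stays well-formed.
    return {"ollamaHost": host.rstrip("/"), "model": model}
```

In practice you would call it as `resolve_config(os.environ, json.load(open(settings_path)))`, where `settings_path` points at `~/.pi/agent/settings.json`.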

How It Works

┌────────────────┐     ┌───────────────┐     ┌─────────────────┐
│  pi (DeepSeek) │────▶│  glm_ocr      │────▶│  Ollama Server  │
│  (no vision)   │     │  (extension)  │     │  (GLM-OCR 0.9B) │
└────────────────┘     └───────────────┘     └─────────────────┘
       │                    │                      │
       │   "read this pic"  │   POST /api/generate  │
       │───────────────────▶│──────────────────────▶│
       │                    │   base64 image + task  │
       │                    │◀──────────────────────│
       │   LaTeX formula    │   OCR text response   │
       │◀───────────────────│                       │
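
The middle hop in the diagram is a single HTTP call to Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` array. The sketch below shows that request in isolation. It is not the extension's source: the function names and the 120-second timeout are assumptions, and `prompt` stands in for whatever task prompt the extension sends.

```python
import base64
import json
import urllib.request


def build_generate_payload(image_bytes: bytes, prompt: str, model: str = "glm-ocr") -> dict:
    """Build the JSON body for POST /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of chunks
    }


def run_ocr(
    image_path: str,
    prompt: str,
    host: str = "http://localhost:11434",
    model: str = "glm-ocr",
    timeout: float = 120.0,  # assumed value; OCR on CPU can be slow
) -> str:
    """Send one image to a running Ollama server and return the OCR text."""
    with open(image_path, "rb") as handle:
        payload = build_generate_payload(handle.read(), prompt, model)
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `run_ocr("./screenshot.png", "Formula Recognition")` requires Ollama to be running with the `glm-ocr` model already pulled; the returned string is what the agent reads in place of the image.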

License

MIT