pi-glm-ocr
Pi extension: Local OCR via Ollama GLM-OCR (0.9B) — convert images and PDFs to Markdown / LaTeX with high math formula accuracy. Bridges the multimodal gap for non-vision LLMs like DeepSeek.
Package details
Install pi-glm-ocr from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-glm-ocr
- Package: pi-glm-ocr
- Version: 0.1.1
- Published: May 24, 2026
- Downloads: not available
- Author: astronaut_jack
- License: MIT
- Types: extension
- Size: 20.3 KB
- Dependencies: 0 dependencies · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions"
  ]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-glm-ocr
Local OCR for Pi Coding Agent — extract text, LaTeX math formulas, and tables from images and PDFs using GLM-OCR (0.9B) via Ollama.
Bridges the multimodal gap for non-vision LLMs like DeepSeek. When your model can't see images or PDFs, pi-glm-ocr acts as its eyes, with high-accuracy formula recognition that outputs LaTeX.
Features
- 🔍 Text Recognition — Extracts text as Markdown from images and PDFs
- 🧮 Formula Recognition — Math formulas output in LaTeX with high accuracy
- 📊 Table Recognition — Tables extracted as Markdown tables
- 🖼️ Figure Description — Describes figures and diagrams
- 📄 PDF Support — Converts PDF pages to images automatically (macOS/Linux)
- 📦 Fully local — No API keys, no cloud, no data leaves your machine
Prerequisites
Install Ollama (if not already):
# macOS
brew install ollama
# or download from https://ollama.com/download
# Linux
curl -fsSL https://ollama.com/install.sh | sh
Pull the GLM-OCR model:
ollama pull glm-ocr
Model size: ~2.2 GB (bf16) or ~1.6 GB (q8_0 variant)
For PDF support on Linux (the apt command is for Debian/Ubuntu):
sudo apt install poppler-utils
macOS uses the built-in CoreGraphics framework, so no extra dependencies are needed.
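The README does not say which poppler tool the extension runs on Linux. The sketch below assumes `pdftoppm` (part of `poppler-utils`) and shows how the page-to-PNG command could be assembled before each page image is sent to the model. `pdftoppmArgs`, its options object, and the 200 DPI default are hypothetical choices, not the package's code; `-png`, `-r`, `-f` and `-l` are real `pdftoppm` flags.

```typescript
// Hypothetical helper (Linux path only; macOS uses CoreGraphics instead).
// pdftoppm writes one file per page, named <outPrefix>-<page>.png, where the
// page number may be zero-padded depending on the document's page count.
interface PdfToPngOptions {
  dpi?: number;       // render resolution; higher keeps small formula glyphs legible
  firstPage?: number; // 1-based, inclusive (-f)
  lastPage?: number;  // 1-based, inclusive (-l)
}

function pdftoppmArgs(pdfPath: string, outPrefix: string, opts: PdfToPngOptions = {}): string[] {
  // Assumed default of 200 DPI; pdftoppm's own default is 150.
  const args = ["-png", "-r", String(opts.dpi ?? 200)];
  if (opts.firstPage !== undefined) args.push("-f", String(opts.firstPage));
  if (opts.lastPage !== undefined) args.push("-l", String(opts.lastPage));
  // Positional arguments come last: the input PDF, then the output file prefix.
  args.push(pdfPath, outPrefix);
  return args;
}

console.log(["pdftoppm", ...pdftoppmArgs("paper.pdf", "/tmp/page", { firstPage: 1, lastPage: 2 })].join(" "));
// pdftoppm -png -r 200 -f 1 -l 2 paper.pdf /tmp/page
```

Returning an argv array rather than a shell string lets the caller pass it to Node's `child_process.execFile("pdftoppm", args)`, which avoids shell quoting problems with unusual file names.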
Install
pi install npm:pi-glm-ocr
Or try it without installing:
pi -e npm:pi-glm-ocr
Usage
As a tool (LLM-invoked)
The extension registers a glm_ocr tool that the agent can call automatically. Just ask pi:
> What's the formula in this screenshot?
(attach image or mention path)
# The model will call glm_ocr with task="formula" and read the LaTeX
> Extract all text from paper.pdf
# Model calls glm_ocr with task="auto" and gets back Markdown + LaTeX
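The README names the `task` and `model` arguments of the `glm_ocr` tool but does not publish its full schema. Below is a minimal sketch of the argument shape and the validation an implementation might run on an LLM-issued call; the `path` field name, the `"auto"` default, and `validateToolParams` are assumptions for illustration.

```typescript
// Hypothetical argument shape for the glm_ocr tool. The README names `task`
// and `model`; the `path` field name and the "auto" default are assumptions.
const TASKS = ["text", "formula", "table", "figure", "auto"] as const;
type Task = (typeof TASKS)[number];

interface GlmOcrParams {
  path: string;   // image or PDF the agent wants read
  task?: string;  // untrusted: comes straight from the model's tool call
  model?: string; // optional per-call Ollama model override
}

interface ValidatedParams {
  path: string;
  task: Task;
  model?: string;
}

function isTask(value: string): value is Task {
  return (TASKS as readonly string[]).includes(value);
}

// Validate before calling Ollama so a bad tool call fails with a message the
// model can act on, instead of an opaque HTTP error.
function validateToolParams(p: GlmOcrParams): ValidatedParams {
  const path = p.path?.trim();
  if (!path) throw new Error("glm_ocr: `path` is required");
  const task = (p.task ?? "auto").trim().toLowerCase();
  if (!isTask(task)) {
    throw new Error(`glm_ocr: unknown task "${p.task}"; expected one of ${TASKS.join(", ")}`);
  }
  const model = p.model?.trim();
  return model ? { path, task, model } : { path, task };
}

console.log(JSON.stringify(validateToolParams({ path: "paper.pdf", task: "Formula" })));
// {"path":"paper.pdf","task":"formula"}
```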
As a command (user-invoked)
/glm-ocr ./screenshot.png formula
/glm-ocr ./screenshot.png auto glm-ocr:q8_0
/glm-ocr ./document.pdf auto
/glm-ocr ./table.png table
/glm-ocr ./diagram.png figure
/glm-ocr ./page.jpg text
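The examples above follow the pattern `/glm-ocr <path> [task] [model]`. A minimal parsing sketch, assuming whitespace-separated arguments and an `"auto"` default when the task is omitted (the README does not state the default); `parseGlmOcrArgs` is a hypothetical name.

```typescript
// Hypothetical parser for the text after "/glm-ocr": <path> [task] [model].
// Splitting on whitespace means this sketch does not support paths with spaces.
const TASKS = ["text", "formula", "table", "figure", "auto"] as const;
type Task = (typeof TASKS)[number];

interface OcrCommand {
  path: string;
  task: Task;
  model?: string; // optional Ollama model tag, e.g. "glm-ocr:q8_0"
}

function isTask(value: string): value is Task {
  return (TASKS as readonly string[]).includes(value);
}

function parseGlmOcrArgs(input: string, defaultTask: Task = "auto"): OcrCommand {
  const parts = input.trim().split(/\s+/).filter((s) => s.length > 0);
  if (parts.length === 0 || parts.length > 3) {
    throw new Error("usage: /glm-ocr <path> [task] [model]");
  }
  // Positional: the model can only be given after an explicit task.
  const [path, rawTask = defaultTask, model] = parts;
  if (!isTask(rawTask)) {
    throw new Error(`unknown task "${rawTask}"; expected one of ${TASKS.join(", ")}`);
  }
  return model ? { path, task: rawTask, model } : { path, task: rawTask };
}

console.log(JSON.stringify(parseGlmOcrArgs("./screenshot.png auto glm-ocr:q8_0")));
// {"path":"./screenshot.png","task":"auto","model":"glm-ocr:q8_0"}
```

Because the arguments are positional, overriding the model always requires naming a task first, which matches every model-override example in this README.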
Tasks
| Task | Prompt | Output |
|---|---|---|
| `text` | Text Recognition | Markdown |
| `formula` | Formula Recognition | LaTeX |
| `table` | Table Recognition | Markdown tables |
| `figure` | Figure Recognition | Description |
| `auto` | Full document OCR | Markdown + LaTeX (mixed) |
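The Tasks table can be read as a lookup from task name to the prompt sent with the image. A sketch of that lookup, assuming the Prompt column holds the literal prompt strings; the extension may add punctuation, and the `auto` entry ("Full document OCR") may be a description rather than the exact prompt. `TASK_PROMPTS` and `promptFor` are hypothetical names.

```typescript
// Task -> prompt lookup mirroring the Tasks table. The literal prompt strings
// are an assumption taken from the table's Prompt column.
type Task = "text" | "formula" | "table" | "figure" | "auto";

// Record<Task, ...> makes the compiler reject the map if a task is missing.
const TASK_PROMPTS: Record<Task, { prompt: string; output: string }> = {
  text: { prompt: "Text Recognition", output: "Markdown" },
  formula: { prompt: "Formula Recognition", output: "LaTeX" },
  table: { prompt: "Table Recognition", output: "Markdown tables" },
  figure: { prompt: "Figure Recognition", output: "Description" },
  auto: { prompt: "Full document OCR", output: "Markdown + LaTeX (mixed)" },
};

function promptFor(task: Task): string {
  return TASK_PROMPTS[task].prompt;
}

console.log(promptFor("formula"));
// Formula Recognition
```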
Configuration
Model Selection
You can override the model per-call:
# Via command - 3rd argument is the model
/glm-ocr ./image.png formula glm-ocr:q8_0
/glm-ocr ./paper.pdf auto llama3.2-vision
# Via LLM - the agent can specify the model parameter
# Example: "use glm_ocr with model='minicpm-v' to read this image"
Environment Variables
export OLLAMA_HOST="http://localhost:11434" # default
export GLM_OCR_MODEL="glm-ocr" # default model
Or in ~/.pi/agent/settings.json:
{
  "glmOcr": {
    "ollamaHost": "http://localhost:11434",
    "model": "glm-ocr"
  }
}
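The README lists four places a model or host can come from (per-call argument, environment variables, `settings.json`, built-in defaults) but not which one wins. A minimal resolution sketch, assuming the precedence per-call > environment > settings > default; `resolveConfig` and that ordering are assumptions, not documented behavior.

```typescript
// Hypothetical config resolution. Precedence (per-call > env > settings.json >
// default) is an assumption; the README does not specify it.
interface GlmOcrSettings {
  ollamaHost?: string;
  model?: string;
}

interface ResolvedConfig {
  ollamaHost: string;
  model: string;
}

const DEFAULTS: ResolvedConfig = { ollamaHost: "http://localhost:11434", model: "glm-ocr" };

function resolveConfig(
  env: Record<string, string | undefined>,
  settings: { glmOcr?: GlmOcrSettings } = {},
  perCallModel?: string,
): ResolvedConfig {
  let host = env.OLLAMA_HOST || settings.glmOcr?.ollamaHost || DEFAULTS.ollamaHost;
  // OLLAMA_HOST is often set without a scheme (e.g. "127.0.0.1:11434").
  if (!/^https?:\/\//.test(host)) host = `http://${host}`;
  // Drop trailing slashes so `${host}/api/generate` never contains "//".
  host = host.replace(/\/+$/, "");
  const model = perCallModel || env.GLM_OCR_MODEL || settings.glmOcr?.model || DEFAULTS.model;
  return { ollamaHost: host, model };
}

console.log(resolveConfig({}, { glmOcr: { model: "glm-ocr:q8_0" } }, "minicpm-v").model);
// minicpm-v
```

Taking the environment as a parameter keeps the function testable; in Node it would be called with `process.env` and the parsed `settings.json`.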
How It Works
┌──────────────┐     ┌─────────────┐     ┌─────────────────┐
│ pi (DeepSeek)│────▶│   glm_ocr   │────▶│  Ollama Server  │
│ (no vision)  │     │ (extension) │     │ (GLM-OCR 0.9B)  │
└──────────────┘     └─────────────┘     └─────────────────┘
       │                    │                     │
       │ "read this pic"    │ POST /api/generate  │
       │───────────────────▶│────────────────────▶│
       │                    │ base64 image + task │
       │                    │◀────────────────────│
       │ LaTeX formula      │ OCR text response   │
       │◀───────────────────│                     │
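The diagram's middle hop is one HTTP call to Ollama. A minimal sketch of it, using Ollama's documented `/api/generate` fields (`model`, `prompt`, `images` as base64 strings, `stream`) and the `response` field of the non-streaming reply. The helper names are hypothetical, and this is not necessarily how the extension builds its request.

```typescript
// Sketch of the extension -> Ollama hop: one non-streaming POST /api/generate.
interface GenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // raw base64 image bytes
  stream: false;    // ask for a single JSON object instead of NDJSON chunks
}

function buildGenerateRequest(model: string, prompt: string, imageBase64: string): GenerateRequest {
  // Defensive: strip a "data:image/...;base64," prefix if the caller passed a data URL.
  const bare = imageBase64.replace(/^data:[^;]+;base64,/, "");
  return { model, prompt, images: [bare], stream: false };
}

async function runOcr(host: string, req: GenerateRequest): Promise<string> {
  const res = await fetch(`${host}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as { response?: string; error?: string };
  if (data.error) throw new Error(data.error);
  // With stream=false the whole OCR result arrives in `response`.
  return data.response ?? "";
}
```

In Node, `readFileSync(imagePath).toString("base64")` produces the string `buildGenerateRequest` expects; a PDF would first be rendered to one image per page and sent page by page.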
License
MIT