pi-ocr

Pi extension: zero-setup, multi-backend OCR with MinerU (free cloud), Ollama (local GPU, LaTeX formulas), Tesseract (local, plain text), and Pix2Text (local Python). Extract text, formulas, and tables from images and PDFs. The default backend needs no configuration and works out of the box.


Package details


Install pi-ocr from npm and Pi will load the resources declared by the package manifest.


$ pi install npm:pi-ocr

Package: pi-ocr
Version: 1.4.0
Published: Jun 13, 2026
Downloads: 362/mo · 60/wk
Author: astronaut_jack
License: MIT
Types: extension
Size: 77.3 KB
Dependencies: 0 dependencies · 3 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-ocr

⚡ Zero setup. Works out of the box.

The default backend is MinerU, a free cloud API: no GPU, no API key, no pip install. Just run pi install and /ocr.

OCR for Pi Coding Agent. It bridges the multimodal gap for non-vision LLMs such as DeepSeek: when your model can't see images, the pi_ocr tool reads them for you.

Quickstart

pi install npm:pi-ocr
/ocr ./screenshot.png
/ocr ./paper.pdf

That's all. MinerU (free cloud API) is the default — zero config.

The pi_ocr tool takes only a file path. Backend, model, and task are configured by the user via /ocr settings — the AI doesn't need to manage them.

Backends

Switch anytime with /ocr (no args).

	Backend	Best for	Setup
☁️	MinerU (default)	PDFs, general docs	None
☁️	MinerU Pro	Large PDFs, higher accuracy (vlm model)	API token
🦙	Ollama	Math formulas → LaTeX	GPU + 2.2GB model
🔤	Tesseract	Plain text, lightweight	`brew install tesseract` (~30MB)
📐	Pix2Text	Math + text, GPU/CPU	`pip install pix2text`

💡 Unsure which backend to pick? See the benchmark with real test results and the ground truth for comparison.

MinerU (default)

Free cloud API. Images are wrapped in a PDF so that language-aware OCR applies.

Limits: ≤10MB and ≤20 pages per request. PDFs longer than 20 pages are split automatically via pypdfium2 (toggle "MinerU: Split PDF >20 pages" in /ocr settings).
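The split itself is done by pypdfium2 in Python. The TypeScript sketch below only illustrates the chunking arithmetic, assuming chunks are consecutive 1-based page ranges of at most 20 pages; `planChunks` and `PageRange` are hypothetical names, not pi-ocr's API.

```typescript
// Hypothetical helper (not pi-ocr's API): plan consecutive page ranges that
// respect MinerU's free-tier limit of 20 pages per request. It only computes
// ranges; the actual PDF splitting happens elsewhere (pypdfium2 in pi-ocr).
const MINERU_FREE_MAX_PAGES = 20;

interface PageRange {
  start: number; // first page, 1-based
  end: number; // last page, inclusive
}

function planChunks(totalPages: number, maxPages: number = MINERU_FREE_MAX_PAGES): PageRange[] {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError(`invalid page count: ${totalPages}`);
  }
  if (!Number.isInteger(maxPages) || maxPages < 1) {
    throw new RangeError(`invalid chunk size: ${maxPages}`);
  }
  const ranges: PageRange[] = [];
  for (let start = 1; start <= totalPages; start += maxPages) {
    // The last chunk may be shorter than maxPages.
    ranges.push({ start, end: Math.min(start + maxPages - 1, totalPages) });
  }
  return ranges;
}

// A 45-page PDF is planned as three requests: pages 1-20, 21-40, and 41-45.
console.log(planChunks(45));
```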

MinerU Pro (vlm model)

Higher accuracy via the token-authenticated precision API. Limits: ≤200MB and ≤200 pages per request, so no splitting is needed.

Get a free token at mineru.net/apiManage, then set it in /ocr settings. The token includes 1000 high-priority pages per day.
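A pre-flight check against the two sets of limits might look like the following sketch. The numbers come from this README; `limitViolations`, `MINERU_FREE`, and `MINERU_PRO` are hypothetical names, and treating "MB" as 1024 × 1024 bytes is an assumption.

```typescript
// Sketch: check one request against the per-request limits quoted in this
// README. Assumption: "MB" means 1024 * 1024 bytes; the service may differ.
interface Limits {
  maxBytes: number;
  maxPages: number;
}

const MB = 1024 * 1024;
const MINERU_FREE: Limits = { maxBytes: 10 * MB, maxPages: 20 };
const MINERU_PRO: Limits = { maxBytes: 200 * MB, maxPages: 200 };

// Returns human-readable reasons a request would be rejected (empty = OK).
function limitViolations(bytes: number, pages: number, limits: Limits): string[] {
  const problems: string[] = [];
  if (bytes > limits.maxBytes) {
    problems.push(`file is ${(bytes / MB).toFixed(1)}MB, limit is ${limits.maxBytes / MB}MB`);
  }
  if (pages > limits.maxPages) {
    problems.push(`${pages} pages, limit is ${limits.maxPages} per request`);
  }
  return problems;
}

// A 150-page, 40MB PDF exceeds the free tier on both counts but fits MinerU Pro.
console.log(limitViolations(40 * MB, 150, MINERU_FREE));
console.log(limitViolations(40 * MB, 150, MINERU_PRO));
```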

Ollama

Local GPU OCR via the glm-ocr model, with state-of-the-art formula recognition (94.6 on OmniDocBench). Outputs LaTeX.

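# macOS
brew install ollama
brew services start ollama   # or run: ollama serve (the server must be up before pulling)
ollama pull glm-ocr
brew install poppler         # multi-page PDFs

# Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama pull glm-ocr
sudo apt install poppler-utils   # multi-page PDFs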
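For reference, a local Ollama server accepts images on its `/api/generate` endpoint as base64 strings in an `images` array. The sketch below (TypeScript, Node 18+ for the global `fetch`) sends a single image to glm-ocr; the prompt text and the function names are assumptions, not pi-ocr's implementation. Multi-page PDFs would first need to be rasterized to images, which is what poppler's pdftoppm is for.

```typescript
import { readFile } from "node:fs/promises";

// Default address of a local `ollama serve` instance.
const OLLAMA_URL = "http://localhost:11434/api/generate";

interface GenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image bytes
  stream: boolean;
}

// Build a non-streaming /api/generate body for one image.
// Assumption: this prompt wording is illustrative; pi-ocr's prompt is not documented here.
function buildOcrRequest(
  imageBytes: Uint8Array,
  model = "glm-ocr",
  prompt = "Extract all text from this image. Write formulas as LaTeX.",
): GenerateRequest {
  return {
    model,
    prompt,
    images: [Buffer.from(imageBytes).toString("base64")],
    stream: false,
  };
}

// Hypothetical helper: send one image and return the model's text output.
async function ocrWithOllama(imagePath: string, model = "glm-ocr"): Promise<string> {
  const body = buildOcrRequest(await readFile(imagePath), model);
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  // With stream: false, Ollama returns a single JSON object whose
  // "response" field holds the generated text.
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

Usage, assuming `ollama serve` is running and glm-ocr has been pulled: `ocrWithOllama("./screenshot.png").then(console.log)`. If the server is not running, `fetch` rejects with a connection error.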

Tesseract

Classic OCR engine. Ultra-lightweight (~30MB). No formula support — use Ollama or Pix2Text for math.

brew install tesseract              # macOS
sudo apt install tesseract-ocr      # Linux

Chinese support requires the tesseract-lang language pack (brew install tesseract-lang), which is installed automatically on macOS.
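Tesseract is a plain CLI, so calling it from Node only needs `child_process`. In this sketch, passing `stdout` as the output base makes tesseract print the text instead of writing a file, and `-l` takes `+`-joined language codes such as `chi_sim+eng` (Chinese requires the language data installed above). `tesseractArgs` and `ocrWithTesseract` are hypothetical names.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Build the argv for `tesseract <image> stdout -l <langs>`.
// "stdout" as the output base sends recognized text to standard output.
function tesseractArgs(imagePath: string, langs: string[] = ["eng"]): string[] {
  return [imagePath, "stdout", "-l", langs.join("+")];
}

// Hypothetical helper: run tesseract and return the recognized text.
// execFile avoids a shell, so spaces in the path need no quoting.
async function ocrWithTesseract(imagePath: string, langs: string[] = ["eng"]): Promise<string> {
  const { stdout } = await execFileAsync("tesseract", tesseractArgs(imagePath, langs), {
    maxBuffer: 16 * 1024 * 1024, // allow large outputs
  });
  return stdout;
}
```

Usage, assuming tesseract and the Chinese data are installed: `ocrWithTesseract("./scan.png", ["chi_sim", "eng"]).then(console.log)`.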

Pix2Text

An open-source Mathpix alternative that handles text and formulas on GPU (CUDA/MPS) or CPU, auto-detecting the best available device.

pip install pix2text

First run downloads ONNX models (~50MB).

Settings

Open with /ocr (no args).

Setting	Description
OCR Backend	Switch between MinerU, Ollama, Pix2Text, Tesseract
MinerU: Split PDF >20 pages	Auto-split large PDFs into free-tier chunks
MinerU Pro Token	API token from mineru.net/apiManage
Ollama Model	Vision model (glm-ocr, minicpm-v, etc.)
Clear OCR temp files	Remove cached OCR output from /tmp

Output Behavior

Results ≤2000 chars are returned inline in the tool response. Longer results are written to a temp file (/tmp/pi-ocr-*.md); the tool response includes the file path for the AI to read.
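The inline-or-file rule can be sketched as follows. This is an illustration, not pi-ocr's code: `packageResult` is a hypothetical name, the random-UUID file name is an assumption, and the sketch writes to `os.tmpdir()`, which is /tmp on most Linux systems but a per-user directory on macOS.

```typescript
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { randomUUID } from "node:crypto";

// Threshold from this README. String.length counts UTF-16 code units,
// so it only approximates a character count for text outside the BMP.
const INLINE_LIMIT = 2000;

type OcrResult =
  | { kind: "inline"; text: string }
  | { kind: "file"; path: string; chars: number };

// Hypothetical helper: return short results inline and spill long ones to a
// temp Markdown file. The UUID suffix and os.tmpdir() are assumptions; the
// README only specifies /tmp/pi-ocr-*.md.
function packageResult(text: string, dir: string = tmpdir()): OcrResult {
  if (text.length <= INLINE_LIMIT) {
    return { kind: "inline", text };
  }
  const path = join(dir, `pi-ocr-${randomUUID()}.md`);
  writeFileSync(path, text, "utf8");
  return { kind: "file", path, chars: text.length };
}
```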

Commands

Command	Description
`/ocr`	Open settings (backend, model, split toggle, clear cache)
`/ocr <file>`	OCR a file
`/ocr <file> formula`	Output math as LaTeX (Ollama backend)
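The three command forms can be illustrated with a small argument parser. This is a hypothetical sketch: `parseOcrArgs` and the `task` values are invented for illustration, and it assumes `formula` is recognized only as a trailing word after the file path.

```typescript
// Hypothetical parser for the text after "/ocr"; pi-ocr's real parsing may differ.
type OcrCommand =
  | { action: "settings" }
  | { action: "ocr"; file: string; task: "text" | "formula" };

function parseOcrArgs(raw: string): OcrCommand {
  const args = raw.trim();
  // No arguments: open the settings menu.
  if (args === "") return { action: "settings" };
  // Assumption: "formula" is a mode only when it follows the file path.
  const match = /^(.*\S)\s+formula$/.exec(args);
  if (match) return { action: "ocr", file: match[1], task: "formula" };
  return { action: "ocr", file: args, task: "text" };
}
```

Because only the trailing word is matched, `parseOcrArgs("./scan copy.png formula")` keeps the space inside the file name.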

Troubleshooting

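MinerU returns HTTP 429 (rate limited) → wait a minute or switch backends.

MinerU Pro returns HTTP 401 (unauthorized) → regenerate your token at mineru.net/apiManage and update it in /ocr settings.

"Is Ollama running?" → start the server with ollama serve

"pdftoppm not found" → brew install poppler / sudo apt install poppler-utils

"python3 not found" (Pix2Text) → install Python 3, then pip install pix2text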

"tesseract not found" → brew install tesseract / sudo apt install tesseract-ocr

License

MIT