pi-ocr

Pi extension: Zero-setup multi-backend OCR — MinerU (free cloud), Ollama (local GPU, LaTeX formulas), Pix2Text (local Python). Extract text, formulas, and tables from images and PDFs. Default: zero config, works out of the box.

Install pi-ocr from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-ocr
Package: pi-ocr
Version: 1.3.15
Published: Jun 4, 2026
Downloads: 3,118/mo · 2,764/wk
Author: astronaut_jack
License: MIT
Types: extension
Size: 77.9 KB
Dependencies: 0 dependencies · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-ocr

⚡ Zero setup. Works out of the box.

Default backend is MinerU — a free cloud API. No GPU, no API key, no pip install. Just pi install and /ocr.

OCR for Pi Coding Agent. Bridges the multimodal gap for non-vision LLMs like DeepSeek: when your model can't see images, pi_ocr reads them for you.


Quickstart

pi install npm:pi-ocr
/ocr ./screenshot.png
/ocr ./paper.pdf

That's all. MinerU (free cloud API) is the default — zero config.

The pi_ocr tool takes only a file path. Backend, model, and task are configured by the user via /ocr settings — the AI doesn't need to manage them.


Backends

Switch anytime with /ocr (no args).

| Backend | Best for | Setup |
|---|---|---|
| ☁️ MinerU (default) | PDFs, general docs | None |
| ☁️ MinerU Pro | Large PDFs, vlm accuracy | API token |
| 🦙 Ollama | Math formulas → LaTeX | GPU + 2.2GB model |
| 🔤 Tesseract | Plain text (~30MB) | brew install tesseract |
| 📐 Pix2Text | Math + text, GPU/CPU | pip install pix2text |

💡 Unsure which backend to pick? See the benchmark with real test results and the ground truth for comparison.


MinerU (default)

Free cloud API. Images are wrapped as PDF so language-aware OCR applies.

Limits: ≤10MB, ≤20 pages/request. PDFs >20 pages auto-split via pypdfium2.
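
The 20-page limit above implies a simple chunking step. The following is a minimal sketch of that arithmetic only, not the extension's actual code: `split_page_ranges` is a hypothetical helper, and the real splitting is done with pypdfium2 as stated above.

```python
def split_page_ranges(total_pages: int, max_pages: int = 20) -> list[tuple[int, int]]:
    """Return half-open (start, end) page ranges, each at most max_pages long.

    Hypothetical helper: it only computes the ranges that a splitter would
    cut a PDF into; it does not read or write any PDF data.
    """
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (start, min(start + max_pages, total_pages))
        for start in range(0, total_pages, max_pages)
    ]


# A 45-page PDF becomes three requests: pages 0-19, 20-39, and 40-44.
print(split_page_ranges(45))  # [(0, 20), (20, 40), (40, 45)]
```

Half-open ranges keep the last chunk short without special-casing it.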


MinerU Pro (vlm model)

Higher accuracy via token-based precision API. ≤200MB, ≤200 pages — no splitting needed.

Get a free token at mineru.net/apiManage, then set it in /ocr settings. Quota: 1000 high-priority pages/day.
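
To compare the two MinerU tiers at a glance, the limits quoted above can be written as a small lookup. This is an illustrative sketch: the tier keys `free` and `pro`, the `TierLimits` class, and the `fits` helper are hypothetical names, and treating 1 MB as 2^20 bytes is an assumption the README does not settle.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the README does not say whether MB means 10^6 or 2^20 bytes


@dataclass(frozen=True)
class TierLimits:
    max_bytes: int
    max_pages: int


# Figures are taken from the README; the tier keys are hypothetical labels.
LIMITS = {
    "free": TierLimits(max_bytes=10 * MB, max_pages=20),
    "pro": TierLimits(max_bytes=200 * MB, max_pages=200),
}


def fits(tier: str, size_bytes: int, pages: int) -> bool:
    """Return True if one request of this size and page count is within the tier's limits."""
    limits = LIMITS[tier]
    return size_bytes <= limits.max_bytes and pages <= limits.max_pages


# A 3 MB, 35-page PDF exceeds the free tier's page limit but fits Pro as-is.
print(fits("free", 3 * MB, 35))  # False
print(fits("pro", 3 * MB, 35))   # True
```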


Ollama

Local GPU OCR via glm-ocr — state-of-the-art formula recognition (94.6 OmniDocBench). Outputs LaTeX.

# macOS
brew install ollama && ollama pull glm-ocr
brew install poppler   # multi-page PDFs

# Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama pull glm-ocr
sudo apt install poppler-utils
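
Once the model is pulled, a local Ollama server can be queried over its HTTP API. The sketch below shows one way to send an image to `glm-ocr` through Ollama's `/api/generate` endpoint; it is not necessarily how pi-ocr does it. The prompt wording and the helper names `build_payload` and `run_ocr` are assumptions.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(image_bytes: bytes, model: str = "glm-ocr",
                  prompt: str = "Extract the text and formulas from this image.") -> dict:
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,  # assumed wording; the prompt pi-ocr uses is not documented here
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def run_ocr(image_path: str, model: str = "glm-ocr") -> str:
    """Send the image to a running Ollama server and return the model's text output."""
    with open(image_path, "rb") as fh:
        payload = build_payload(fh.read(), model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `run_ocr("./screenshot.png")` requires `ollama serve` to be running; a connection error here corresponds to the "Is Ollama running?" message under Troubleshooting.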

Tesseract

Classic OCR engine. Ultra-lightweight (~30MB). No formula support — use Ollama or Pix2Text for math.

brew install tesseract              # macOS
sudo apt install tesseract-ocr      # Linux

Chinese is supported through the tesseract-lang language packs: brew install tesseract-lang (auto-installed on macOS).
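
For reference, the Tesseract CLI can print recognized text to stdout and accepts several languages joined with `+`. The sketch below wraps that invocation; the helper names are hypothetical and this is not necessarily the command pi-ocr builds.

```python
import shutil
import subprocess


def tesseract_command(image_path: str, languages: tuple[str, ...] = ("eng",)) -> list[str]:
    """Build a tesseract invocation that writes the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def run_tesseract(image_path: str, languages: tuple[str, ...] = ("eng",)) -> str:
    """Run tesseract and return its text output; fail early if the binary is missing."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract not found")
    result = subprocess.run(
        tesseract_command(image_path, languages),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# chi_sim is Tesseract's code for Simplified Chinese.
print(tesseract_command("./screenshot.png", ("chi_sim", "eng")))
```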


Pix2Text

Mathpix alternative — handles text + formulas on GPU (CUDA/MPS) or CPU. Auto-detects best device.

pip install pix2text

First run downloads ONNX models (~50MB).


Settings

Open with /ocr (no args).

| Setting | Description |
|---|---|
| OCR Backend | Switch between MinerU, Ollama, Pix2Text, Tesseract |
| MinerU: Split PDF >20 pages | Auto-split large PDFs into free-tier chunks |
| MinerU Pro Token | API token from mineru.net/apiManage |
| Ollama Model | Vision model (glm-ocr, minicpm-v, etc.) |
| Clear OCR temp files | Remove cached OCR output from /tmp |

Output Behavior

Results ≤2000 chars are returned inline in the tool response. Longer results are written to a temp file (/tmp/pi-ocr-*.md); the tool response includes the file path for the AI to read.
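
The inline-versus-file rule can be sketched as follows. This is an illustration of the behavior described above, not the extension's source: `deliver` and the shape of its return value are hypothetical, and the file lands in the system temp directory, which is /tmp on most Linux systems but may differ elsewhere.

```python
import tempfile

INLINE_LIMIT = 2000  # characters, per the README


def deliver(text: str, limit: int = INLINE_LIMIT) -> dict:
    """Return short results inline; write longer ones to a pi-ocr-*.md temp file."""
    if len(text) <= limit:
        return {"inline": True, "text": text}
    # delete=False keeps the file on disk so the caller can read it later.
    with tempfile.NamedTemporaryFile(
        "w", prefix="pi-ocr-", suffix=".md", delete=False, encoding="utf-8"
    ) as fh:
        fh.write(text)
    return {"inline": False, "path": fh.name}


print(deliver("short result")["inline"])  # True
print(deliver("x" * 2001)["inline"])      # False
```

Because the files are not deleted automatically, they accumulate until removed, which is what the "Clear OCR temp files" setting is for.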


Commands

| Command | Description |
|---|---|
| /ocr | Open settings (backend, model, split toggle, clear cache) |
| /ocr <file> | OCR a file |
| /ocr <file> formula | Math LaTeX output (Ollama backend) |
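
The three command forms can be told apart by their arguments. The parser below is a hypothetical sketch of that dispatch, not pi-ocr's implementation; the `action` and `task` values, including the default task name `text`, are assumed labels.

```python
import shlex


def parse_ocr_command(line: str) -> dict:
    """Classify an /ocr command line into settings, plain OCR, or formula OCR."""
    parts = shlex.split(line)  # shlex keeps quoted paths with spaces intact
    if not parts or parts[0] != "/ocr":
        raise ValueError("not an /ocr command")
    args = parts[1:]
    if not args:
        return {"action": "settings"}
    task = "formula" if len(args) > 1 and args[1] == "formula" else "text"
    return {"action": "ocr", "file": args[0], "task": task}


print(parse_ocr_command("/ocr"))
print(parse_ocr_command("/ocr ./paper.pdf"))
print(parse_ocr_command("/ocr ./equation.png formula"))
```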

Troubleshooting

MinerU 429 → Wait a minute or switch backend.

MinerU Pro 401 → Regenerate token at mineru.net/apiManage.

"Is Ollama running?" → ollama serve

"pdftoppm not found" → brew install poppler / sudo apt install poppler-utils

"python3 not found" (Pix2Text) → pip install pix2text

"tesseract not found" → brew install tesseract / sudo apt install tesseract-ocr
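
Most of the "not found" errors above can be caught before running OCR by checking which executables are on PATH. A minimal sketch, assuming the tool names from the messages above; the `REQUIRED_TOOLS` mapping and `missing_tools` helper are hypothetical.

```python
import shutil

# Executables each local backend relies on, inferred from the error messages above.
# pdftoppm is only needed by the Ollama backend for multi-page PDFs.
REQUIRED_TOOLS = {
    "ollama": ["ollama", "pdftoppm"],
    "tesseract": ["tesseract"],
    "pix2text": ["python3"],
}


def missing_tools(backend: str) -> list[str]:
    """Return the executables for this backend that are not on PATH."""
    return [tool for tool in REQUIRED_TOOLS.get(backend, []) if shutil.which(tool) is None]


for name in REQUIRED_TOOLS:
    print(name, "missing:", missing_tools(name))
```

Backends not listed in the mapping, such as the cloud MinerU backends, report nothing missing because they need no local executables.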


License

MIT