pi-glm-ocr

Pi extension for local OCR via Ollama GLM-OCR (0.9B): converts images and PDFs to Markdown and LaTeX, with accurate recognition of math formulas. Bridges the multimodal gap for non-vision LLMs such as DeepSeek.

Install pi-glm-ocr from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-glm-ocr

Package: pi-glm-ocr
Version: 0.1.1
Published: May 24, 2026
Downloads: not available
Author: astronaut_jack
License: MIT
Types: extension
Size: 20.3 KB
Dependencies: 0 dependencies · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-glm-ocr

Local OCR for Pi Coding Agent — extract text, LaTeX math formulas, and tables from images and PDFs using GLM-OCR (0.9B) via Ollama.

Bridges the multimodal gap for non-vision LLMs such as DeepSeek. When your model can't see images or PDFs, pi-glm-ocr acts as its eyes, recognizing formulas accurately and returning them as LaTeX.

Features

  • 🔍 Text Recognition — Extracts text as Markdown from images and PDFs
  • 🧮 Formula Recognition — Math formulas output in LaTeX with high accuracy
  • 📊 Table Recognition — Tables extracted as Markdown tables
  • 🖼️ Figure Description — Describes figures and diagrams
  • 📄 PDF Support — Converts PDF pages to images automatically (macOS/Linux)
  • 📦 Fully local — No API keys, no cloud, no data leaves your machine

Prerequisites

  1. Install Ollama (if not already installed):

    # macOS
    brew install ollama
    # or download from https://ollama.com/download
    
    # Linux
    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Pull the GLM-OCR model:

    ollama pull glm-ocr
    

    Model size: ~2.2 GB (bf16) or ~1.6 GB (q8_0 variant)

  3. For PDF support on Linux:

    sudo apt install poppler-utils
    

    macOS uses built-in CoreGraphics — no extra dependencies needed.
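
On Linux, poppler-utils includes the pdftoppm command, which renders PDF pages to image files. The README does not say which poppler tool the extension calls, so the following is only a sketch under the assumption that it shells out to pdftoppm. The function name `pdftoppmArgs` and the 200 DPI default are hypothetical; only the `-png`, `-r`, `-f` and `-l` flags come from pdftoppm itself.

```typescript
// Hypothetical helper: build the argument list for `pdftoppm`.
// Assumption: the extension renders each PDF page to PNG before OCR.
function pdftoppmArgs(
  pdfPath: string,
  outPrefix: string,
  dpi = 200, // illustrative default, not taken from the extension
  firstPage?: number,
  lastPage?: number,
): string[] {
  if (!Number.isInteger(dpi) || dpi <= 0) {
    throw new Error("dpi must be a positive integer");
  }
  // -png selects PNG output, -r sets the resolution in DPI.
  const args = ["-png", "-r", String(dpi)];
  // -f and -l restrict rendering to a page range.
  if (firstPage !== undefined) args.push("-f", String(firstPage));
  if (lastPage !== undefined) args.push("-l", String(lastPage));
  // pdftoppm expects the input file followed by the output file prefix.
  args.push(pdfPath, outPrefix);
  return args;
}
```

In Node.js the resulting array could be passed to `execFile("pdftoppm", args)` from `node:child_process`, which avoids shell quoting problems with unusual file names.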

Install

pi install npm:pi-glm-ocr

Or try it without installing:

pi -e npm:pi-glm-ocr

Usage

As a tool (LLM-invoked)

The extension registers a glm_ocr tool that the agent can call automatically. Just ask pi:

> What's the formula in this screenshot?
(attach image or mention path)
# The model calls glm_ocr with task="formula" and reads the LaTeX

> Extract all text from paper.pdf
# The model calls glm_ocr with task="auto" and gets back Markdown + LaTeX

As a command (user-invoked)

/glm-ocr ./screenshot.png formula
/glm-ocr ./screenshot.png auto glm-ocr:q8_0
/glm-ocr ./document.pdf auto
/glm-ocr ./table.png table
/glm-ocr ./diagram.png figure
/glm-ocr ./page.jpg text
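
The command forms above follow the pattern `<path> [task] [model]`. A minimal sketch of how such arguments might be parsed is shown here. The names `parseOcrArgs`, `OcrArgs` and `isTask` are hypothetical, and the fallback to `auto` when no task is given is an assumption, since every example above passes a task explicitly.

```typescript
// Hypothetical sketch: parse the arguments of "/glm-ocr <path> [task] [model]".
// The names below are illustrative and are not taken from the extension source.
type Task = "text" | "formula" | "table" | "figure" | "auto";

const TASKS: readonly Task[] = ["text", "formula", "table", "figure", "auto"];

interface OcrArgs {
  path: string;
  task: Task;
  model: string | undefined; // undefined: fall back to the configured default
}

function isTask(value: string): value is Task {
  return (TASKS as readonly string[]).indexOf(value) !== -1;
}

function parseOcrArgs(input: string): OcrArgs {
  // Split on whitespace; paths containing spaces are not handled in this sketch.
  const parts = input.trim().split(/\s+/).filter((part) => part.length > 0);
  const path = parts[0];
  if (path === undefined) {
    throw new Error("usage: /glm-ocr <path> [task] [model]");
  }
  // Assumption: a missing task defaults to "auto".
  const task = parts[1] ?? "auto";
  if (!isTask(task)) {
    throw new Error(`unknown task "${task}" (expected one of: ${TASKS.join(", ")})`);
  }
  return { path, task, model: parts[2] };
}
```

Rejecting unknown task names early gives the user a clear error instead of sending an unsupported prompt to the model.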

Tasks

Task      Prompt                Output
text      Text Recognition      Markdown
formula   Formula Recognition   LaTeX
table     Table Recognition     Markdown tables
figure    Figure Recognition    Description
auto      Full document OCR     Markdown + LaTeX (mixed)
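
The task table can be pictured as a simple lookup from task name to prompt and expected output. The sketch below copies the strings from the table. Whether the extension sends exactly these strings to the model is an assumption, and `TASK_SPECS` and `promptFor` are hypothetical names.

```typescript
// Hypothetical lookup table mirroring the Tasks table.
// Assumption: the "Prompt" column is what gets sent to the model; the exact
// wire format used by the extension is not documented here.
type Task = "text" | "formula" | "table" | "figure" | "auto";

interface TaskSpec {
  prompt: string; // text sent along with the image
  output: string; // format the caller should expect back
}

const TASK_SPECS: Record<Task, TaskSpec> = {
  text: { prompt: "Text Recognition", output: "Markdown" },
  formula: { prompt: "Formula Recognition", output: "LaTeX" },
  table: { prompt: "Table Recognition", output: "Markdown tables" },
  figure: { prompt: "Figure Recognition", output: "Description" },
  auto: { prompt: "Full document OCR", output: "Markdown + LaTeX (mixed)" },
};

function promptFor(task: Task): string {
  return TASK_SPECS[task].prompt;
}
```

Typing the table as `Record<Task, TaskSpec>` makes the compiler flag any task that is added to the union without a matching prompt.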

Configuration

Model Selection

You can override the model on a per-call basis:

# Via command - 3rd argument is the model
/glm-ocr ./image.png formula glm-ocr:q8_0
/glm-ocr ./paper.pdf auto llama3.2-vision

# Via LLM - the agent can specify the model parameter
# Example: "use glm_ocr with model='minicpm-v' to read this image"

Environment Variables

export OLLAMA_HOST="http://localhost:11434"  # default
export GLM_OCR_MODEL="glm-ocr"                # default model

Or in ~/.pi/agent/settings.json:

{
  "glmOcr": {
    "ollamaHost": "http://localhost:11434",
    "model": "glm-ocr"
  }
}
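
With three places to set the model (per-call argument, environment variable, settings file), some precedence rule is needed. The README does not state one, so the order used below (per-call, then environment, then settings file, then built-in default) is an assumption, and `resolveConfig` is a hypothetical name. Only the keys `OLLAMA_HOST`, `GLM_OCR_MODEL`, `glmOcr.ollamaHost` and `glmOcr.model` and the default values come from the configuration shown above.

```typescript
// Hypothetical sketch of configuration resolution.
// Assumed precedence: per-call model > environment > settings.json > default.
interface OcrSettings {
  ollamaHost?: string;
  model?: string;
}

interface ResolvedConfig {
  ollamaHost: string;
  model: string;
}

const DEFAULTS: ResolvedConfig = {
  ollamaHost: "http://localhost:11434",
  model: "glm-ocr",
};

function resolveConfig(
  env: Record<string, string | undefined>,
  settings: { glmOcr?: OcrSettings },
  perCallModel?: string,
): ResolvedConfig {
  const file: OcrSettings = settings.glmOcr ?? {};
  // `||` also skips empty strings, so an empty variable falls through.
  return {
    ollamaHost: env["OLLAMA_HOST"] || file.ollamaHost || DEFAULTS.ollamaHost,
    model: perCallModel || env["GLM_OCR_MODEL"] || file.model || DEFAULTS.model,
  };
}
```

In Node.js this could be called as `resolveConfig(process.env, parsedSettings, modelArg)`, where `parsedSettings` is the parsed content of `~/.pi/agent/settings.json`.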

How It Works

┌───────────────┐     ┌─────────────┐     ┌─────────────────┐
│ pi (DeepSeek) │────▶│   glm_ocr   │────▶│  Ollama Server  │
│  (no vision)  │     │ (extension) │     │ (GLM-OCR 0.9B)  │
└───────────────┘     └─────────────┘     └─────────────────┘
        │                    │                     │
        │  "read this pic"   │ POST /api/generate  │
        │───────────────────▶│────────────────────▶│
        │                    │ base64 image + task │
        │                    │◀────────────────────│
        │   LaTeX formula    │  OCR text response  │
        │◀───────────────────│                     │
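
The middle hop of the diagram, a POST to Ollama's `/api/generate` endpoint carrying a base64 image and a task prompt, can be sketched as follows. The request fields (`model`, `prompt`, `images`, `stream`) and the `response` field of the reply belong to Ollama's generate API. The helper names `buildGenerateRequest` and `extractText` are hypothetical, and the way the extension actually builds its request is not shown in this README.

```typescript
// Hypothetical sketch of the extension-to-Ollama hop.
interface GenerateRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildGenerateRequest(
  host: string,
  model: string,
  prompt: string,
  imageBase64: string, // image bytes, already base64-encoded by the caller
): GenerateRequest {
  const base = host.replace(/\/+$/, ""); // tolerate a trailing slash in the host
  return {
    url: `${base}/api/generate`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false asks Ollama for one JSON object instead of a chunk stream.
    body: JSON.stringify({ model, prompt, images: [imageBase64], stream: false }),
  };
}

// Pull the generated text out of a non-streaming reply.
function extractText(responseJson: string): string {
  const parsed: unknown = JSON.parse(responseJson);
  if (typeof parsed === "object" && parsed !== null && "response" in parsed) {
    const value = (parsed as { response: unknown }).response;
    if (typeof value === "string") return value;
  }
  throw new Error("unexpected Ollama reply: missing string field 'response'");
}
```

With Node.js 18 or later, the request could be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` and the reply body passed to `extractText`. Keeping the builder free of I/O makes it easy to test without a running Ollama server.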

License

MIT