pi-ocr

Pi skill for extracting text from images using Tesseract OCR. Handles screenshots, photos, and scanned documents with automatic preprocessing and multi-language support.

Install pi-ocr from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-ocr

Package: pi-ocr
Version: 0.1.0
Published: May 18, 2026
Downloads: not available
Author: astronaut_jack
License: MIT
Types: skill
Size: 9.9 KB
Dependencies: 0 dependencies · 1 peer

Pi manifest JSON
{
  "skills": [
    "./skills"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-ocr

A lightweight, stable pi skill that teaches pi to extract text from images using Tesseract OCR directly via Bash. Handles screenshots, photos, and scanned documents with support for multi-language recognition and built-in image preprocessing.

What this package provides

Skill

The ocr skill gives pi a structured workflow for OCR tasks:

  • Prerequisites check — verifies tesseract is installed before attempting OCR
  • Language management — checks, lists, and installs language packs (e.g., chi_sim for Chinese)
  • Page segmentation modes — guides pi to pick the right --psm for slides, paragraphs, sparse text, etc.
  • Image preprocessing — teaches pi to use ImageMagick for grayscale conversion, contrast enhancement, and upscaling when image quality is poor
  • Batch OCR — handles multiple images or directories
  • Output conventions — presents results cleanly in code blocks with source labeling
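
As an illustration of the page segmentation step, here is a minimal sketch of how a content hint could be mapped to a --psm value. The function name psm_for and its category names are hypothetical and not taken from the skill; the numeric modes and their meanings are Tesseract's own.

```shell
# Map a rough description of the image content to a Tesseract --psm value.
# The category names are hypothetical; the numbers are Tesseract's documented modes.
psm_for() {
  case "$1" in
    paragraph|block) echo 6 ;;   # single uniform block of text
    column)          echo 4 ;;   # single column of variable-size text
    line)            echo 7 ;;   # single text line
    word)            echo 8 ;;   # single word
    sparse)          echo 11 ;;  # sparse text, in no particular order
    *)               echo 3 ;;   # fully automatic segmentation (Tesseract's default)
  esac
}

# Usage (requires tesseract on PATH):
#   tesseract slide.png stdout -l eng --psm "$(psm_for sparse)"
```

Falling through to mode 3 for unknown hints keeps the helper safe to call with arbitrary input, since that is the mode Tesseract would use anyway.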

Requirements

  • pi installed and working
  • Tesseract OCR (brew install tesseract on macOS)
  • ImageMagick (optional, for preprocessing: brew install imagemagick)
  • tesseract-lang (optional, for non-English languages: brew install tesseract-lang)
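
The requirements above can be verified before running OCR. The following is a sketch only, not the skill's actual check: the function names have_cmd, lang_listed, and check_prereqs are hypothetical, and it handles a single language pack name (not a combined chi_sim+eng string).

```shell
# Succeed if a command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Read `tesseract --list-langs` output on stdin; succeed if the pack is listed.
lang_listed() {
  grep -qxF -- "$1"
}

# Check the required and optional tools for one language pack (default: eng).
check_prereqs() {
  want_lang="${1:-eng}"
  if ! have_cmd tesseract; then
    echo "tesseract not found (macOS: brew install tesseract)" >&2
    return 1
  fi
  # ImageMagick 7 ships `magick`; ImageMagick 6 ships `convert`.
  if ! have_cmd magick && ! have_cmd convert; then
    echo "note: ImageMagick not found, preprocessing unavailable" >&2
  fi
  # Some Tesseract versions print the list on stderr, so merge both streams.
  if ! tesseract --list-langs 2>&1 | lang_listed "$want_lang"; then
    echo "language pack '$want_lang' missing (macOS: brew install tesseract-lang)" >&2
    return 2
  fi
  return 0
}
```

Calling check_prereqs chi_sim before an OCR run reports a missing pack up front instead of failing partway through.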

Installation

From npm

pi install npm:pi-ocr

From GitHub

pi install git:github.com/astronautJack/pi-ocr

Local development

pi install /path/to/pi-ocr

Usage

Once installed, the ocr skill activates automatically when the user:

  • Shares an image file (.png, .jpg, .tiff, .webp, etc.)
  • Asks "what does this say" or "can you read this" about an image
  • Requests OCR, text extraction, or reading text from a screenshot/photo

Example prompts

> Read the text from this screenshot: /Users/me/Desktop/error.png

> OCR this photo of a document: ~/Downloads/scan.jpg

> What does this Chinese slide say? /tmp/lecture-slide.png

Manual skill invocation

/ocr

Supported image formats

All formats supported by Tesseract and ImageMagick, including:

  • PNG
  • JPG / JPEG
  • TIFF
  • WebP
  • BMP
  • GIF
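
For batch work, the formats listed above can be used to select files from a directory. This is a rough sketch under stated assumptions: list_images and batch_ocr are hypothetical helper names, the search is non-recursive, and whether a given format actually loads depends on how the local Tesseract was built.

```shell
# Print image files in a directory (non-recursive), matching the listed
# extensions case-insensitively.
list_images() {
  find "$1" -maxdepth 1 -type f \( \
    -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' -o \
    -iname '*.tif' -o -iname '*.tiff' -o -iname '*.webp' -o \
    -iname '*.bmp' -o -iname '*.gif' \) | sort
}

# OCR every image in a directory, printing each result under a source label.
# Assumes file names contain no newlines.
batch_ocr() {
  list_images "$1" | while IFS= read -r img; do
    printf '== %s ==\n' "$img"
    tesseract "$img" stdout -l "${2:-eng}" 2>/dev/null </dev/null
  done
}
```

For example, batch_ocr ~/Downloads/scans eng would print the recognized text of each image in that folder, one labeled section per file.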

How it works

The skill instructs pi to:

  1. Check prerequisites: which tesseract && tesseract --version
  2. Determine language needs: check for required language packs
  3. Run OCR: tesseract image.png /tmp/ocr_output -l eng --psm 6
  4. Read results: cat /tmp/ocr_output.txt
  5. Fall back gracefully: try different PSM modes or preprocessing if output is empty
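
The run, read, and fall-back steps can be sketched as a single shell function. This is an illustration rather than the skill's actual logic: the name ocr_with_fallback and the retry order (PSM 6, then 3, then 11) are assumptions, and it writes to a temporary directory instead of /tmp/ocr_output.

```shell
# Run OCR on one image, retrying with other page segmentation modes
# when the output is empty. The PSM order is an assumption.
ocr_with_fallback() {
  img="$1"
  lang="${2:-eng}"
  tmpdir="$(mktemp -d)" || return 1
  base="$tmpdir/ocr_output"            # tesseract appends .txt to this base
  for psm in 6 3 11; do
    tesseract "$img" "$base" -l "$lang" --psm "$psm" >/dev/null 2>&1 || continue
    # Accept the first result that contains a non-whitespace character.
    if [ -f "$base.txt" ] && grep -q '[^[:space:]]' "$base.txt"; then
      cat "$base.txt"
      rm -rf "$tmpdir"
      return 0
    fi
  done
  rm -rf "$tmpdir"
  echo "no text recognized in $img" >&2
  return 1
}

# Usage (requires tesseract on PATH):
#   ocr_with_fallback /Users/me/Desktop/error.png eng
```

If every mode yields empty output, the function returns non-zero, which is the point at which preprocessing the image would be the next thing to try.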

Skill triggers

The skill automatically activates when:

Trigger               | Example
Image file shared     | "Read this screenshot"
"what does this say"  | "What does this image say?"
"OCR" mentioned       | "Can you OCR this photo?"
"extract text from"   | "Extract text from this scan"
"read this" + image   | "Can you read this?"

Why pi-ocr

  • Lightweight — no heavy dependencies, just the Tesseract binary you already have
  • Stable — calls Tesseract directly via Bash, no abstraction layers to break
  • Preprocessing built-in — ImageMagick pipelines for grayscale, contrast, and upscaling when image quality is poor
  • Multi-language — chi_sim+eng mixed recognition once the required language packs are installed
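
To make the preprocessing point concrete, here is a minimal sketch of an ImageMagick step followed by mixed-language recognition. The function name preprocess and the IM_BIN override are hypothetical, and the particular options (grayscale, 2x upscale, normalize) are one plausible combination, not necessarily the pipeline the skill uses.

```shell
# Grayscale, upscale 2x, and stretch contrast before OCR.
# IM_BIN is a hypothetical override: ImageMagick 7 provides `magick`,
# ImageMagick 6 provides `convert`.
preprocess() {
  "${IM_BIN:-magick}" "$1" -colorspace Gray -resize 200% -normalize "$2"
}

# Usage (requires ImageMagick, Tesseract, and the chi_sim language pack):
#   preprocess photo.jpg /tmp/photo_clean.png
#   tesseract /tmp/photo_clean.png stdout -l chi_sim+eng --psm 6
```

Upscaling tends to matter most for small screenshot text, while normalization mainly helps low-contrast photos, so either option can be dropped if it makes the output worse.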

License

MIT — see LICENSE.

Contributing

Issues and PRs welcome at github.com/astronautJack/pi-ocr.