pi-ocr

Pi skill for extracting text from images using Tesseract OCR. Handles screenshots, photos, and scanned documents with automatic preprocessing and multi-language support.

Install pi-ocr from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-ocr

Package: pi-ocr
Version: 0.1.0
Published: May 18, 2026
Downloads: not available
Author: astronaut_jack
License: MIT
Types: skill
Size: 9.9 KB
Dependencies: 0 dependencies · 1 peer

Pi manifest JSON

{
  "skills": [
    "./skills"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

A lightweight, stable pi skill that teaches pi to extract text from images using Tesseract OCR directly via Bash. Handles screenshots, photos, and scanned documents with support for multi-language recognition and built-in image preprocessing.

What this package provides

Skill

The ocr skill gives pi a structured workflow for OCR tasks:

Prerequisites check — verifies tesseract is installed before attempting OCR
Language management — checks, lists, and installs language packs (e.g., chi_sim for Chinese)
Page segmentation modes — guides pi to pick the right --psm for slides, paragraphs, sparse text, etc.
Image preprocessing — teaches pi to use ImageMagick for grayscale conversion, contrast enhancement, and upscaling when image quality is poor
Batch OCR — handles multiple images or directories
Output conventions — presents results cleanly in code blocks with source labeling
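As an illustration of the batch and source-labeling items above, here is a minimal sketch rather than the skill's actual instructions. The function name ocr_directory, the label format, and the extension list are assumptions; only the tesseract invocation style comes from this README.

```shell
# Hypothetical helper: OCR every image in a directory and label each result
# with its source file. Extension matching is case-sensitive in this sketch.
ocr_directory() {
  local dir="$1" lang="${2:-eng}"
  local file
  for file in "$dir"/*; do
    case "$file" in
      *.png|*.jpg|*.jpeg|*.tif|*.tiff|*.webp|*.bmp|*.gif) ;;
      *) continue ;;  # skip anything that is not a known image extension
    esac
    [ -f "$file" ] || continue
    # Source label first, then the recognized text
    printf '=== %s ===\n' "$file"
    # "stdout" as the output base makes tesseract print the text directly
    tesseract "$file" stdout -l "$lang" --psm 6 2>/dev/null
  done
}
```

Calling ocr_directory ~/Downloads/scans eng would print one labeled block per image; a second argument such as chi_sim+eng switches the language.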

Requirements

pi installed and working
Tesseract OCR (brew install tesseract on macOS)
ImageMagick (optional, for preprocessing: brew install imagemagick)
tesseract-lang (optional, for non-English languages: brew install tesseract-lang)
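To confirm that an optional language pack is actually available, one possible check is sketched below. The helper name has_tesseract_lang is hypothetical; it relies only on tesseract --list-langs, which prints a header line followed by one language code per line.

```shell
# Hypothetical helper: succeed if the given Tesseract language pack is installed.
has_tesseract_lang() {
  local lang="$1"
  # Older Tesseract releases print the list on stderr, so merge both streams.
  # grep -x matches whole lines only, so the header line never matches.
  tesseract --list-langs 2>&1 | grep -qx -- "$lang"
}
```

For example, has_tesseract_lang chi_sim || echo "install tesseract-lang first" reports a missing Chinese pack before any OCR is attempted.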

Installation

From npm

pi install npm:pi-ocr

From GitHub

pi install git:github.com/astronautJack/pi-ocr

Local development

pi install /path/to/pi-ocr

Usage

Once installed, the ocr skill activates automatically when the user:

Shares an image file (.png, .jpg, .tiff, .webp, etc.)
Asks "what does this say" or "can you read this" about an image
Requests OCR, text extraction, or reading text from a screenshot/photo

Example prompts

> Read the text from this screenshot: /Users/me/Desktop/error.png

> OCR this photo of a document: ~/Downloads/scan.jpg

> What does this Chinese slide say? /tmp/lecture-slide.png

Manual skill invocation

/ocr

Supported image formats

All formats supported by Tesseract and ImageMagick, including:

PNG
JPG / JPEG
TIFF
WebP
BMP
GIF

How it works

The skill instructs pi to:

Check prerequisites — which tesseract && tesseract --version
Determine language needs — check for required language packs
Run OCR — tesseract image.png /tmp/ocr_output -l eng --psm 6
Read results — cat /tmp/ocr_output.txt
Fall back gracefully — try different PSM modes or preprocessing if output is empty
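The steps above could be combined roughly as follows. This is a sketch under stated assumptions, not the skill's own script: the function name ocr_with_fallback, the default language, and the fallback order of PSM 6, then 3, then 11 are all illustrative choices.

```shell
# Hypothetical wrapper: check prerequisites, run OCR, and retry with other
# page segmentation modes when the output is empty.
ocr_with_fallback() {
  local image="$1" lang="${2:-eng}"
  local psm text

  # Check prerequisites
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract is not installed" >&2
    return 127
  fi

  # 6 = single uniform block, 3 = automatic segmentation, 11 = sparse text
  for psm in 6 3 11; do
    # "stdout" as the output base prints the text instead of writing a file
    text="$(tesseract "$image" stdout -l "$lang" --psm "$psm" 2>/dev/null)"
    # Accept the result only if it has at least one non-whitespace character
    if printf '%s' "$text" | grep -q '[^[:space:]]'; then
      printf '%s\n' "$text"
      return 0
    fi
  done

  echo "no text recognized in $image" >&2
  return 1
}
```

A call such as ocr_with_fallback /tmp/lecture-slide.png chi_sim+eng prints the first non-empty result, or returns a non-zero status so the caller can try preprocessing next.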

Skill triggers

The skill automatically activates when:

Trigger	Example
Image file shared	"Read this screenshot"
"what does this say"	"What does this image say?"
"OCR" mentioned	"Can you OCR this photo?"
"extract text from"	"Extract text from this scan"
"read this" + image	"Can you read this?"

Why pi-ocr

Lightweight: no npm dependencies, only the Tesseract binary
Stable: calls Tesseract directly via Bash, with no abstraction layers to break
Preprocessing built in: ImageMagick pipelines for grayscale conversion, contrast enhancement, and upscaling when image quality is poor
Multi-language: mixed chi_sim+eng recognition once the language packs are installed
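The preprocessing and mixed-language points can be illustrated with a short sketch. The function names preprocess_for_ocr and ocr_mixed are hypothetical, and the specific ImageMagick operators (-colorspace Gray, -normalize, -resize 200%) are one reasonable choice for grayscale, contrast, and upscaling, not necessarily the pipeline the skill itself teaches.

```shell
# Hypothetical helper: write a cleaned-up copy of an image for OCR.
preprocess_for_ocr() {
  local src="$1" dst="$2" im
  if command -v magick >/dev/null 2>&1; then
    im=magick    # ImageMagick 7
  elif command -v convert >/dev/null 2>&1; then
    im=convert   # ImageMagick 6
  else
    echo "ImageMagick not found" >&2
    return 1
  fi
  # Convert to grayscale, stretch contrast, and upscale 2x
  "$im" "$src" -colorspace Gray -normalize -resize 200% "$dst"
}

# Hypothetical helper: recognize mixed Simplified Chinese and English text.
ocr_mixed() {
  tesseract "$1" stdout -l chi_sim+eng --psm 6
}
```

Usage might look like preprocess_for_ocr slide.png /tmp/slide_clean.png && ocr_mixed /tmp/slide_clean.png, assuming the chi_sim language pack is installed.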

License

MIT — see LICENSE.

Contributing

Issues and PRs welcome at github.com/astronautJack/pi-ocr.