pi-ocr
Pi skill for extracting text from images using Tesseract OCR. Handles screenshots, photos, and scanned documents with automatic preprocessing and multi-language support.
Package details
Install pi-ocr from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-ocr
- Package: pi-ocr
- Version: 0.1.0
- Published: May 18, 2026
- Downloads: not available
- Author: astronaut_jack
- License: MIT
- Types: skill
- Size: 9.9 KB
- Dependencies: 0 dependencies · 1 peer
Pi manifest JSON
{
  "skills": [
    "./skills"
  ]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-ocr
A lightweight, stable pi skill that teaches pi to extract text from images using Tesseract OCR directly via Bash. Handles screenshots, photos, and scanned documents with support for multi-language recognition and built-in image preprocessing.
What this package provides
Skill
The ocr skill gives pi a structured workflow for OCR tasks:
- Prerequisites check: verifies that tesseract is installed before attempting OCR
- Language management: checks, lists, and installs language packs (e.g., `chi_sim` for Chinese)
- Page segmentation modes: guides pi to pick the right `--psm` for slides, paragraphs, sparse text, etc.
- Image preprocessing: teaches pi to use ImageMagick for grayscale conversion, contrast enhancement, and upscaling when image quality is poor
- Batch OCR: handles multiple images or directories
- Output conventions: presents results cleanly in code blocks with source labeling
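As a rough illustration of the segmentation and preprocessing steps, the sketch below maps a content description to a `--psm` value and runs one possible ImageMagick cleanup pass. The mode numbers are Tesseract's documented values, but the mapping itself, the function names `pick_psm` and `preprocess`, and the specific ImageMagick options are assumptions made for this example, not taken from the skill's source.

```bash
# Hypothetical helper: map a rough content description to a Tesseract --psm value.
pick_psm() {
  case "$1" in
    paragraph|block) echo 6 ;;   # 6 = single uniform block of text
    line)            echo 7 ;;   # 7 = single text line
    slide|sparse)    echo 11 ;;  # 11 = sparse text (treating slides as sparse is an assumption)
    *)               echo 3 ;;   # 3 = fully automatic segmentation (Tesseract's default)
  esac
}

# Hypothetical helper: grayscale, stretch contrast, and upscale 2x.
# $1 = input image, $2 = output image.
# ImageMagick 7 provides `magick`; ImageMagick 6 provides `convert`.
preprocess() {
  if command -v magick >/dev/null 2>&1; then
    magick "$1" -colorspace Gray -normalize -resize 200% "$2"
  else
    convert "$1" -colorspace Gray -normalize -resize 200% "$2"
  fi
}
```

A call such as `preprocess scan.jpg /tmp/scan_clean.png` followed by `tesseract /tmp/scan_clean.png /tmp/ocr_output --psm "$(pick_psm paragraph)"` would chain the two helpers.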
Requirements
- pi installed and working
- Tesseract OCR (`brew install tesseract` on macOS)
- ImageMagick (optional, for preprocessing: `brew install imagemagick`)
- tesseract-lang (optional, for non-English languages: `brew install tesseract-lang`)
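To confirm the requirements above before installing the skill, a small check along these lines may help. It is a sketch only: the helper names `have`, `check_requirements`, and `has_lang` are invented for this example, and it assumes `tesseract --list-langs` prints one language code per line after a header line.

```bash
# Succeed when command $1 is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Hypothetical helper: report which of the required and optional tools are present.
check_requirements() {
  for tool in tesseract magick convert; do
    if have "$tool"; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

# Hypothetical helper: succeed when language code $1 appears as a whole line
# in the `tesseract --list-langs` output supplied on stdin.
has_lang() { grep -qxF -- "$1"; }
```

For example, `tesseract --list-langs 2>&1 | has_lang chi_sim || echo "chi_sim not installed"` would flag a missing Chinese language pack. Only one of `magick` or `convert` needs to be found, depending on the ImageMagick version.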
Installation
From npm
pi install npm:pi-ocr
From GitHub
pi install git:github.com/astronautJack/pi-ocr
Local development
pi install /path/to/pi-ocr
Usage
Once installed, the ocr skill activates automatically when the user:
- Shares an image file (.png, .jpg, .tiff, .webp, etc.)
- Asks "what does this say" or "can you read this" about an image
- Requests OCR, text extraction, or reading text from a screenshot/photo
Example prompts
> Read the text from this screenshot: /Users/me/Desktop/error.png
> OCR this photo of a document: ~/Downloads/scan.jpg
> What does this Chinese slide say? /tmp/lecture-slide.png
Manual skill invocation
/ocr
Supported image formats
All formats supported by Tesseract and ImageMagick, including:
- PNG
- JPG / JPEG
- TIFF
- WebP
- BMP
- GIF
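For batch runs over a directory, the format list above could be turned into a simple file filter. This is an illustrative sketch: `is_image` and `list_images` are hypothetical names, the extension list is copied from this README, and whether a given format actually decodes depends on how the local Tesseract and ImageMagick builds were compiled.

```bash
# Hypothetical helper: succeed when the file name ends in a listed image extension.
is_image() {
  case "$1" in
    *.*) ;;            # has an extension, keep checking
    *) return 1 ;;     # no dot in the name, so not an image by this test
  esac
  # Compare the lowercased extension against the README's format list.
  case "$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')" in
    png|jpg|jpeg|tif|tiff|webp|bmp|gif) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print the image files directly inside directory $1, one per line.
list_images() {
  for f in "$1"/*; do
    [ -f "$f" ] && is_image "$f" && printf '%s\n' "$f"
  done
  return 0
}
```

The output of `list_images ~/Downloads` can then be fed to a loop that runs Tesseract on each file.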
How it works
The skill instructs pi to:
- Check prerequisites: `which tesseract && tesseract --version`
- Determine language needs: check for required language packs
- Run OCR: `tesseract image.png /tmp/ocr_output -l eng --psm 6`
- Read results: `cat /tmp/ocr_output.txt`
- Fall back gracefully: try different PSM modes or preprocessing if output is empty
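The run, read, and fall-back steps above can be sketched as a single shell function. This is an assumption about how such a loop might look, not the skill's actual instructions: the name `ocr_with_fallback` and the order of modes tried (6, then 3, then 11) are chosen for illustration, and the output path follows the example command above.

```bash
# Hypothetical sketch: try several page segmentation modes until one yields text.
# $1 = image path, $2 = language spec (defaults to eng; e.g. chi_sim+eng).
ocr_with_fallback() {
  for psm in 6 3 11; do
    # Tesseract appends .txt to the output base name it is given.
    tesseract "$1" /tmp/ocr_output -l "${2:-eng}" --psm "$psm" >/dev/null 2>&1 || continue
    # Accept the first mode whose output contains a non-whitespace character
    # (a file holding only newlines or a form feed counts as empty).
    if grep -q '[^[:space:]]' /tmp/ocr_output.txt 2>/dev/null; then
      cat /tmp/ocr_output.txt
      return 0
    fi
  done
  echo "no text recognised in $1" >&2
  return 1
}
```

Calling `ocr_with_fallback /tmp/lecture-slide.png chi_sim+eng` prints the recognised text, or returns a non-zero status when every mode comes back empty, at which point preprocessing the image would be the next thing to try.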
Skill triggers
The skill automatically activates when:
| Trigger | Example |
|---|---|
| Image file shared | "Read this screenshot" |
| "what does this say" | "What does this image say?" |
| "OCR" mentioned | "Can you OCR this photo?" |
| "extract text from" | "Extract text from this scan" |
| "read this" + image | "Can you read this?" |
Why pi-ocr
- Lightweight: no heavy dependencies, just the Tesseract binary
- Stable: calls Tesseract directly via Bash, no abstraction layers to break
- Preprocessing built-in: ImageMagick pipelines for grayscale, contrast, and upscaling when image quality is poor
- Multi-language: mixed `chi_sim+eng` recognition once the matching language packs are installed
License
MIT — see LICENSE.
Contributing
Issues and PRs welcome at github.com/astronautJack/pi-ocr.