pi-image-subagent
Pi extension that gives non-vision models the ability to analyze images via a vision-capable subagent
Package details
Install pi-image-subagent from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-image-subagent

- Package: pi-image-subagent
- Version: 1.0.0
- Published: Apr 19, 2026
- Downloads: 138/mo · 6/wk
- Author: alpino13
- License: MIT
- Types: extension
- Size: 25.3 KB
- Dependencies: 0 dependencies · 2 peers
Pi manifest JSON
{
"extensions": [
"./analyze-image"
]
}

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-image-subagent
A Pi extension that gives non-vision models the ability to analyze images by delegating to a vision-capable subagent.
If you're running a model that can't see images — a code model, a small local model, anything without image input support — this extension adds an analyze_image tool that your agent can call to hand off image analysis to a model that can see.
The Problem
Most coding agents run on text-only models. When you paste a screenshot, a mockup, or any image into the Pi TUI, the model can't do anything with it — it just sees a file path. This extension bridges that gap.
How It Works
You paste an image or reference a file
│
▼
┌─────────────────────────────────────┐
│ Your agent (non-vision model) │
│ │
│ "What's in this screenshot?" │
│ calls: analyze_image({ │
│ images: ["/tmp/ss.png"], │
│ question: "What's shown?" │
│ }) │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Vision subagent (gemma4:31b-cloud) │
│ │
│ Has only the `read` tool. │
│ 1. Reads each image file │
│ 2. Sees the image content │
│ 3. Answers the question │
│ 4. Returns plain text description │
└──────────────┬──────────────────────┘
│
▼
Your agent gets back a text description
and continues as if it saw the image
- The subagent runs in its own isolated `pi` process — no shared context, no session pollution.
- It only has the `read` tool — it can't modify files, run commands, or do anything beyond looking at the images you specified.
- Each call is completely stateless.
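The delegation flow above can be sketched in TypeScript. This is an illustrative shape only — the names here (`AnalyzeImageParams`, `buildSubagentPrompt`) are hypothetical, not Pi's real extension API; see analyze-image/index.ts for the actual implementation:

```typescript
// Hypothetical sketch of the analyze_image handoff. The prompt wording and
// helper names are illustrative assumptions, not the extension's real code.
interface AnalyzeImageParams {
  images: string[];   // local file paths to the images
  question: string;   // what to ask about them
  model?: string;     // optional per-call override of the vision model
}

const DEFAULT_MODEL = "gemma4:31b-cloud";
const MAX_IMAGES = 10;

function buildSubagentPrompt(params: AnalyzeImageParams): { model: string; prompt: string } {
  if (params.images.length === 0 || params.images.length > MAX_IMAGES) {
    throw new Error(`expected 1-${MAX_IMAGES} images, got ${params.images.length}`);
  }
  const fileList = params.images.map((p) => `- ${p}`).join("\n");
  return {
    // fall back to the configured default when no override is given
    model: params.model ?? DEFAULT_MODEL,
    prompt:
      `Read each of these image files with the read tool, then answer the question.\n` +
      `${fileList}\n\nQuestion: ${params.question}`,
  };
}
```

The stateless design follows directly from this: each call builds a fresh prompt and spawns a fresh subagent, so nothing carries over between calls.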
Installation
Symlink (recommended)
This way edits to the source file are picked up immediately (after /reload):
mkdir -p ~/.pi/agent/extensions/analyze-image
ln -sf "$(pwd)/analyze-image/index.ts" ~/.pi/agent/extensions/analyze-image/index.ts
Copy
mkdir -p ~/.pi/agent/extensions/analyze-image
cp analyze-image/index.ts ~/.pi/agent/extensions/analyze-image/index.ts
Verify
Restart Pi (or run /reload). You should see analyze-image listed under Extensions in the startup output. The analyze_image tool will be available to the model.
Configuration
The extension works out of the box with defaults. To customize, create a config file:
~/.pi/agent/extensions/analyze-image/config.json
A starter config is provided at analyze-image/config.example.json — copy it as a starting point:
cp analyze-image/config.example.json ~/.pi/agent/extensions/analyze-image/config.json
Options
| Field | Type | Default | Description |
|---|---|---|---|
| `defaultModel` | `string` | `"gemma4:31b-cloud"` | The vision-capable model the subagent uses |
| `systemPrompt` | `string` | (see source) | System prompt sent to the subagent |
| `maxImagesPerCall` | `number` | `10` | Maximum number of images per single call |
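Putting all three options together, a full config.json might look like this (the `systemPrompt` text here is illustrative, not the shipped default):

```json
{
  "defaultModel": "gemma4:31b-cloud",
  "systemPrompt": "Read every image file with the read tool, then answer the question and nothing else.",
  "maxImagesPerCall": 5
}
```

Any field you omit keeps its default, so a config that only overrides `defaultModel` is equally valid.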
Changing the vision model
The default is gemma4:31b-cloud. Change it to any model your Pi installation can access that supports image input:
{
"defaultModel": "anthropic/claude-sonnet-4"
}
{
"defaultModel": "google/gemini-2.0-flash"
}
Run pi --list-models to see what's available. The model must support image input (input: ["text", "image"]).
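A model qualifies if its input capability list includes `"image"`. As a rough illustration — the `ModelInfo` shape below is an assumption based on the `input: ["text", "image"]` notation above, not Pi's actual model metadata format:

```typescript
// Hypothetical model metadata shape, assumed from the input: ["text", "image"]
// notation; check pi --list-models output for the real format.
interface ModelInfo {
  id: string;
  input: string[]; // e.g. ["text"] or ["text", "image"]
}

// Return the ids of models that can accept image input.
function visionCapable(models: ModelInfo[]): string[] {
  return models.filter((m) => m.input.includes("image")).map((m) => m.id);
}
```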
Tuning the system prompt
The default system prompt is strict — it tells the subagent to read every image, answer the question, and do nothing else. If you want the subagent to take a different approach (e.g., be more creative, focus on specific aspects, output in a particular format), override it:
{
"defaultModel": "gemma4:31b-cloud",
"systemPrompt": "You describe images for a visually impaired user. Be thorough and empathetic. Read every image file before describing it."
}
The config is reloaded at the start of each Pi session, so you can edit it while Pi is running and it takes effect on the next /new or restart.
The Tool
analyze_image
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `images` | `string[]` | yes | One or more local file paths to images |
| `question` | `string` | yes | What you want to know about the image(s) |
| `model` | `string` | no | Override the configured default model for this call |
Supported formats: PNG, JPG, JPEG, GIF, WebP, BMP
Returns: Plain text — the vision model's answer to your question.
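The format check amounts to a simple extension allowlist. A sketch, assuming a case-insensitive match on the file extension (this is illustrative, not the extension's actual code):

```typescript
// Illustrative allowlist check for the supported image formats listed above.
const SUPPORTED_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"]);

function isSupportedImage(path: string): boolean {
  const dot = path.lastIndexOf(".");
  // no extension at all -> not a supported image
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.has(path.slice(dot).toLowerCase());
}
```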
Usage
You don't call analyze_image yourself. The LLM calls it when it needs to understand an image. Your job is to give the agent a reason to look at an image.
Pasting from clipboard
- Copy an image to your clipboard (screenshot, browser, etc.)
- In the Pi TUI, press Ctrl+V — Pi saves the image to a temp file and inserts the path
- Ask your question:
I just pasted a screenshot. What UI components are visible?
The agent will call analyze_image with the temp file path and your question.
Referencing a file directly
What text is shown in /home/me/Desktop/error-message.png?
Is there anything unusual about the layout in ./mockup-v3.webp compared to a typical settings page?
Multiple images at once
The agent can batch multiple images in a single call — all images go to the same subagent, so the vision model can compare and cross-reference them:
Compare the two screenshots in /tmp/before.png and /tmp/after.png — what changed?
The LLM calls:
{
"images": ["/tmp/before.png", "/tmp/after.png"],
"question": "What changed between these two screenshots?"
}
Overriding the model for one call
You can tell the agent to use a different model for a specific analysis:
Use analyze_image with the model anthropic/claude-sonnet-4 to read the chart in /tmp/revenue-q4.png and tell me the top 3 quarters.
What the agent sees in the TUI
During the call:
analyze_image 2 images
What changed between these two screenshots?
📄 before.png
📄 after.png
When the result comes back:
✓ analyze_image (gemma4:31b-cloud) 3.2s
The two screenshots show a settings page. In the "after" version,
the navigation sidebar has been reorganized — the "Account" section
was moved above "Privacy" and a new "Notifications" entry was added...
Press Ctrl+O to expand the full output if it's long.
Requirements
- Pi coding agent installed and working
- At least one vision-capable model configured in Pi (the default is `gemma4:31b-cloud` — change this if you don't have it)
- The vision model's API key must be set (e.g., `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, etc.)
Limitations
- Local files only — the tool accepts file paths on your machine, not URLs. If you need to analyze a remote image, download it first.
- No resize — images are passed through at original resolution. Very large images may hit context limits in the vision model.
- Two-turn subagent — the subagent first calls `read` on each image, then answers the question. This means it uses two LLM turns per call.
- The subagent must call `read` — the vision model needs to be smart enough to follow the system prompt's instruction to read the image files before answering. This works well with most capable vision models, but a very weak model might skip the read step and hallucinate.
- `--tools read` — the subagent only has the `read` tool. It cannot run bash commands, edit files, or do anything else. This is a security feature, not a bug.
Troubleshooting
| Problem | Solution |
|---|---|
| Extension doesn't show in startup | Check the file is at `~/.pi/agent/extensions/analyze-image/index.ts` and run `/reload` |
| `analyze_image` tool not available | Pi only shows tools for models that support tools. Make sure your model supports tool use |
| "Image file not found" | Use absolute paths, or paths relative to where you launched `pi` |
| "Subagent failed with no output" | The vision model likely isn't configured. Check your API key and run `pi --list-models` |
| Subagent doesn't read the images | The vision model may be too weak to follow the system prompt. Try a more capable model |
| Analysis takes a long time | Large images + slow model = wait. You can `Ctrl+C` to abort mid-analysis |
Project Structure
analyze-image/
├── index.ts # Extension source code
├── config.example.json # Starter config — copy to ~/.pi/agent/extensions/analyze-image/
└── README.md # This file
License
MIT