pi-image-subagent

Pi extension that gives non-vision models the ability to analyze images via a vision-capable subagent

Package details

Install pi-image-subagent from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-image-subagent
Package:      pi-image-subagent
Version:      1.0.0
Published:    Apr 19, 2026
Downloads:    138/mo · 6/wk
Author:       alpino13
License:      MIT
Type:         extension
Size:         25.3 KB
Dependencies: 0 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./analyze-image"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-image-subagent

A Pi extension that gives non-vision models the ability to analyze images by delegating to a vision-capable subagent.

If you're running a model that can't see images — a code model, a small local model, anything without image input support — this extension adds an analyze_image tool that your agent can call to hand off image analysis to a model that can see.

The Problem

Most coding agents run on text-only models. When you paste a screenshot, a mockup, or any image into the Pi TUI, the model can't do anything with it — it just sees a file path. This extension bridges that gap.

How It Works

  You paste an image or reference a file
                  │
                  ▼
  ┌──────────────────────────────────────┐
  │  Your agent (non-vision model)       │
  │                                      │
  │  "What's in this screenshot?"        │
  │  calls: analyze_image({              │
  │    images: ["/tmp/ss.png"],          │
  │    question: "What's shown?"         │
  │  })                                  │
  └───────────────┬──────────────────────┘
                  │
                  ▼
  ┌──────────────────────────────────────┐
  │  Vision subagent (gemma4:31b-cloud)  │
  │                                      │
  │  Has only the `read` tool.           │
  │  1. Reads each image file            │
  │  2. Sees the image content           │
  │  3. Answers the question             │
  │  4. Returns plain text description   │
  └───────────────┬──────────────────────┘
                  │
                  ▼
  Your agent gets back a text description
  and continues as if it saw the image

  • The subagent runs in its own isolated pi process — no shared context, no session pollution.
  • It only has the read tool — it can't modify files, run commands, or do anything beyond looking at the images you specified.
  • Each call is completely stateless.
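
Concretely, each call boils down to launching a fresh pi process with its tool list locked down and collecting its text output. Here is a minimal sketch of that hand-off in TypeScript, assuming a Node runtime; of the flags shown, only --tools read is confirmed by this README, while --model and the positional prompt are assumptions about the pi CLI:

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Hypothetical sketch of the hand-off: spawn an isolated, read-only pi
// subagent and return its stdout as the tool result. `--tools read` is
// the restriction described above; `--model` and passing the prompt as
// a positional argument are assumptions, not documented pi flags.
async function analyzeImages(
  images: string[],
  question: string,
  model = "gemma4:31b-cloud",
): Promise<string> {
  const prompt =
    `${question}\n\nRead each of these image files before answering:\n` +
    images.join("\n");
  const { stdout } = await run("pi", [
    "--model", model,
    "--tools", "read",
    prompt,
  ]);
  return stdout.trim(); // plain text, handed back to the calling agent
}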

Installation

Symlink (recommended)

This way edits to the source file are picked up immediately (after /reload):

mkdir -p ~/.pi/agent/extensions/analyze-image
ln -sf "$(pwd)/analyze-image/index.ts" ~/.pi/agent/extensions/analyze-image/index.ts

Copy

mkdir -p ~/.pi/agent/extensions/analyze-image
cp analyze-image/index.ts ~/.pi/agent/extensions/analyze-image/index.ts

Verify

Restart Pi (or run /reload). You should see analyze-image listed under Extensions in the startup output. The analyze_image tool will be available to the model.

Configuration

The extension works out of the box with defaults. To customize, create a config file:

~/.pi/agent/extensions/analyze-image/config.json

A starter config is provided at analyze-image/config.example.json; copy it and edit from there:

cp analyze-image/config.example.json ~/.pi/agent/extensions/analyze-image/config.json

Options

Field             Type    Default             Description
defaultModel      string  "gemma4:31b-cloud"  The vision-capable model the subagent uses
systemPrompt      string  (see source)        System prompt sent to the subagent
maxImagesPerCall  number  10                  Maximum number of images per single call
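
Putting the three options together, a complete config.json might look like this (the systemPrompt shown is illustrative, not the built-in default):

{
  "defaultModel": "gemma4:31b-cloud",
  "systemPrompt": "Read every image file, answer the question, and output nothing else.",
  "maxImagesPerCall": 4
}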

Changing the vision model

The default is gemma4:31b-cloud. Change it to any model your Pi installation can access that supports image input:

{
  "defaultModel": "anthropic/claude-sonnet-4"
}

or:

{
  "defaultModel": "google/gemini-2.0-flash"
}

Run pi --list-models to see what's available. The model must support image input (input: ["text", "image"]).
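
The exact shape of a model entry depends on your Pi setup, but conceptually a qualifying model is one that declares image input alongside text (shape illustrative):

{
  "id": "google/gemini-2.0-flash",
  "input": ["text", "image"]
}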

Tuning the system prompt

The default system prompt is strict — it tells the subagent to read every image, answer the question, and do nothing else. If you want the subagent to take a different approach (e.g., be more creative, focus on specific aspects, output in a particular format), override it:

{
  "defaultModel": "gemma4:31b-cloud",
  "systemPrompt": "You describe images for a visually impaired user. Be thorough and empathetic. Read every image file before describing it."
}

The config is reloaded at the start of each Pi session, so you can edit it while Pi is running and it takes effect on the next /new or restart.

The Tool

analyze_image

Parameters:

Parameter  Type      Required  Description
images     string[]  yes       One or more local file paths to images
question   string    yes       What you want to know about the image(s)
model      string    no        Override the configured default model for this call

Supported formats: PNG, JPG, JPEG, GIF, WebP, BMP

Returns: Plain text — the vision model's answer to your question.
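
For example, a minimal call using only the required parameters:

{
  "images": ["/home/me/Desktop/error-message.png"],
  "question": "What text is shown in this error message?"
}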

Usage

You don't call analyze_image yourself. The LLM calls it when it needs to understand an image. Your job is to give the agent a reason to look at an image.

Pasting from clipboard

  1. Copy an image to your clipboard (screenshot, browser, etc.)
  2. In the Pi TUI, press Ctrl+V — Pi saves the image to a temp file and inserts the path
  3. Ask your question:
I just pasted a screenshot. What UI components are visible?

The agent will call analyze_image with the temp file path and your question.
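
The resulting call looks something like this (the temp path is whatever Pi generated for the paste):

{
  "images": ["/tmp/pi-paste-a1b2c3.png"],
  "question": "What UI components are visible in this screenshot?"
}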

Referencing a file directly

What text is shown in /home/me/Desktop/error-message.png?
Is there anything unusual about the layout in ./mockup-v3.webp compared to a typical settings page?

Multiple images at once

The agent can batch multiple images in a single call — all images go to the same subagent, so the vision model can compare and cross-reference them:

Compare the two screenshots in /tmp/before.png and /tmp/after.png — what changed?

The LLM calls:

{
  "images": ["/tmp/before.png", "/tmp/after.png"],
  "question": "What changed between these two screenshots?"
}

Overriding the model for one call

You can tell the agent to use a different model for a specific analysis:

Use analyze_image with the model anthropic/claude-sonnet-4 to read the chart in /tmp/revenue-q4.png and tell me the top 3 quarters.
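
Which produces a call along these lines:

{
  "images": ["/tmp/revenue-q4.png"],
  "question": "Read the chart and report the top 3 quarters.",
  "model": "anthropic/claude-sonnet-4"
}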

What the agent sees in the TUI

During the call:

analyze_image 2 images
  What changed between these two screenshots?
  📄 before.png
  📄 after.png

When the result comes back:

✓ analyze_image (gemma4:31b-cloud) 3.2s
The two screenshots show a settings page. In the "after" version,
the navigation sidebar has been reorganized — the "Account" section
was moved above "Privacy" and a new "Notifications" entry was added...

Press Ctrl+O to expand the full output if it's long.

Requirements

  • Pi coding agent installed and working
  • At least one vision-capable model configured in Pi (the default is gemma4:31b-cloud — change this if you don't have it)
  • The vision model's API key must be set (e.g., GEMINI_API_KEY, ANTHROPIC_API_KEY, etc.)
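
For example, if your chosen vision model is Gemini-hosted:

export GEMINI_API_KEY="your-key-here"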

Limitations

  • Local files only — the tool accepts file paths on your machine, not URLs. If you need to analyze a remote image, download it first.
  • No resize — images are passed through at original resolution. Very large images may hit context limits in the vision model.
  • Two-turn subagent — the subagent first calls read on each image, then answers the question. This means it uses two LLM turns per call.
  • The subagent must call read — the vision model needs to be smart enough to follow the system prompt's instruction to read the image files before answering. This works well with most capable vision models, but a very weak model might skip the read step and hallucinate.
  • --tools read — the subagent only has the read tool. It cannot run bash commands, edit files, or do anything else. This is a security feature, not a bug.

Troubleshooting

Problem: Extension doesn't show in startup
Solution: Check the file is at ~/.pi/agent/extensions/analyze-image/index.ts and run /reload.

Problem: analyze_image tool not available
Solution: Pi only shows tools for models that support tools. Make sure your model supports tool use.

Problem: "Image file not found"
Solution: Use absolute paths, or paths relative to where you launched pi.

Problem: "Subagent failed with no output"
Solution: The vision model likely isn't configured. Check your API key and run pi --list-models.

Problem: Subagent doesn't read the images
Solution: The vision model may be too weak to follow the system prompt. Try a more capable model.

Problem: Analysis takes a long time
Solution: Large images + slow model = wait. You can Ctrl+C to abort mid-analysis.

Project Structure

analyze-image/
├── index.ts              # Extension source code
├── config.example.json   # Starter config — copy to ~/.pi/agent/extensions/analyze-image/
└── README.md             # This file

License

MIT