pi-llm-as-verifier

A Pi skill and extension for llm-as-verifier style candidate selection, using pairwise comparison, repeated verification, and criteria decomposition.

Package details

extension · skill · prompt

Install pi-llm-as-verifier from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-llm-as-verifier
Package: pi-llm-as-verifier
Version: 0.2.2
Published: Apr 15, 2026
Downloads: 464/mo · 21/wk
Author: pk-nerdsaver-ai
License: unknown
Types: extension, skill, prompt
Size: 89.5 KB
Dependencies: 0 dependencies · 4 peers
Pi manifest JSON
{
  "extensions": [
    "./.pi/extensions"
  ],
  "skills": [
    "./.agents/skills"
  ],
  "prompts": [
    "./prompts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-llm-as-verifier

Pi package for llm-as-verifier style selection and auditing.

It bundles:

  • a Pi skill: llm-as-verifier
  • a Pi extension tool: llm_as_verifier
  • reusable prompt templates for common verifier workflows

Install

pi install npm:pi-llm-as-verifier

Or test without installing globally:

pi -e npm:pi-llm-as-verifier

What it does

This package helps Pi choose among multiple candidate artifacts using:

  • pairwise comparison
  • criteria decomposition
  • repeated verification
  • round-robin winner selection
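The selection loop above can be sketched in a few lines. This is a minimal illustration, not the package's actual implementation; the `judge` callback is a hypothetical stand-in for one LLM verification call.

```python
import itertools
from collections import Counter

def select_winner(candidates, judge, n_verifications=3):
    """Round-robin pairwise selection: every candidate pair is judged
    n_verifications times, and the candidate with the most pairwise wins
    overall is selected. `judge(a, b)` returns the id of the preferred
    candidate and stands in for a single LLM verifier call."""
    wins = Counter()
    for a, b in itertools.combinations(candidates, 2):
        for _ in range(n_verifications):
            wins[judge(a, b)] += 1
    # Ties resolve to the earlier candidate in the input order.
    return max(candidates, key=lambda c: wins[c["id"]])["id"]
```

Repeating each pairwise comparison smooths out single-call noise before the round-robin tally picks the overall winner.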

It supports three backends:

  • gemini-python - Python runner inspired by the upstream paper/repo
  • zai-coding-plan - single ZAI model through Pi's model registry
  • pi-model-ensemble - multiple Pi models rotated across repeated attempts

Tool usage

Use the llm_as_verifier tool with:

  • task
  • candidates
  • criteria
  • optional context
  • optional evidencePaths
  • optional outputPath

Multi-model repeated attempts

For mixed-model verification, use:

  • backend: "pi-model-ensemble"
  • models: ["openai:gpt-5.4", "google:gemini-2.5-flash", "minimax:MiniMax-M2.7-highspeed"]

If nVerifications is omitted in ensemble mode, it defaults to the number of configured verifier models so each model gets one pass.
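The defaulting rule reads as the following resolution logic (an assumption about the internals, shown here only to make the behavior concrete):

```python
def resolve_n_verifications(n_verifications, models):
    """In ensemble mode, an omitted nVerifications defaults to the number
    of configured verifier models, so each model gets exactly one pass
    (assumed resolution logic, not the package's actual code)."""
    return n_verifications if n_verifications is not None else len(models)
```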

Weighted voting by model

For ensemble runs, you can bias some verifier models more strongly:

{
  "backend": "pi-model-ensemble",
  "models": [
    "openai:gpt-5.4",
    "google:gemini-2.5-flash",
    "minimax:MiniMax-M2.7-highspeed"
  ],
  "modelWeights": [
    { "model": "openai:gpt-5.4", "weight": 1.5 },
    { "model": "google:gemini-2.5-flash", "weight": 1.0 },
    { "model": "minimax:MiniMax-M2.7-highspeed", "weight": 0.8 }
  ]
}
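One plausible reading of how `modelWeights` biases the outcome is a weighted tally over per-model votes. This is a sketch under that assumption, not the package's documented aggregation:

```python
def weighted_vote(votes, model_weights):
    """Aggregate per-model winner votes into a weighted tally.
    `votes` maps a model id to the candidate id that model preferred;
    models without an explicit weight entry default to 1.0 (assumed)."""
    weights = {w["model"]: w["weight"] for w in model_weights}
    tally = {}
    for model, candidate in votes.items():
        tally[candidate] = tally.get(candidate, 0.0) + weights.get(model, 1.0)
    return max(tally, key=tally.get)
```

With the weights above, a lone vote from the 1.5-weighted model can still be outvoted by agreement between the 1.0 and 0.8 models (1.8 total).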

Confidence reporting

Ensemble and ZAI-backed runs now return richer breakdowns in the details field, including:

  • criterion confidence
  • pairwise confidence
  • disagreement scores
  • per-model breakdowns
  • weighted model metadata
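A disagreement score can be defined in several ways; one simple, plausible definition is the fraction of verifier votes that diverge from the majority pick. The package's exact metric may differ:

```python
from collections import Counter

def disagreement_score(votes):
    """Fraction of verifier votes disagreeing with the majority pick:
    0.0 means unanimous agreement; values near 1.0 mean a highly split
    panel. One plausible definition, not the package's documented one."""
    if not votes:
        return 0.0
    majority = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority / len(votes)
```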

Example

{
  "backend": "pi-model-ensemble",
  "task": "Choose the strongest patch for the bug fix.",
  "models": [
    "openai:gpt-5.4",
    "google:gemini-2.5-flash",
    "minimax:MiniMax-M2.7-highspeed"
  ],
  "modelWeights": [
    { "model": "openai:gpt-5.4", "weight": 1.3 },
    { "model": "google:gemini-2.5-flash", "weight": 1.0 },
    { "model": "minimax:MiniMax-M2.7-highspeed", "weight": 0.9 }
  ],
  "candidates": [
    {
      "id": "patch-a",
      "content": "..."
    },
    {
      "id": "patch-b",
      "content": "..."
    }
  ],
  "criteria": [
    {
      "name": "Correctness",
      "description": "Check whether the patch directly fixes the requested behavior."
    },
    {
      "name": "Requirements adherence",
      "description": "Check whether exact task constraints are satisfied."
    },
    {
      "name": "Empirical verification",
      "description": "Check whether the candidate is supported by concrete test or runtime evidence."
    }
  ]
}

Prompt templates

This package also ships prompt templates:

  • /compare-patches
  • /audit-candidate
  • /ensemble-verifier

These expand into ready-made instructions for common verifier workflows.

Auth and setup

Gemini Python backend

Install:

pip install google-genai

Provide one of:

  • GEMINI_API_KEY
  • GOOGLE_API_KEY
  • VERTEX_API_KEY

Pi registry backends

For zai-coding-plan and pi-model-ensemble, configure model auth in Pi for whichever providers you want to use.

Smoke tests

Python-runner smoke test:

/lav-smoke

Weighted ensemble smoke test:

/lav-ensemble-smoke

Package contents

  • .pi/extensions/llm-as-verifier/index.ts
  • .agents/skills/llm-as-verifier/SKILL.md
  • .agents/skills/llm-as-verifier/scripts/lav_runner.py
  • .agents/skills/llm-as-verifier/examples/code-patch-selection.json
  • .agents/skills/llm-as-verifier/examples/weighted-ensemble-selection.json
  • prompts/*.md
  • bundled references and examples