pi-llm-as-verifier

A Pi skill and extension for llm-as-verifier style candidate selection, using pairwise comparison, repeated verification, and criteria decomposition.

Package details

extension · skill · prompt

Install pi-llm-as-verifier from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-llm-as-verifier
Package: pi-llm-as-verifier
Version: 0.2.2
Published: Apr 15, 2026
Downloads: 464/mo · 21/wk
Author: pk-nerdsaver-ai
License: unknown
Types: extension, skill, prompt
Size: 89.5 KB
Dependencies: 0 dependencies · 4 peers
Pi manifest JSON
{
  "extensions": [
    "./.pi/extensions"
  ],
  "skills": [
    "./.agents/skills"
  ],
  "prompts": [
    "./prompts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-llm-as-verifier

Pi package for llm-as-verifier style selection and auditing.

It bundles:

  • a Pi skill: llm-as-verifier
  • a Pi extension tool: llm_as_verifier
  • reusable prompt templates for common verifier workflows

Install

pi install npm:pi-llm-as-verifier

Or test without installing globally:

pi -e npm:pi-llm-as-verifier

What it does

This package helps Pi choose among multiple candidate artifacts using:

  • pairwise comparison
  • criteria decomposition
  • repeated verification
  • round-robin winner selection
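The selection loop above can be sketched in a few lines. This is a minimal illustration, not the package's actual implementation; the `judge` callback is a hypothetical stand-in for one LLM verification call.

```python
import itertools
from collections import Counter

def select_winner(candidates, judge, n_verifications=3):
    """Round-robin pairwise selection: every candidate pair is judged
    n_verifications times, and the candidate with the most pairwise wins
    overall is selected. `judge(a, b)` returns the id of the preferred
    candidate and stands in for a single LLM verifier call."""
    wins = Counter()
    for a, b in itertools.combinations(candidates, 2):
        for _ in range(n_verifications):
            wins[judge(a, b)] += 1
    # Ties resolve to the earlier candidate in the input order.
    return max(candidates, key=lambda c: wins[c["id"]])["id"]
```

Repeating each pairwise comparison smooths out single-call noise before the round-robin tally picks the overall winner.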

It supports three backends:

  • gemini-python - Python runner inspired by the upstream paper/repo
  • zai-coding-plan - single ZAI model through Pi's model registry
  • pi-model-ensemble - multiple Pi models rotated across repeated attempts

Tool usage

Use the llm_as_verifier tool with:

  • task
  • candidates
  • criteria
  • optional context
  • optional evidencePaths
  • optional outputPath

Multi-model repeated attempts

For mixed-model verification, use:

  • backend: "pi-model-ensemble"
  • models: ["openai:gpt-5.4", "google:gemini-2.5-flash", "minimax:MiniMax-M2.7-highspeed"]

If nVerifications is omitted in ensemble mode, it defaults to the number of configured verifier models so each model gets one pass.
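The defaulting rule reads as the following resolution logic (an assumption about the internals, shown here only to make the behavior concrete):

```python
def resolve_n_verifications(n_verifications, models):
    """In ensemble mode, an omitted nVerifications defaults to the number
    of configured verifier models, so each model gets exactly one pass
    (assumed resolution logic, not the package's actual code)."""
    return n_verifications if n_verifications is not None else len(models)
```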

Weighted voting by model

For ensemble runs, you can bias some verifier models more strongly:

{
  "backend": "pi-model-ensemble",
  "models": [
    "openai:gpt-5.4",
    "google:gemini-2.5-flash",
    "minimax:MiniMax-M2.7-highspeed"
  ],
  "modelWeights": [
    { "model": "openai:gpt-5.4", "weight": 1.5 },
    { "model": "google:gemini-2.5-flash", "weight": 1.0 },
    { "model": "minimax:MiniMax-M2.7-highspeed", "weight": 0.8 }
  ]
}
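One plausible reading of how `modelWeights` biases the outcome is a weighted tally over per-model votes. This is a sketch under that assumption, not the package's documented aggregation:

```python
def weighted_vote(votes, model_weights):
    """Aggregate per-model winner votes into a weighted tally.
    `votes` maps a model id to the candidate id that model preferred;
    models without an explicit weight entry default to 1.0 (assumed)."""
    weights = {w["model"]: w["weight"] for w in model_weights}
    tally = {}
    for model, candidate in votes.items():
        tally[candidate] = tally.get(candidate, 0.0) + weights.get(model, 1.0)
    return max(tally, key=tally.get)
```

With the weights above, a lone vote from the 1.5-weighted model can still be outvoted by agreement between the 1.0 and 0.8 models (1.8 total).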

Confidence reporting

Ensemble and ZAI-backed runs now return richer breakdowns in the details field, including:

  • criterion confidence
  • pairwise confidence
  • disagreement scores
  • per-model breakdowns
  • weighted model metadata
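A disagreement score can be defined in several ways; one simple, plausible definition is the fraction of verifier votes that diverge from the majority pick. The package's exact metric may differ:

```python
from collections import Counter

def disagreement_score(votes):
    """Fraction of verifier votes disagreeing with the majority pick:
    0.0 means unanimous agreement; values near 1.0 mean a highly split
    panel. One plausible definition, not the package's documented one."""
    if not votes:
        return 0.0
    majority = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority / len(votes)
```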

Example

{
  "backend": "pi-model-ensemble",
  "task": "Choose the strongest patch for the bug fix.",
  "models": [
    "openai:gpt-5.4",
    "google:gemini-2.5-flash",
    "minimax:MiniMax-M2.7-highspeed"
  ],
  "modelWeights": [
    { "model": "openai:gpt-5.4", "weight": 1.3 },
    { "model": "google:gemini-2.5-flash", "weight": 1.0 },
    { "model": "minimax:MiniMax-M2.7-highspeed", "weight": 0.9 }
  ],
  "candidates": [
    {
      "id": "patch-a",
      "content": "..."
    },
    {
      "id": "patch-b",
      "content": "..."
    }
  ],
  "criteria": [
    {
      "name": "Correctness",
      "description": "Check whether the patch directly fixes the requested behavior."
    },
    {
      "name": "Requirements adherence",
      "description": "Check whether exact task constraints are satisfied."
    },
    {
      "name": "Empirical verification",
      "description": "Check whether the candidate is supported by concrete test or runtime evidence."
    }
  ]
}

Prompt templates

This package also ships prompt templates:

  • /compare-patches
  • /audit-candidate
  • /ensemble-verifier

These expand into ready-made instructions for common verifier workflows.

Auth and setup

Gemini Python backend

Install:

pip install google-genai

Provide one of:

  • GEMINI_API_KEY
  • GOOGLE_API_KEY
  • VERTEX_API_KEY

Pi registry backends

For zai-coding-plan and pi-model-ensemble, configure model auth in Pi for whichever providers you want to use.

Smoke tests

Python-runner smoke test:

/lav-smoke

Weighted ensemble smoke test:

/lav-ensemble-smoke

Package contents

  • .pi/extensions/llm-as-verifier/index.ts
  • .agents/skills/llm-as-verifier/SKILL.md
  • .agents/skills/llm-as-verifier/scripts/lav_runner.py
  • .agents/skills/llm-as-verifier/examples/code-patch-selection.json
  • .agents/skills/llm-as-verifier/examples/weighted-ensemble-selection.json
  • prompts/*.md
  • bundled references and examples