pi-llm-as-verifier
Pi skill + extension for llm-as-verifier style pairwise, repeated, criteria-decomposed candidate selection.
Package details
Install pi-llm-as-verifier from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-llm-as-verifier
- Package: pi-llm-as-verifier
- Version: 0.2.2
- Published: Apr 15, 2026
- Downloads: 464/mo · 21/wk
- Author: pk-nerdsaver-ai
- License: unknown
- Types: extension, skill, prompt
- Size: 89.5 KB
- Dependencies: 0 dependencies · 4 peers
Pi manifest JSON
{
  "extensions": [
    "./.pi/extensions"
  ],
  "skills": [
    "./.agents/skills"
  ],
  "prompts": [
    "./prompts"
  ]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-llm-as-verifier
Pi package for llm-as-verifier style selection and auditing.
It bundles:
- a Pi skill: llm-as-verifier
- a Pi extension tool: llm_as_verifier
- reusable prompt templates for common verifier workflows
Install
pi install npm:pi-llm-as-verifier
Or test without installing globally:
pi -e npm:pi-llm-as-verifier
What it does
This package helps Pi choose among multiple candidate artifacts using:
- pairwise comparison
- criteria decomposition
- repeated verification
- round-robin winner selection
It supports three backends:
- gemini-python - Python runner inspired by the upstream paper/repo
- zai-coding-plan - single ZAI model through Pi's model registry
- pi-model-ensemble - multiple Pi models rotated across repeated attempts
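The selection techniques listed in this section can be combined in a single loop. The following is a minimal Python sketch of that idea, not the package's implementation: `round_robin_select`, the `Judge` signature, and the toy `length_judge` are hypothetical names introduced here, and a real judge would call a verifier model instead of comparing text lengths.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical judge signature: returns a score in [0, 1] for how strongly
# candidate `a` beats candidate `b` on one criterion (0.5 means a tie).
Judge = Callable[[str, str, str], float]


def round_robin_select(candidates: Dict[str, str], criteria: List[str],
                       judge: Judge, n_verifications: int = 3) -> str:
    """Return the id of the candidate with the highest accumulated score."""
    totals = {cid: 0.0 for cid in candidates}
    for a, b in combinations(candidates, 2):      # round-robin: every pair once
        for criterion in criteria:                # criteria decomposition
            for _ in range(n_verifications):      # repeated verification
                p = judge(candidates[a], candidates[b], criterion)
                totals[a] += p                    # pairwise comparison result
                totals[b] += 1.0 - p
    return max(totals, key=totals.get)


def length_judge(a: str, b: str, criterion: str) -> float:
    # Toy stand-in for an LLM call: prefers the longer text.
    if len(a) == len(b):
        return 0.5
    return 1.0 if len(a) > len(b) else 0.0


winner = round_robin_select(
    {"patch-a": "short fix", "patch-b": "a longer, more complete fix"},
    ["Correctness", "Requirements adherence"],
    length_judge,
)
print(winner)
```

Summing soft scores rather than counting hard wins is one design choice among several; it lets a tie on one criterion contribute half a point to each side.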
Tool usage
Use the llm_as_verifier tool with:
- task
- candidates
- criteria
- optional context
- optional evidencePaths
- optional outputPath
Multi-model repeated attempts
For mixed-model verification, use:
- backend: "pi-model-ensemble"
- models: ["openai:gpt-5.4", "google:gemini-2.5-flash", "minimax:MiniMax-M2.7-highspeed"]
If nVerifications is omitted in ensemble mode, it defaults to the number of configured verifier models so each model gets one pass.
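The default described above can be illustrated with a short Python sketch. The helper name `assign_models` and the simple modulo rotation order are assumptions made for illustration; the source only states that models are rotated across attempts and that the default attempt count equals the number of models.

```python
from typing import List, Optional


def assign_models(models: List[str],
                  n_verifications: Optional[int] = None) -> List[str]:
    """Return the model used for each attempt, cycling through `models`."""
    if not models:
        raise ValueError("at least one verifier model is required")
    # Default: one pass per configured model.
    n = len(models) if n_verifications is None else n_verifications
    # Assumed rotation: attempt i uses model i modulo the model count.
    return [models[i % len(models)] for i in range(n)]


models = ["openai:gpt-5.4", "google:gemini-2.5-flash",
          "minimax:MiniMax-M2.7-highspeed"]
print(assign_models(models))      # one attempt per model
print(assign_models(models, 5))   # rotation wraps around after three attempts
```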
Weighted voting by model
For ensemble runs, you can bias some verifier models more strongly:
{
  "backend": "pi-model-ensemble",
  "models": [
    "openai:gpt-5.4",
    "google:gemini-2.5-flash",
    "minimax:MiniMax-M2.7-highspeed"
  ],
  "modelWeights": [
    { "model": "openai:gpt-5.4", "weight": 1.5 },
    { "model": "google:gemini-2.5-flash", "weight": 1.0 },
    { "model": "minimax:MiniMax-M2.7-highspeed", "weight": 0.8 }
  ]
}
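To show what biasing by model can mean in practice, here is a minimal Python sketch of weighted voting. It is an assumption about how weights might be applied, not the package's aggregation logic: `weighted_winner`, the `(model, candidate)` vote format, and the default weight of 1.0 for unlisted models are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_winner(votes: List[Tuple[str, str]],
                    weights: Dict[str, float],
                    default_weight: float = 1.0) -> Tuple[str, Dict[str, float]]:
    """Sum each model's weight onto the candidate it voted for."""
    if not votes:
        raise ValueError("no votes to aggregate")
    totals: Dict[str, float] = defaultdict(float)
    for model, candidate in votes:
        # Models without an explicit weight fall back to the assumed default.
        totals[candidate] += weights.get(model, default_weight)
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


weights = {
    "openai:gpt-5.4": 1.5,
    "google:gemini-2.5-flash": 1.0,
    "minimax:MiniMax-M2.7-highspeed": 0.8,
}
votes = [
    ("openai:gpt-5.4", "patch-a"),
    ("google:gemini-2.5-flash", "patch-b"),
    ("minimax:MiniMax-M2.7-highspeed", "patch-b"),
]
winner, totals = weighted_winner(votes, weights)
print(winner, totals)
```

With the weights from the configuration above, the two lower-weighted models together (1.0 + 0.8) still outvote the single higher-weighted model (1.5), so a weight shifts influence without giving any one model a veto.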
Confidence reporting
Ensemble and ZAI-backed runs now return richer breakdowns in details, including:
- criterion confidence
- pairwise confidence
- disagreement scores
- per-model breakdowns
- weighted model metadata
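The source does not define how these values are computed. As one plausible illustration only, the Python sketch below derives a pairwise confidence and a disagreement score from repeated preference scores for a single pair. The function `pair_summary`, its output keys, and both formulas are assumptions made here and may differ from what the tool reports.

```python
from statistics import mean, pstdev
from typing import Dict, List, Union


def pair_summary(scores: List[float]) -> Dict[str, Union[str, float]]:
    """Summarize repeated scores in [0, 1] that candidate `a` beats `b`."""
    if not scores:
        raise ValueError("need at least one verification score")
    avg = mean(scores)
    return {
        "preferred": "a" if avg >= 0.5 else "b",
        # Assumed definition: distance of the mean from a coin flip, scaled to [0, 1].
        "confidence": abs(avg - 0.5) * 2,
        # Assumed definition: spread of the individual attempts.
        "disagreement": pstdev(scores),
    }


unanimous = pair_summary([1.0, 1.0, 1.0])
split = pair_summary([1.0, 1.0, 0.0, 1.0])
print(unanimous)
print(split)
```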
Example
{
  "backend": "pi-model-ensemble",
  "task": "Choose the strongest patch for the bug fix.",
  "models": [
    "openai:gpt-5.4",
    "google:gemini-2.5-flash",
    "minimax:MiniMax-M2.7-highspeed"
  ],
  "modelWeights": [
    { "model": "openai:gpt-5.4", "weight": 1.3 },
    { "model": "google:gemini-2.5-flash", "weight": 1.0 },
    { "model": "minimax:MiniMax-M2.7-highspeed", "weight": 0.9 }
  ],
  "candidates": [
    {
      "id": "patch-a",
      "content": "..."
    },
    {
      "id": "patch-b",
      "content": "..."
    }
  ],
  "criteria": [
    {
      "name": "Correctness",
      "description": "Check whether the patch directly fixes the requested behavior."
    },
    {
      "name": "Requirements adherence",
      "description": "Check whether exact task constraints are satisfied."
    },
    {
      "name": "Empirical verification",
      "description": "Check whether the candidate is supported by concrete test or runtime evidence."
    }
  ]
}
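Before passing a request like the one above to the tool, it can help to check it for obvious mistakes. The Python sketch below is a hypothetical pre-flight check, not part of the package: `check_request` and the rules it enforces (at least two candidates, unique ids, weights that refer to listed models and are positive) are assumptions about what a sensible request looks like.

```python
from typing import Any, Dict, List


def check_request(request: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems: List[str] = []
    ids = [c.get("id") for c in request.get("candidates", [])]
    if len(ids) < 2:
        problems.append("need at least two candidates to compare")
    if len(set(ids)) != len(ids):
        problems.append("candidate ids must be unique")
    models = set(request.get("models", []))
    for entry in request.get("modelWeights", []):
        if entry.get("model") not in models:
            problems.append(f"weight for unlisted model: {entry.get('model')}")
        if entry.get("weight", 0) <= 0:
            problems.append(f"non-positive weight for: {entry.get('model')}")
    return problems


good = {
    "backend": "pi-model-ensemble",
    "models": ["openai:gpt-5.4", "google:gemini-2.5-flash"],
    "modelWeights": [{"model": "openai:gpt-5.4", "weight": 1.3}],
    "candidates": [{"id": "patch-a", "content": "..."},
                   {"id": "patch-b", "content": "..."}],
}
bad = dict(good, modelWeights=[{"model": "other:model", "weight": 1.0}])
print(check_request(good))
print(check_request(bad))
```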
Prompt templates
This package also ships prompt templates:
- /compare-patches
- /audit-candidate
- /ensemble-verifier
These expand into ready-made instructions for common verifier workflows.
Auth and setup
Gemini Python backend
Install:
pip install google-genai
Provide one of:
- GEMINI_API_KEY
- GOOGLE_API_KEY
- VERTEX_API_KEY
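A quick way to confirm that one of these variables is set before running the Python backend is sketched below. The helper `find_api_key_name` is hypothetical, and the lookup order simply follows the order of the list above; the order in which the runner itself checks these variables is not stated in the source.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the list above; the precedence is an assumption.
KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_API_KEY", "VERTEX_API_KEY")


def find_api_key_name(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the first non-empty key variable, or None."""
    for name in KEY_NAMES:
        if env.get(name):
            return name
    return None


found = find_api_key_name()
if found is None:
    print("No API key variable set; expected one of:", ", ".join(KEY_NAMES))
else:
    print("Using credentials from", found)
```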
Pi registry backends
For zai-coding-plan and pi-model-ensemble, configure model auth in Pi for whichever providers you want to use.
Smoke tests
Python-runner smoke test:
/lav-smoke
Weighted ensemble smoke test:
/lav-ensemble-smoke
Package contents
- .pi/extensions/llm-as-verifier/index.ts
- .agents/skills/llm-as-verifier/SKILL.md
- .agents/skills/llm-as-verifier/scripts/lav_runner.py
- .agents/skills/llm-as-verifier/examples/code-patch-selection.json
- .agents/skills/llm-as-verifier/examples/weighted-ensemble-selection.json
- prompts/*.md
- bundled references and examples