@vtstech/pi-model-test
Model benchmark/testing extension for Pi Coding Agent
Package details
Install @vtstech/pi-model-test from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@vtstech/pi-model-test

- Package: @vtstech/pi-model-test
- Version: 1.2.1
- Published: May 5, 2026
- Downloads: 2,396/mo · 27/wk
- Author: vtstech
- License: MIT
- Type: extension
- Size: 51.4 KB
- Dependencies: 1 dependency · 1 peer
Pi manifest JSON
{
"extensions": [
"./model-test.js"
]
}

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
@vtstech/pi-model-test
Model benchmark extension for the Pi Coding Agent.
Test any model for reasoning, tool usage, and instruction following — works with Ollama and cloud providers.
Install
pi install "npm:@vtstech/pi-model-test"
Commands
| Command | Description |
|---|---|
| /model-test | Test the current Pi model (auto-detects provider) |
| /model-test qwen3:0.6b | Test a specific Ollama model |
| /model-test --all | Test every Ollama model |
Test Suites
Ollama (6 tests)
| Test | Scoring |
|---|---|
| Reasoning (snail puzzle) | STRONG / MODERATE / WEAK / FAIL |
| Thinking token support | SUPPORTED / NOT SUPPORTED |
| Tool usage (native + text) | STRONG / MODERATE / WEAK / FAIL |
| ReAct parsing | STRONG / MODERATE / WEAK / FAIL |
| Instruction following (JSON) | STRONG / MODERATE / WEAK / FAIL |
| Tool support detection | NATIVE / REACT / NONE |
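The NATIVE / REACT / NONE classification above could be sketched as a pure function over a chat response body. This is an illustrative sketch, not the package's actual logic: the field names follow Ollama's documented `/api/chat` response shape (`message.tool_calls`, `message.content`), but `classifyToolResponse` and its ReAct regex are hypothetical.

```javascript
// Hypothetical sketch: classify a model's tool support from one chat response.
// NATIVE  = the model emitted a structured tool call.
// REACT   = no structured call, but the text contains a ReAct-style action line.
// NONE    = neither.
function classifyToolResponse(body) {
  if (body?.message?.tool_calls?.length) return "NATIVE";
  const text = body?.message?.content ?? "";
  // Text-mode fallback: look for a line such as "Action: get_weather".
  return /^Action:\s*\w+/m.test(text) ? "REACT" : "NONE";
}
```

Keeping the classification separate from the HTTP call makes it easy to unit-test against canned responses.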
Cloud Providers (4 tests)
| Test | Scoring |
|---|---|
| Connectivity | OK / FAIL |
| Reasoning | STRONG / MODERATE / WEAK / FAIL |
| Instruction following | STRONG / MODERATE / WEAK / FAIL |
| Tool usage (function calling) | STRONG / MODERATE / WEAK / FAIL |
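A JSON instruction-following test like the one above could grade output roughly as follows. This is a hedged sketch of one plausible rubric, not the package's actual scoring: `scoreJsonInstruction` and its tier boundaries (valid JSON only → STRONG, valid JSON wrapped in prose → MODERATE, malformed JSON-like fragment → WEAK, nothing → FAIL) are assumptions.

```javascript
// Hypothetical scoring sketch for a "respond with only JSON" instruction test.
function scoreJsonInstruction(output) {
  const trimmed = output.trim();
  try {
    JSON.parse(trimmed);
    return "STRONG"; // the whole response is valid JSON
  } catch {}
  const match = trimmed.match(/\{[\s\S]*\}/); // JSON-looking region inside prose
  if (match) {
    try {
      JSON.parse(match[0]);
      return "MODERATE"; // valid JSON, but surrounded by extra text
    } catch {}
    return "WEAK"; // braces present, but the content does not parse
  }
  return "FAIL"; // no JSON at all
}
```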
Features
- Auto-detects Ollama vs cloud provider (OpenRouter, Anthropic, Google, OpenAI, Groq, DeepSeek, Mistral, xAI, Together, Fireworks, Cohere)
- Uses native `fetch()` for all HTTP communication (no shell subprocess or curl dependency)
- Streaming Ollama chat — uses `/api/chat` with `stream: true` for earlier timeout detection and reduced memory
- Automatic remote Ollama URL resolution (reads from `models.json` on every call — picks up config changes immediately)
- Timeout resilience with exponential backoff retry on connection failures
- Configurable test parameters — override timeouts, delays, temperature via `~/.pi/agent/model-test-config.json`
- Test history with regression detection — tracks results at `~/.pi/agent/cache/model-test-history.json`, flags score degradation
- Rate limit delay between tests (configurable)
- Thinking model fallback (retries with `think: true`)
- Tool support cache (`~/.pi/agent/cache/tool_support.json`)
- JSON repair for truncated output (stack-based nesting-aware parser)
- Tab-completion for model names
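The "stack-based nesting-aware parser" for truncated output could work along these lines. This is a simplified illustration, not the package's implementation: it only closes an unterminated string and any unclosed objects or arrays, and patches a dangling `:` or trailing comma; it does not repair other syntax errors.

```javascript
// Hypothetical sketch: repair JSON truncated mid-stream by tracking open
// braces/brackets on a stack and string state, then appending the closers.
function repairTruncatedJson(text) {
  const stack = [];
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") stack.push(ch === "{" ? "}" : "]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  let repaired = text;
  if (inString) repaired += '"'; // close an unterminated string
  repaired = repaired
    .replace(/,\s*$/, "")       // drop a trailing comma
    .replace(/:\s*$/, ": null"); // give a dangling key a value
  while (stack.length) repaired += stack.pop(); // close nesting inner-to-outer
  return repaired;
}
```

Because the stack records closers in opening order, popping it emits `]` and `}` in the correct inner-to-outer sequence regardless of nesting depth.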
License
MIT — VTSTech