@vtstech/pi-model-test

Model benchmark/testing extension for the Pi Coding Agent

Package details

Install @vtstech/pi-model-test from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@vtstech/pi-model-test

Package: @vtstech/pi-model-test
Version: 1.2.1
Published: May 5, 2026
Downloads: 2,396/mo · 27/wk
Author: vtstech
License: MIT
Types: extension
Size: 51.4 KB
Dependencies: 1 dependency · 1 peer

Pi manifest JSON
{
  "extensions": [
    "./model-test.js"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

@vtstech/pi-model-test

Model benchmark extension for the Pi Coding Agent.

Test any model for reasoning, tool usage, and instruction following — works with Ollama and cloud providers.

Install

pi install "npm:@vtstech/pi-model-test"

Commands

/model-test                     Test current Pi model (auto-detects provider)
/model-test qwen3:0.6b          Test a specific Ollama model
/model-test --all               Test every Ollama model

Test Suites

Ollama (6 tests)

Test                            Scoring
Reasoning (snail puzzle)        STRONG / MODERATE / WEAK / FAIL
Thinking token support          SUPPORTED / NOT SUPPORTED
Tool usage (native + text)      STRONG / MODERATE / WEAK / FAIL
ReAct parsing                   STRONG / MODERATE / WEAK / FAIL
Instruction following (JSON)    STRONG / MODERATE / WEAK / FAIL
Tool support detection          NATIVE / REACT / NONE
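
The README does not document the text format the ReAct parsing test expects. As a rough illustration only, the sketch below assumes the conventional "Thought / Action / Action Input" layout; the `ReactStep` shape and `parseReactStep` name are hypothetical and not taken from the extension.

```typescript
// Hypothetical shape for one parsed ReAct step (not the extension's own type).
interface ReactStep {
  thought: string | null;
  action: string;
  actionInput: unknown;
}

// Parse a single ReAct step from model output.
// Assumes the conventional "Thought:", "Action:", "Action Input:" labels.
// Returns null when no "Action:" line is present.
function parseReactStep(text: string): ReactStep | null {
  const thought = /^Thought:[ \t]*(.+)$/m.exec(text);
  const action = /^Action:[ \t]*(.+)$/m.exec(text);
  if (!action) return null;

  // Everything after "Action Input:" is treated as the tool arguments.
  const input = /^Action Input:[ \t]*([\s\S]*)$/m.exec(text);
  let actionInput: unknown = null;
  if (input) {
    const raw = input[1].trim();
    try {
      actionInput = JSON.parse(raw); // prefer structured arguments
    } catch {
      actionInput = raw; // fall back to the raw string
    }
  }

  return {
    thought: thought ? thought[1].trim() : null,
    action: action[1].trim(),
    actionInput,
  };
}
```

A model that answers in plain prose yields `null` here, which is one plausible signal for a lower score on this test.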

Cloud Providers (4 tests)

Test                            Scoring
Connectivity                    OK / FAIL
Reasoning                       STRONG / MODERATE / WEAK / FAIL
Instruction following           STRONG / MODERATE / WEAK / FAIL
Tool usage (function calling)   STRONG / MODERATE / WEAK / FAIL
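
The four-level scale shared by both suites is ordinal, which is what makes score degradation between runs detectable. A minimal sketch of that comparison follows, assuming each run is stored as a map from test name to score; the record shape, `RANK` table, and `findRegressions` name are hypothetical, since the history file format is not documented here.

```typescript
type Score = "STRONG" | "MODERATE" | "WEAK" | "FAIL";

// Ordinal rank for each score; higher is better.
const RANK: Record<Score, number> = { FAIL: 0, WEAK: 1, MODERATE: 2, STRONG: 3 };

// Return the names of tests whose score dropped between two runs.
// Tests missing from the current run are ignored rather than flagged.
function findRegressions(
  previous: Record<string, Score>,
  current: Record<string, Score>
): string[] {
  const regressed: string[] = [];
  for (const test of Object.keys(previous)) {
    const before = previous[test];
    const after = current[test];
    if (after !== undefined && RANK[after] < RANK[before]) {
      regressed.push(test);
    }
  }
  return regressed;
}
```

Binary results such as OK / FAIL or three-way results such as NATIVE / REACT / NONE would need their own rank tables under the same approach.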

Features

  • Auto-detects Ollama vs cloud provider (OpenRouter, Anthropic, Google, OpenAI, Groq, DeepSeek, Mistral, xAI, Together, Fireworks, Cohere)
  • Uses native fetch() for all HTTP communication (no shell subprocess or curl dependency)
  • Streaming Ollama chat — uses /api/chat with stream: true for earlier timeout detection and reduced memory
  • Automatic remote Ollama URL resolution (reads from models.json on every call — picks up config changes immediately)
  • Timeout resilience with exponential backoff retry on connection failures
  • Configurable test parameters — override timeouts, delays, temperature via ~/.pi/agent/model-test-config.json
  • Test history with regression detection — tracks results at ~/.pi/agent/cache/model-test-history.json, flags score degradation
  • Rate limit delay between tests (configurable)
  • Thinking model fallback (retries with think: true)
  • Tool support cache (~/.pi/agent/cache/tool_support.json)
  • JSON repair for truncated output (stack-based nesting-aware parser)
  • Tab-completion for model names
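
To illustrate the JSON repair feature listed above, here is a minimal sketch of a stack-based, nesting-aware repair for truncated output. It is not the extension's parser: the function name `repairTruncatedJson` is hypothetical, and the sketch only handles output cut off inside a string value, after a comma, or between elements. Output truncated inside a key or right after a colon would still fail to parse.

```typescript
// Close whatever a truncated JSON string left open.
// Tracks string state and a stack of owed closers while scanning once.
function repairTruncatedJson(input: string): string {
  const closers: string[] = []; // closers still owed, innermost last
  let inString = false;
  let escaped = false;

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue; // brackets inside strings do not affect nesting
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let repaired = input;
  if (inString) {
    if (escaped) repaired = repaired.slice(0, -1); // drop a dangling backslash
    repaired += '"'; // terminate the open string
  } else {
    repaired = repaired.replace(/,\s*$/, ""); // drop a trailing comma
  }

  // Append owed closers innermost first.
  return repaired + closers.reverse().join("");
}
```

For example, `{"name": "pi", "tags": ["a", "b` becomes `{"name": "pi", "tags": ["a", "b"]}`, which `JSON.parse` accepts. Input that is already complete is returned unchanged.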

License

MIT — VTSTech