@vtstech/pi-model-test
Model benchmark/testing extension for Pi Coding Agent
Package details
Install @vtstech/pi-model-test from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@vtstech/pi-model-test

- Package: @vtstech/pi-model-test
- Version: 1.2.1
- Published: May 5, 2026
- Downloads: 2,396/mo · 27/wk
- Author: vtstech
- License: MIT
- Type: extension
- Size: 51.4 KB
- Dependencies: 1 dependency · 1 peer
Pi manifest JSON
{
"extensions": [
"./model-test.js"
]
}

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
@vtstech/pi-model-test
Model benchmark extension for the Pi Coding Agent.
Test any model for reasoning, tool usage, and instruction following — works with Ollama and cloud providers.
Install
pi install "npm:@vtstech/pi-model-test"
Commands
| Command | Description |
|---|---|
| /model-test | Test the current Pi model (auto-detects provider) |
| /model-test qwen3:0.6b | Test a specific Ollama model |
| /model-test --all | Test every Ollama model |
Test Suites
Ollama (6 tests)
| Test | Scoring |
|---|---|
| Reasoning (snail puzzle) | STRONG / MODERATE / WEAK / FAIL |
| Thinking token support | SUPPORTED / NOT SUPPORTED |
| Tool usage (native + text) | STRONG / MODERATE / WEAK / FAIL |
| ReAct parsing | STRONG / MODERATE / WEAK / FAIL |
| Instruction following (JSON) | STRONG / MODERATE / WEAK / FAIL |
| Tool support detection | NATIVE / REACT / NONE |
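The NATIVE / REACT / NONE classification above could be sketched as a pure function over a chat response body. This is an illustrative sketch, not the package's actual logic: the field names follow Ollama's documented `/api/chat` response shape (`message.tool_calls`, `message.content`), but `classifyToolResponse` and its ReAct regex are hypothetical.

```javascript
// Hypothetical sketch: classify a model's tool support from one chat response.
// NATIVE  = the model emitted a structured tool call.
// REACT   = no structured call, but the text contains a ReAct-style action line.
// NONE    = neither.
function classifyToolResponse(body) {
  if (body?.message?.tool_calls?.length) return "NATIVE";
  const text = body?.message?.content ?? "";
  // Text-mode fallback: look for a line such as "Action: get_weather".
  return /^Action:\s*\w+/m.test(text) ? "REACT" : "NONE";
}
```

Keeping the classification separate from the HTTP call makes it easy to unit-test against canned responses.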
Cloud Providers (4 tests)
| Test | Scoring |
|---|---|
| Connectivity | OK / FAIL |
| Reasoning | STRONG / MODERATE / WEAK / FAIL |
| Instruction following | STRONG / MODERATE / WEAK / FAIL |
| Tool usage (function calling) | STRONG / MODERATE / WEAK / FAIL |
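A JSON instruction-following test like the one above could grade output roughly as follows. This is a hedged sketch of one plausible rubric, not the package's actual scoring: `scoreJsonInstruction` and its tier boundaries (valid JSON only → STRONG, valid JSON wrapped in prose → MODERATE, malformed JSON-like fragment → WEAK, nothing → FAIL) are assumptions.

```javascript
// Hypothetical scoring sketch for a "respond with only JSON" instruction test.
function scoreJsonInstruction(output) {
  const trimmed = output.trim();
  try {
    JSON.parse(trimmed);
    return "STRONG"; // the whole response is valid JSON
  } catch {}
  const match = trimmed.match(/\{[\s\S]*\}/); // JSON-looking region inside prose
  if (match) {
    try {
      JSON.parse(match[0]);
      return "MODERATE"; // valid JSON, but surrounded by extra text
    } catch {}
    return "WEAK"; // braces present, but the content does not parse
  }
  return "FAIL"; // no JSON at all
}
```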
Features
- Auto-detects Ollama vs cloud provider (OpenRouter, Anthropic, Google, OpenAI, Groq, DeepSeek, Mistral, xAI, Together, Fireworks, Cohere)
- Uses native `fetch()` for all HTTP communication (no shell subprocess or curl dependency)
- Streaming Ollama chat — uses `/api/chat` with `stream: true` for earlier timeout detection and reduced memory
- Automatic remote Ollama URL resolution (reads from `models.json` on every call — picks up config changes immediately)
- Timeout resilience with exponential backoff retry on connection failures
- Configurable test parameters — override timeouts, delays, temperature via `~/.pi/agent/model-test-config.json`
- Test history with regression detection — tracks results at `~/.pi/agent/cache/model-test-history.json`, flags score degradation
- Rate limit delay between tests (configurable)
- Thinking model fallback (retries with `think: true`)
- Tool support cache (`~/.pi/agent/cache/tool_support.json`)
- JSON repair for truncated output (stack-based nesting-aware parser)
- Tab-completion for model names
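The "stack-based nesting-aware parser" for truncated output could work along these lines. This is a simplified illustration, not the package's implementation: it only closes an unterminated string and any unclosed objects or arrays, and patches a dangling `:` or trailing comma; it does not repair other syntax errors.

```javascript
// Hypothetical sketch: repair JSON truncated mid-stream by tracking open
// braces/brackets on a stack and string state, then appending the closers.
function repairTruncatedJson(text) {
  const stack = [];
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") stack.push(ch === "{" ? "}" : "]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  let repaired = text;
  if (inString) repaired += '"'; // close an unterminated string
  repaired = repaired
    .replace(/,\s*$/, "")       // drop a trailing comma
    .replace(/:\s*$/, ": null"); // give a dangling key a value
  while (stack.length) repaired += stack.pop(); // close nesting inner-to-outer
  return repaired;
}
```

Because the stack records closers in opening order, popping it emits `]` and `}` in the correct inner-to-outer sequence regardless of nesting depth.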
License
MIT — VTSTech