@artale/pi-arena

Model benchmarking with domain-aware hallucination tracking, per-model leaderboards, and task templates. Track speed, quality, and pass rate across coding, reasoning, and general knowledge.

Package details

extension

Install @artale/pi-arena from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@artale/pi-arena
Package
@artale/pi-arena
Version
1.5.3
Published
May 2, 2026
Downloads
227/mo · 25/wk
Author
artale
License
MIT
Types
extension
Size
160.5 KB
Dependencies
0 dependencies · 0 peers
Pi manifest JSON
{
  "extensions": [
    "extensions/pi-arena.ts"
  ],
  "image": "https://raw.githubusercontent.com/artale93/pi-arena/main/preview.png"
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

@artale/pi-arena

Model benchmarking and performance tracking for pi. Run tasks against models, track results, detect regressions.

Install

npm install -g @artale/pi-arena

Features (v1.1)

  • Benchmark tracking — Record task/model/duration/score/pass runs
  • Vectara baselines — 28 models from HHEM-2.3 hallucination leaderboard (March 2026)
  • Auto-compare — Flag models scoring below their Vectara baseline
  • Verbosity penalty — Detect when verbose output correlates with hallucination
  • Domain tracking — coding, reasoning, general-knowledge, legal, medical, financial
  • Thread type tracking — Base, P-Thread, C-Thread, F-Thread, B-Thread, L-Thread, Z-Thread

Tools

  • arena_run — Record a benchmark run
  • arena_history — Query benchmark history (filter by model/domain)
  • arena_compare — Side-by-side model comparison with Vectara baselines

Commands

  • /arena stats — Aggregate statistics
  • /arena baselines — Vectara hallucination leaderboard
  • /arena leaderboard — Model rankings by score
  • /arena history [n] — Recent benchmark runs
  • /arena compare <A> <B> — Head-to-head comparison
  • /arena templates — Task templates
  • /arena export — Export all data