pi-onnx

Run Hugging Face onnx-community models locally inside pi: registers a chat provider for ONNX text-generation models and a set of tools (embeddings, classification, ASR) backed by @huggingface/transformers and onnxruntime-node.

Install pi-onnx from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-onnx

Package: pi-onnx
Version: 0.1.0
Published: May 15, 2026
Downloads: not available
Author: jarkkojs
License: MIT
Types: extension
Size: 41.7 KB
Dependencies: 1 dependency · 2 peers

Pi manifest JSON
{
  "extensions": [
    "./src/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-onnx

Runs Hugging Face onnx-community models locally inside the pi coding agent using @huggingface/transformers.

Implements a chat provider and several tools, each listed here with what it returns:

  • onnx_embed({ texts: string[] }): array of vectors (and dimensionality).
  • onnx_classify({ text, labels? }): top-K labels with scores; when labels is provided, runs zero-shot classification.
  • onnx_transcribe({ path, language?, task? }): transcript text and segments.
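
A typical use of the onnx_embed output is ranking texts by similarity. The following is a minimal TypeScript sketch, assuming each returned vector is a plain number[]; the exact result shape is not documented here, and the helper names (cosineSimilarity, rankBySimilarity) are hypothetical, not part of pi-onnx.

```typescript
// Assumption: onnx_embed yields one numeric vector per input text.
type Vector = number[];

// Dot product of two equal-length vectors.
function dot(a: Vector, b: Vector): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: Vector, b: Vector): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Indices of `candidates`, ordered from most to least similar to `query`.
function rankBySimilarity(query: Vector, candidates: Vector[]): number[] {
  return candidates
    .map((v, i) => ({ i, score: cosineSimilarity(query, v) }))
    .sort((x, y) => y.score - x.score)
    .map((x) => x.i);
}
```

When vectors are already L2-normalized (the default for tools.embed), the dot product alone gives the same ranking.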

Install

pi install npm:pi-onnx

Configure

Copy example-config.json from this package as a starting point:

cp example-config.json ~/.pi/agent/pi-onnx.json

Top-level

Field         Type                               Default                        Notes
cacheDir      string | null                      null (HF default)              Forwarded to env.cacheDir.
device        "cpu" | "webgpu" | "wasm" | "gpu"  "cpu"                          onnxruntime execution provider hint.
defaultDtype  Dtype                              "q4"                           Per-model dtype overrides this.
models        ModelEntry[]                       [Qwen2.5-Coder-0.5B-Instruct]  Each entry becomes an onnx-community/<id> chat model.
discovery     object                             enabled, limit 50              Append popular onnx-community/* models from the HF Hub.
tools         object                             embed only                     Toggles for onnx_embed / onnx_classify / onnx_transcribe.

Dtype is one of "fp32", "fp16", "q8", "int8", "uint8", "q4", "bnb4", "q4f16".
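
To make the top-level fields concrete, here is a hedged TypeScript sketch of the scalar options and their documented defaults. The type and function names (TopLevelConfig, isDtype, withDefaults) are illustrative assumptions and may not match the types in the package source.

```typescript
// The eight dtype values listed above.
const DTYPES = ["fp32", "fp16", "q8", "int8", "uint8", "q4", "bnb4", "q4f16"] as const;
type Dtype = (typeof DTYPES)[number];

// Runtime check, useful when the value comes from parsed JSON.
function isDtype(value: unknown): value is Dtype {
  return typeof value === "string" && (DTYPES as readonly string[]).includes(value);
}

type Device = "cpu" | "webgpu" | "wasm" | "gpu";

// Hypothetical shape covering only the scalar top-level fields.
interface TopLevelConfig {
  cacheDir: string | null;
  device: Device;
  defaultDtype: Dtype;
}

// Defaults as stated in the table.
const DEFAULTS: TopLevelConfig = { cacheDir: null, device: "cpu", defaultDtype: "q4" };

// Fill in any missing field with its documented default.
function withDefaults(partial: Partial<TopLevelConfig>): TopLevelConfig {
  return { ...DEFAULTS, ...partial };
}
```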

models[]

Field          Type    Default       Notes
id             string  (required)    Hugging Face repo path (onnx-community/ prefixed).
name           string  id            Display name shown in the model picker.
contextWindow  number  -             Context window size in tokens.
maxTokens      number  1024          Default max_new_tokens for completions.
dtype          Dtype   defaultDtype  Quantization for this model only.

Only id is required; the onnx-community/ prefix is added if missing.

Example:

{
  "id": "onnx-community/Qwen3-0.6B-ONNX",
  "name": "Qwen3-0.6B (ONNX, q4)",
  "contextWindow": 32768,
  "maxTokens": 2048,
  "dtype": "q4"
}
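
The defaulting rules for a models[] entry can be sketched as follows. This is illustrative TypeScript, not the package's code: resolveModel and the two interfaces are hypothetical names, and it assumes the default name is the prefixed id (the table only says the default is id).

```typescript
interface ModelEntry {
  id: string;
  name?: string;
  contextWindow?: number;
  maxTokens?: number;
  dtype?: string;
}

interface ResolvedModel {
  id: string;
  name: string;
  contextWindow?: number;
  maxTokens: number;
  dtype: string;
}

const PREFIX = "onnx-community/";

// Apply the documented defaults to one entry.
function resolveModel(entry: ModelEntry, defaultDtype = "q4"): ResolvedModel {
  // The prefix is added only when it is missing.
  const id = entry.id.startsWith(PREFIX) ? entry.id : PREFIX + entry.id;
  return {
    id,
    name: entry.name ?? id, // assumption: falls back to the prefixed id
    contextWindow: entry.contextWindow, // no documented default
    maxTokens: entry.maxTokens ?? 1024,
    dtype: entry.dtype ?? defaultDtype, // per-model dtype wins over defaultDtype
  };
}
```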

discovery

Field         Type           Default                                                  Notes
enabled       boolean        true                                                     Append discovered models to models[].
limit         number         50                                                       Maximum number of models per pipeline tag.
pipelineTags  PipelineTag[]  ["text-generation", "image-text-to-text", "any-to-any"]  Hugging Face pipeline tags to scan.
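
How the package queries the Hub is not stated here. As one plausible sketch, the discovery settings map onto one request per pipeline tag against the public Hub model-listing endpoint. The function name discoveryUrls is hypothetical, and sorting by downloads is an assumption about what "popular" means.

```typescript
interface DiscoveryConfig {
  enabled: boolean;
  limit: number;
  pipelineTags: string[];
}

// Defaults as documented in the table above.
const DEFAULT_DISCOVERY: DiscoveryConfig = {
  enabled: true,
  limit: 50,
  pipelineTags: ["text-generation", "image-text-to-text", "any-to-any"],
};

// Build one Hub API URL per pipeline tag; returns [] when discovery is disabled.
function discoveryUrls(cfg: DiscoveryConfig = DEFAULT_DISCOVERY): string[] {
  if (!cfg.enabled) return [];
  return cfg.pipelineTags.map((tag) => {
    const url = new URL("https://huggingface.co/api/models");
    url.searchParams.set("author", "onnx-community");
    url.searchParams.set("pipeline_tag", tag);
    url.searchParams.set("sort", "downloads"); // assumption: popularity = downloads
    url.searchParams.set("direction", "-1"); // descending
    url.searchParams.set("limit", String(cfg.limit)); // limit applies per tag
    return url.toString();
  });
}
```

Each URL could then be fetched with fetch(url) and the returned model ids appended to models[].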

tools.embed

Field      Type                     Default                          Notes
enabled    boolean                  true                             Toggles onnx_embed.
model      string                   onnx-community/all-MiniLM-L6-v2  Any feature-extraction model.
pooling    "mean" | "cls" | "none"  "mean"                           Pooling strategy.
normalize  boolean                  true                             L2-normalize output vectors.
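
To illustrate what pooling and normalize do, the sketch below applies them to a small matrix of per-token embeddings. It is a simplified illustration: the helper names are hypothetical, and real mean pooling usually weights tokens by the attention mask so that padding is ignored.

```typescript
// Average the token vectors component-wise ("mean" pooling, no attention mask).
function meanPool(tokens: number[][]): number[] {
  if (tokens.length === 0) throw new Error("no tokens to pool");
  const dim = tokens[0].length;
  const out = new Array<number>(dim).fill(0);
  for (const t of tokens) {
    for (let i = 0; i < dim; i++) out[i] += t[i] / tokens.length;
  }
  return out;
}

// Take the first token's vector ("cls" pooling).
function clsPool(tokens: number[][]): number[] {
  if (tokens.length === 0) throw new Error("no tokens to pool");
  return tokens[0].slice();
}

// Scale a vector to unit Euclidean length; a zero vector is returned unchanged.
function l2Normalize(v: number[]): number[] {
  const norm = Math.hypot(...v);
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Combine pooling and optional normalization into one sentence vector.
function sentenceVector(tokens: number[][], pooling: "mean" | "cls", normalize = true): number[] {
  const pooled = pooling === "mean" ? meanPool(tokens) : clsPool(tokens);
  return normalize ? l2Normalize(pooled) : pooled;
}
```

With pooling set to "none", the per-token matrix would be returned as-is, so each text yields one vector per token rather than a single vector.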

tools.classify

Field    Type     Default                                                         Notes
enabled  boolean  false                                                           Toggles onnx_classify.
model    string   onnx-community/distilbert-base-uncased-finetuned-sst-2-english  Classifier or NLI model (zero-shot).
topK     number   5                                                               Maximum labels returned.
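
The effect of topK can be sketched as a sort-and-truncate over label scores. The LabelScore shape and the topK helper below are assumptions for illustration; the actual result type of onnx_classify is not specified here.

```typescript
// Hypothetical result item: a label with its score.
interface LabelScore {
  label: string;
  score: number;
}

// Return at most k items, highest score first, without mutating the input.
function topK(results: LabelScore[], k = 5): LabelScore[] {
  return [...results].sort((a, b) => b.score - a.score).slice(0, Math.max(0, k));
}
```

A two-label sentiment model such as the default returns at most two labels, however large topK is.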

tools.transcribe

Field     Type                        Default                      Notes
enabled   boolean                     false                        Toggles onnx_transcribe.
model     string                      onnx-community/whisper-tiny  Any ASR model.
language  string | null               null                         Default language hint (e.g. "en").
task      "transcribe" | "translate"  "transcribe"                 Default ASR task.

Limitations

  • No tool calling support for ONNX chat models.
  • Token counts are approximated from the tokenizer.
  • First call to a model blocks while weights download.
  • onnx_transcribe shells out to ffmpeg (must be on PATH) to decode the input audio file to a Float32Array before inference.
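
The ffmpeg decoding step mentioned above might look roughly like this. It is a sketch under stated assumptions, not the package's implementation: the function names are hypothetical, and the 16 kHz mono output is an assumption based on what Whisper-style models usually expect.

```typescript
import { spawn } from "node:child_process";

const SAMPLE_RATE = 16000; // assumption: 16 kHz mono input for the ASR model

// ffmpeg arguments: decode `path` to raw 32-bit float little-endian PCM on stdout.
function ffmpegArgs(path: string, sampleRate = SAMPLE_RATE): string[] {
  return ["-v", "error", "-i", path, "-f", "f32le", "-ac", "1", "-ar", String(sampleRate), "pipe:1"];
}

// Copy raw f32le bytes into a Float32Array (avoids alignment issues with Buffer views).
function pcmToFloat32(pcm: Buffer): Float32Array {
  const samples = new Float32Array(Math.floor(pcm.length / 4));
  for (let i = 0; i < samples.length; i++) samples[i] = pcm.readFloatLE(i * 4);
  return samples;
}

// Run ffmpeg (must be on PATH) and resolve with the decoded samples.
function decodeAudio(path: string): Promise<Float32Array> {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", ffmpegArgs(path));
    const chunks: Buffer[] = [];
    let stderr = "";
    proc.stdout.on("data", (chunk: Buffer) => chunks.push(chunk));
    proc.stderr.on("data", (chunk: Buffer) => (stderr += chunk.toString()));
    proc.on("error", reject); // e.g. ffmpeg not found on PATH
    proc.on("close", (code) => {
      if (code === 0) resolve(pcmToFloat32(Buffer.concat(chunks)));
      else reject(new Error(`ffmpeg exited with code ${code}: ${stderr.trim()}`));
    });
  });
}
```

Usage would be along the lines of const audio = await decodeAudio("meeting.wav"), with the resulting Float32Array passed to the ASR model.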

License

pi-onnx is licensed under MIT. See LICENSE for more information.