pi-onnx

Run Hugging Face onnx-community models locally inside pi: registers a chat provider for ONNX text-generation models and a set of tools (embeddings, classification, ASR) backed by @huggingface/transformers and onnxruntime-node.

Install pi-onnx from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-onnx
Package: pi-onnx
Version: 0.2.4
Published: May 31, 2026
Downloads: 175/mo · 8/wk
Author: jarkkojs
License: MIT
Types: extension
Size: 57.7 KB
Dependencies: 2 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./src/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-onnx

Runs Hugging Face onnx-community models locally inside the pi coding agent using @huggingface/transformers.

It implements a chat provider and three tools:

  • onnx_embed({ texts: string[] }): returns an array of embedding vectors and their dimensionality.
  • onnx_classify({ text, labels? }): returns the top-K labels with scores; when labels is provided, it runs zero-shot classification.
  • onnx_transcribe({ path, language?, task? }): returns the transcript text and segments.
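The argument shapes listed above can be written down as types. The sketch below is illustrative only: the interface names and the classifyMode helper are hypothetical, the string[] type for labels is an assumption, and treating an empty labels array as "not provided" is also an assumption rather than documented behavior.

```typescript
// Hypothetical type names; only the field names come from the tool list above.
interface EmbedArgs {
  texts: string[];
}

interface ClassifyArgs {
  text: string;
  labels?: string[]; // assumed to be a list of candidate label strings
}

interface TranscribeArgs {
  path: string;
  language?: string;
  task?: "transcribe" | "translate";
}

// Decide which classification mode a call would use.
// Assumption: an empty labels array is treated the same as no labels.
function classifyMode(args: ClassifyArgs): "zero-shot" | "standard" {
  return args.labels !== undefined && args.labels.length > 0
    ? "zero-shot"
    : "standard";
}

const embedCall: EmbedArgs = { texts: ["hello world"] };
const transcribeCall: TranscribeArgs = { path: "meeting.wav", task: "transcribe" };
const mode = classifyMode({ text: "I love this", labels: ["positive", "negative"] });
console.log(embedCall.texts.length, transcribeCall.path, mode);
```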

Install

pi install npm:pi-onnx

Configure

Copy example-config.json from this package as a starting point:

cp example-config.json ~/.pi/agent/pi-onnx.json

Top-level

Field                Type                               Default                        Notes
cacheDir             string | null                      null (HF default)              Forwarded to env.cacheDir.
device               "cpu" | "webgpu" | "wasm" | "gpu"  "cpu"                          onnxruntime execution provider hint.
defaultDtype         Dtype                              "q4"                           Per-model dtype overrides this.
preloadDefaultModel  boolean                            false                          Preload the first configured model on session start.
models               ModelEntry[]                       [Qwen2.5-Coder-0.5B-Instruct]  Each entry becomes an onnx-community/<id> chat model.
discovery            object                             enabled, limit 50              Append compatible onnx-community/* models from the HF Hub.
tools                object                             embed only                     Toggles for onnx_embed, onnx_classify, and onnx_transcribe.

Dtype is one of "fp32", "fp16", "q8", "int8", "uint8", "q4", "bnb4", "q4f16".
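For orientation, the top-level fields can be summarized as a TypeScript type. This is a sketch derived from the table in this README, not from the package source: the names PiOnnxConfig, Device, and documentedDefaults are hypothetical, and the tool sub-objects are reduced to their enabled flags.

```typescript
// Hypothetical types mirroring the documented config fields.
type Dtype = "fp32" | "fp16" | "q8" | "int8" | "uint8" | "q4" | "bnb4" | "q4f16";
type Device = "cpu" | "webgpu" | "wasm" | "gpu";

interface ModelEntry {
  id: string; // the only required field
  name?: string;
  contextWindow?: number;
  maxTokens?: number;
  dtype?: Dtype;
}

interface PiOnnxConfig {
  cacheDir: string | null;
  device: Device;
  defaultDtype: Dtype;
  preloadDefaultModel: boolean;
  models: ModelEntry[];
  discovery: { enabled: boolean; limit: number };
  tools: {
    embed: { enabled: boolean };
    classify: { enabled: boolean };
    transcribe: { enabled: boolean };
  };
}

// Defaults as documented in the tables; the real package may differ in detail.
const documentedDefaults: PiOnnxConfig = {
  cacheDir: null,
  device: "cpu",
  defaultDtype: "q4",
  preloadDefaultModel: false,
  models: [{ id: "onnx-community/Qwen2.5-Coder-0.5B-Instruct" }],
  discovery: { enabled: true, limit: 50 },
  tools: {
    embed: { enabled: true },
    classify: { enabled: false },
    transcribe: { enabled: false },
  },
};

console.log(JSON.stringify(documentedDefaults, null, 2));
```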

models[]

Field          Type    Default       Notes
id             string  -             Hugging Face repo path (onnx-community/ prefixed).
name           string  id            Display name shown in the model picker.
contextWindow  number  -             Context window size in tokens.
maxTokens      number  1024          Default max_new_tokens for completions.
dtype          Dtype   defaultDtype  Quantization for this model only.

Only id is required; the onnx-community/ prefix is added if missing. Pinned models are checked against the Hugging Face Hub when possible. Repositories that are not compatible with @huggingface/transformers text generation, such as onnxruntime-genai image-text-to-text exports, are skipped instead of being offered as broken chat models.
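The prefix rule described above amounts to a small normalization step. A minimal sketch, assuming a plain string check; normalizeModelId is a hypothetical helper name and is not taken from the package:

```typescript
const ORG_PREFIX = "onnx-community/";

// Add the onnx-community/ prefix only when it is missing.
function normalizeModelId(id: string): string {
  return id.startsWith(ORG_PREFIX) ? id : ORG_PREFIX + id;
}

console.log(normalizeModelId("Qwen3-0.6B-ONNX"));
console.log(normalizeModelId("onnx-community/Qwen3-0.6B-ONNX"));
```

Both calls yield the same fully qualified repository path, so a config entry can use either form.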

Example:

{
  "id": "onnx-community/Qwen3-0.6B-ONNX",
  "name": "Qwen3-0.6B (ONNX, q4)",
  "contextWindow": 32768,
  "maxTokens": 2048,
  "dtype": "q4"
}

discovery

Field         Type           Default              Notes
enabled       boolean        true                 Append discovered models to models[].
limit         number         50                   Per pipeline tag.
pipelineTags  PipelineTag[]  ["text-generation"]  Hugging Face pipeline tags to scan.

Discovery only registers transformers.js text-generation repositories that expose a supported onnx/model*.onnx file. It also records the matching dtype, so models such as gpt-oss-20b-ONNX use q4f16 instead of the global default q4. If a discovered model is also pinned in models without an explicit dtype, discovery fills in that dtype automatically.
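Taken together with the defaultDtype note, the text above suggests a precedence order: an explicit per-model dtype first, then the dtype recorded by discovery, then the global default. The sketch below encodes that reading; resolveDtype is a hypothetical helper, and the exact order inside the package is an assumption.

```typescript
type Dtype = "fp32" | "fp16" | "q8" | "int8" | "uint8" | "q4" | "bnb4" | "q4f16";

// Assumed precedence: pinned (explicit) > discovered > global default.
function resolveDtype(
  pinned: Dtype | undefined,
  discovered: Dtype | undefined,
  defaultDtype: Dtype,
): Dtype {
  return pinned ?? discovered ?? defaultDtype;
}

// A pinned model without a dtype picks up the discovered one.
console.log(resolveDtype(undefined, "q4f16", "q4"));
// An explicit dtype wins over discovery.
console.log(resolveDtype("fp16", "q4f16", "q4"));
// With neither, the global default applies.
console.log(resolveDtype(undefined, undefined, "q4"));
```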

tools.embed

Field      Type            Default                          Notes
enabled    boolean         true                             Toggles onnx_embed.
model      string          onnx-community/all-MiniLM-L6-v2  Any feature-extraction model.
pooling    "mean" | "cls"  "mean"                           Pooling strategy.
normalize  boolean         true                             L2-normalize output vectors.
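To make the pooling and normalize options concrete, here is a minimal sketch of mean pooling followed by L2 normalization on plain arrays. It is a simplification for illustration: the function names are hypothetical, the attention mask that real pipelines usually apply is ignored, and the package itself delegates this work to @huggingface/transformers.

```typescript
// Average per-token vectors into one sentence vector (no attention mask).
function meanPool(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) {
    throw new Error("meanPool needs at least one token vector");
  }
  const dim = tokenVectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) {
      sum[i] += vec[i];
    }
  }
  return sum.map((x) => x / tokenVectors.length);
}

// Scale a vector to unit Euclidean length; a zero vector is returned unchanged.
function l2Normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? vec.slice() : vec.map((x) => x / norm);
}

const pooled = meanPool([
  [1, 2],
  [5, 6],
]);
const unit = l2Normalize(pooled);
console.log(pooled, unit);
```

With normalized vectors, the dot product of two embeddings equals their cosine similarity, which is why normalize defaults to true in many retrieval setups.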

tools.classify

Field    Type     Default                                                         Notes
enabled  boolean  false                                                           Toggles onnx_classify.
model    string   onnx-community/distilbert-base-uncased-finetuned-sst-2-english  Classifier or NLI model (zero-shot).
topK     number   5                                                               Maximum labels returned.
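The topK setting caps how many labels come back. A minimal sketch of that selection, assuming results are ranked by descending score; LabelScore and takeTopK are hypothetical names, not part of the package:

```typescript
interface LabelScore {
  label: string;
  score: number;
}

// Return at most k entries, highest score first, without mutating the input.
function takeTopK(results: LabelScore[], k: number): LabelScore[] {
  return [...results]
    .sort((a, b) => b.score - a.score)
    .slice(0, Math.max(0, k));
}

const ranked = takeTopK(
  [
    { label: "neutral", score: 0.1 },
    { label: "positive", score: 0.7 },
    { label: "negative", score: 0.2 },
  ],
  2,
);
console.log(ranked.map((r) => r.label));
```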

tools.transcribe

Field            Type                        Default                      Notes
enabled          boolean                     false                        Toggles onnx_transcribe.
model            string                      onnx-community/whisper-tiny  Any ASR model.
language         string | null               null                         Default language hint (e.g. "en").
task             "transcribe" | "translate"  "transcribe"                 Default ASR task.
maxDecodedBytes  number                      268435456                    Maximum decoded f32 audio bytes to buffer in RAM.

Limitations

  • No tool calling support for ONNX chat models.
  • Token counts are approximate; they are derived from the tokenizer.
  • The first call to a model blocks while its weights download.
  • onnx_transcribe shells out to ffmpeg (must be on PATH) to decode the input audio file to a Float32Array before inference, capped by maxDecodedBytes.
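To get a feel for the maxDecodedBytes cap, the arithmetic can be sketched as follows. The default of 268435456 bytes is 256 MiB, and an f32 sample takes 4 bytes. The 16 kHz mono figure is an assumption (it is the input rate Whisper models conventionally expect); the README does not state which sample rate or channel count the package asks ffmpeg for. maxDecodedSeconds is a hypothetical helper.

```typescript
const BYTES_PER_F32_SAMPLE = 4;

// Longest audio duration (in seconds) that fits within the decoded-bytes cap.
function maxDecodedSeconds(
  maxDecodedBytes: number,
  sampleRate: number,
  channels: number = 1,
): number {
  return maxDecodedBytes / (BYTES_PER_F32_SAMPLE * sampleRate * channels);
}

// Assumed decode format: 16 kHz, mono.
const seconds = maxDecodedSeconds(268435456, 16000);
console.log(`${seconds.toFixed(1)} s, about ${(seconds / 60).toFixed(1)} min`);
```

Under those assumptions the default cap corresponds to roughly 70 minutes of audio; longer recordings would need a larger maxDecodedBytes or to be split beforehand.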

License

pi-onnx is licensed under MIT. See LICENSE for more information.