@s1m0n38/pi-voice

HTTP server for Kokoro TTS — text-to-speech via ONNX inference.

Install @s1m0n38/pi-voice from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@s1m0n38/pi-voice

Package: @s1m0n38/pi-voice
Version: 2.0.1
Published: May 7, 2026
Downloads: 432/mo · 432/wk
Author: s1m0n38
License: MIT
Types: extension
Size: 121.6 KB
Dependencies: 2 dependencies · 4 peers

Pi manifest JSON
{
  "extensions": [
    "./extensions/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-voice

Give your Pi agent a voice.

pi-voice is a text-to-speech package for the Pi coding agent. It runs a local HTTP server powered by Kokoro ONNX and exposes a /voice settings UI, a tts tool, and automatic speech on agent responses.

How it works: The server loads a single Kokoro ONNX model into memory and exposes a REST API for synthesis. The pi extension talks to this server over HTTP — it never loads the model directly. This separation keeps the agent lightweight while the server handles the heavy ONNX inference.

Installation

pi install npm:@s1m0n38/pi-voice

The pi-voice CLI is available after install. Start the server and download the default model:

pi-voice server                      # start on 127.0.0.1:8181
pi-voice download q4                 # download + activate the q4 model (~291 MB)

Usage

/voice command

Open the interactive settings UI inside Pi:

| Setting | Controls | Keys |
| --- | --- | --- |
| TTS | Enable/disable speech | ← → |
| Voice | Speaker voice (with language/gender hints) | ← → |
| Speed | Speech rate (0.5×–3.0×) | ← → |

Navigate with ↑ ↓, press Enter to play a sample, r to reset defaults, Esc to close.

Settings persist in ~/.pi/voice/config.json across sessions.

tts tool

The agent can speak at any time using the tts tool:

> Use the tts tool to say "Build complete, all tests passing"

Auto-TTS

Enable automatic speech after every agent response by editing ~/.pi/voice/config.json:

{
  "enabled": true,
  "voice": "af_heart",
  "speed": 1.0,
  "events": {
    "agent_end": {
      "prompt": "Summarize in one short sentence for text-to-speech.",
      "model": { "provider": "anthropic", "id": "claude-haiku-4-5" }
    },
    "turn_end": {
      "prompt": "Summarize briefly."
    },
    "custom_event": {
      "text": "Custom event triggered."
    }
  }
}

Each event key enables auto-TTS for that event. The value is an object with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| prompt | string | LLM system prompt for summarizing the event message. The event's last message is provided as context. |
| text | string | Fixed text to speak directly, with no LLM call. Mutually exclusive with prompt. |
| model | { provider, id } | Optional. Model to use for summarization. If omitted, inherits the active session model. |

Built-in pi events (agent_end, turn_end, message_end) use the message data from the event. Any other key is treated as a custom event on the shared pi.events bus.
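
The field rules above can be sketched as TypeScript types plus a small validator. This is an illustration only: the names `VoiceConfig`, `EventConfig`, `ModelRef`, and `validateConfig` are hypothetical and not part of pi-voice. The field names come from the example config; applying the 0.5 to 3.0 speed range from the /voice UI to the file, and rejecting an entry with neither `prompt` nor `text`, are both assumptions.

```typescript
// Hypothetical types mirroring the example config; pi-voice does not export them.
interface ModelRef {
  provider: string;
  id: string;
}

interface EventConfig {
  prompt?: string; // LLM system prompt used to summarize the event message
  text?: string; // fixed text spoken directly, no LLM call
  model?: ModelRef; // optional; the session model is used when omitted
}

interface VoiceConfig {
  enabled: boolean;
  voice: string;
  speed: number;
  events?: Record<string, EventConfig>;
}

// Returns a list of problems; an empty list means the config looks consistent.
function validateConfig(cfg: VoiceConfig): string[] {
  const problems: string[] = [];

  // ASSUMPTION: the range shown in the /voice UI also applies to the file.
  if (cfg.speed < 0.5 || cfg.speed > 3.0) {
    problems.push(`speed ${cfg.speed} is outside 0.5-3.0`);
  }

  const events = cfg.events ?? {};
  for (const name of Object.keys(events)) {
    const event = events[name];
    const hasPrompt = typeof event.prompt === "string";
    const hasText = typeof event.text === "string";
    if (hasPrompt && hasText) {
      problems.push(`${name}: "prompt" and "text" are mutually exclusive`);
    }
    // ASSUMPTION: an entry with neither field is treated as a mistake here.
    if (!hasPrompt && !hasText) {
      problems.push(`${name}: expected either "prompt" or "text"`);
    }
  }
  return problems;
}
```

Running `validateConfig` on a parsed ~/.pi/voice/config.json before saving it would catch an event that sets both `prompt` and `text`.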

CLI Reference

pi-voice server                              # start server (default: 127.0.0.1:8181)
pi-voice server --host 0.0.0.0 --port 9090   # custom host/port
pi-voice download q4                         # download + activate model dtype
pi-voice delete q4                           # delete cached model files
pi-voice status                              # show server status and active model
pi-voice voices                              # list available voices

Model dtypes

| Dtype | Size | Quality | Notes |
| --- | --- | --- | --- |
| q4 | ~291 MB | Good | 4-bit matmul; recommended default |
| q4f16 | ~147 MB | Good | 4-bit matmul + fp16 weights; smaller, good trade-off |
| q8 | ~88 MB | Great | 8-bit quantized; best quality/size ratio |
| fp16 | ~156 MB | Excellent | Half-precision floats |
| fp32 | ~310 MB | Best | Full-precision floats; largest, highest quality |

Only one model is loaded at a time. Downloading or activating a new model automatically unloads the previous one.

Model files are cached at ~/.pi/voice/cache/ and persist across npm install cycles. To reclaim disk space, use pi-voice delete <dtype>.
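
As a rough aid for choosing a dtype, the table can be encoded as data and filtered by a disk budget. This is a sketch, not part of pi-voice: `MODELS` and `pickDtype` are hypothetical names, the sizes are the approximate figures from the table, and the numeric quality rank (Good = 1 up to Best = 4) is an assumption introduced only to make the labels comparable.

```typescript
type Dtype = "q4" | "q4f16" | "q8" | "fp16" | "fp32";

interface ModelInfo {
  dtype: Dtype;
  sizeMb: number; // approximate download size from the table
  quality: number; // assumed ordinal rank: Good=1, Great=2, Excellent=3, Best=4
}

const MODELS: ModelInfo[] = [
  { dtype: "q4", sizeMb: 291, quality: 1 },
  { dtype: "q4f16", sizeMb: 147, quality: 1 },
  { dtype: "q8", sizeMb: 88, quality: 2 },
  { dtype: "fp16", sizeMb: 156, quality: 3 },
  { dtype: "fp32", sizeMb: 310, quality: 4 },
];

// Picks the highest-quality dtype that fits the budget; ties go to the smaller file.
function pickDtype(maxSizeMb: number): Dtype | undefined {
  const candidates = MODELS.filter((m) => m.sizeMb <= maxSizeMb);
  candidates.sort((a, b) => b.quality - a.quality || a.sizeMb - b.sizeMb);
  return candidates[0]?.dtype;
}
```

With these figures, `pickDtype(100)` returns `"q8"` and `pickDtype(200)` returns `"fp16"`; the result can then be passed to `pi-voice download <dtype>`.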

API

The server exposes HTTP endpoints at http://127.0.0.1:8181:

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Server status, active dtype, model loaded |
| GET | /voices | Available voice names |
| GET | /models | All dtypes with download status |
| POST | /models/download | Download + activate a dtype |
| POST | /models/delete | Delete cached model files |
| POST | /models/activate | Load a downloaded model |
| POST | /models/unload | Unload model, free memory |
| POST | /tts | Synthesize text → WAV audio |
| POST | /shutdown | Graceful shutdown |
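
A client call to the synthesis endpoint might look like the sketch below. The README does not document the request schema, so the JSON body fields (`text`, `voice`, `speed`) are an assumption borrowed from the config keys, and `TtsRequest`, `buildTtsRequest`, and `synthesize` are hypothetical names. Only the base URL, the POST method, the /tts path, and the WAV response come from this document.

```typescript
// Default address from the README; change it if the server uses --host/--port.
const BASE_URL = "http://127.0.0.1:8181";

// ASSUMPTION: illustrative body shape; the real /tts schema is not documented here.
interface TtsRequest {
  text: string;
  voice?: string;
  speed?: number;
}

interface HttpRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Pure helper that builds the request, so it can be inspected without a server.
function buildTtsRequest(req: TtsRequest, baseUrl: string = BASE_URL): HttpRequest {
  return {
    url: `${baseUrl}/tts`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(req),
    },
  };
}

// Sends the request and returns the raw response bytes (WAV audio per the table).
async function synthesize(req: TtsRequest): Promise<ArrayBuffer> {
  const { url, init } = buildTtsRequest(req);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`POST /tts failed with HTTP ${res.status}`);
  }
  return res.arrayBuffer();
}
```

Splitting the request builder from the network call keeps the assumed body shape in one place, so it is easy to adjust once the actual schema is confirmed against the server source.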

Events

pi-voice emits events on the pi event bus (pi.events) so other extensions can integrate with TTS activity.

| Event | Payload | When |
| --- | --- | --- |
| voice:config | { enabled, voice, speed } | Any setting change via /voice |
| voice:speak_start | { text, voice, speed, source } | Synthesis requested |
| voice:speak_end | { text, source, error? } | Playback done or failed |

source is "tool" (LLM invoked tts), "auto" (auto-TTS handler), or "sample" (/voice preview).

// React to config changes
pi.events.on("voice:config", ({ enabled, voice, speed }) => {
  // update status bar, toggle features, etc.
});

// Track speech activity
pi.events.on("voice:speak_start", ({ text, source }) => {
  if (source === "auto") console.log(`[TTS] ${text}`);
});

pi.events.on("voice:speak_end", ({ error }) => {
  if (error) console.warn(`TTS failed: ${error}`);
});
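
For extensions written in TypeScript, the payloads can be given explicit types. The sketch below is an assumption-laden illustration: `SpeakStart`, `SpeakEnd`, `EventBusLike`, `trackSpeech`, and `FakeBus` are hypothetical names, the field types are guesses based on the table, and `FakeBus` is only an in-memory stand-in for pi.events so the code can run outside Pi.

```typescript
// Payload shapes taken from the events table; the field types are assumed.
type SpeakSource = "tool" | "auto" | "sample";
interface SpeakStart { text: string; voice: string; speed: number; source: SpeakSource }
interface SpeakEnd { text: string; source: SpeakSource; error?: unknown }

type Handler = (payload: any) => void;

// The only part of the bus this sketch relies on; assumed to match pi.events.on.
interface EventBusLike {
  on(event: string, handler: Handler): void;
}

interface SpeechStats {
  started: number;
  finished: number;
  failed: number;
  active: boolean;
  lastText: string;
}

// Subscribes to the speech events and keeps running counters.
function trackSpeech(bus: EventBusLike): SpeechStats {
  const stats: SpeechStats = { started: 0, finished: 0, failed: 0, active: false, lastText: "" };
  bus.on("voice:speak_start", (p: SpeakStart) => {
    stats.started += 1;
    stats.active = true;
    stats.lastText = p.text;
  });
  bus.on("voice:speak_end", (p: SpeakEnd) => {
    if (p.error) stats.failed += 1;
    else stats.finished += 1;
    stats.active = false;
  });
  return stats;
}

// In-memory stand-in for pi.events, used only to try the sketch outside Pi.
class FakeBus implements EventBusLike {
  private handlers: Record<string, Handler[]> = {};
  on(event: string, handler: Handler): void {
    (this.handlers[event] ??= []).push(handler);
  }
  emit(event: string, payload: unknown): void {
    for (const handler of this.handlers[event] ?? []) handler(payload);
  }
}
```

Inside a real extension, `trackSpeech(pi.events)` would replace the `FakeBus` instance, provided the bus exposes `on` as in the examples above.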

License

MIT


Bootstrapped from pi-package-template.