pi-simple-voice

Streaming, verbatim Kokoro TTS for the Pi agent — speaks the assistant's output as it streams, no summarization.

Packages

Package details

extension

Install pi-simple-voice from npm and Pi will load the resources declared by the package manifest.

npm repo home report

$ pi install npm:pi-simple-voice

Package: pi-simple-voice
Version: 1.0.0
Published: Jun 13, 2026
Downloads: not available
Author: grrowl
License: MIT
Types: extension
Size: 81.9 KB
Dependencies: 2 dependencies · 3 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-simple-voice

Give your Pi agent a voice — streaming and verbatim.

pi-simple-voice is a text-to-speech package for the Pi coding agent. It speaks the assistant's output as it streams, sentence by sentence, with no summarization — you hear exactly what the agent writes. It runs a local HTTP server powered by Kokoro ONNX and exposes a /voice settings UI.

This is a fork of s1m0n38/pi-voice (MIT). See What's different below. Huge thanks to the upstream author — the Kokoro server, /voice TUI, and model management started there.

How it works: A small local server loads a single Kokoro ONNX model into memory and exposes a REST API for synthesis. The pi extension talks to it over HTTP — it never loads the model itself. The extension spawns the server on demand; the server self-exits after an idle timeout (default 15 min) and is re-spawned when next needed, so one model lives in RAM and is shared across all your pi sessions.

What's different from upstream

	upstream `pi-voice`	this fork
Speech text	LLM summarizes each response	verbatim — speaks what the agent writes
Timing	one utterance at `agent_end`	streaming, on sentence boundaries as tokens arrive
Reasoning	—	thinking/reasoning content is never voiced
Interrupt	`agent_end` (could cut the final sentence)	`turn_start` / `abort` (clean)
`tts` tool	agent can call it	removed — speech is a side-effect of output, not a tool
Server lifetime	started/stopped via CLI	extension-managed; self-exits when idle, re-spawned on demand
Runtime	`@mariozechner/*`	`@earendil-works/*`

Installation

pi install npm:pi-simple-voice

The extension auto-spawns the server and downloads the default q4 model (~291 MB) the first time speech is enabled. The pi-simple-voice CLI is also available for manual control.

Requires bun on PATH — the extension spawns the server under bun.

Usage

`/voice` command

Open the interactive settings UI inside Pi:

Setting	Controls	Keys
TTS	Enable/disable speech	← →
Voice	Speaker voice (with language/gender hints)	← →
Speed	Speech rate (0.5×–3.0×)	← →
Model	Quantization dtype	← → (loads on close)

Navigate with ↑ ↓, change a value with ← →, Enter plays a sample, s saves the current selection as the default, r resets, Esc closes. Toggle speech quickly with alt+v. The ♪ status bar shows live download/load progress (e.g. ♪ ↓ q4 25%). The Voice row appears once a model is loaded and its voices are fetched.

Settings persist in ~/.pi/voice/config.json.

Configuration

{
  "enabled": true,
  "voice": "af_heart",
  "speed": 1.0,
  "host": "127.0.0.1",
  "port": 8181,
  "dtype": "q4",
  "idleMs": 900000
}

idleMs is how long the server sits idle before self-exiting (default 15 min). There is no summarization model and no per-event prompt config — speech is always the assistant's verbatim output.

CLI Reference

pi-simple-voice server status                # show server status
pi-simple-voice server start                 # start server, load default model
pi-simple-voice server stop                  # stop server (exits the process)
pi-simple-voice server restart               # restart
pi-simple-voice model list                   # list dtypes + download status
pi-simple-voice model load <dtype>           # load (downloads if needed)
pi-simple-voice model unload                 # unload the active model
pi-simple-voice model download <dtype>       # download without loading
pi-simple-voice model remove <dtype>         # unload (if active) + delete cached files

Options: --host <host> --port <port> (defaults 127.0.0.1:8181).

Model dtypes

Dtype	Size	Quality	Notes
`q4`	~291 MB	Good	4-bit matmul — recommended default
`q4f16`	~147 MB	Good	4-bit matmul + fp16 weights
`q8`	~88 MB	Great	8-bit quantized — best quality/size ratio
`fp16`	~156 MB	Excellent	Half-precision floats
`fp32`	~310 MB	Best	Full-precision floats

Only one model is loaded at a time. Files are cached at ~/.pi/voice/cache/.

API

The server exposes HTTP endpoints at http://127.0.0.1:8181:

Method	Path	Description
GET	`/health`	Status, active + last dtype, model loaded, download/load progress
GET	`/voices`	Available voice names
GET	`/models`	All dtypes with download status
POST	`/models/download`	Download (+ optionally activate) a dtype
POST	`/models/delete`	Delete cached model files
POST	`/models/activate`	Load a downloaded model
POST	`/models/unload`	Unload model, free memory
POST	`/tts`	Synthesize text → WAV audio
POST	`/shutdown`	Graceful shutdown

Events

The extension emits voice:config on the pi event bus (pi.events) whenever a setting changes via /voice:

pi.events.on("voice:config", ({ enabled, voice, speed }) => {
  // update status bar, toggle features, etc.
});

License

MIT — same as upstream. Forked from s1m0n38/pi-voice.

pi-simple-voice

What's different from upstream

Installation

Usage

/voice command

Configuration

CLI Reference

Model dtypes

API

Events

License

`/voice` command