pi-simple-voice
Streaming, verbatim Kokoro TTS for the Pi agent — speaks the assistant's output as it streams, no summarization.
Package details
Install pi-simple-voice from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-simple-voice
- Package: pi-simple-voice
- Version: 1.0.0
- Published: Jun 13, 2026
- Downloads: not available
- Author: grrowl
- License: MIT
- Types: extension
- Size: 81.9 KB
- Dependencies: 2 dependencies · 3 peers
Pi manifest JSON
{
"extensions": [
"./extensions/index.ts"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-simple-voice
Give your Pi agent a voice — streaming and verbatim.
pi-simple-voice is a text-to-speech package for the Pi coding agent. It speaks the assistant's output as it streams, sentence by sentence, with no summarization, so you hear exactly what the agent writes. It runs a local HTTP server powered by Kokoro ONNX and adds a /voice settings UI to Pi.
This is a fork of s1m0n38/pi-voice (MIT); see "What's different from upstream" below. Huge thanks to the upstream author: the Kokoro server, /voice TUI, and model management started there.
How it works: A small local server loads a single Kokoro ONNX model into memory and exposes a REST API for synthesis. The pi extension talks to it over HTTP — it never loads the model itself. The extension spawns the server on demand; the server self-exits after an idle timeout (default 15 min) and is re-spawned when next needed, so one model lives in RAM and is shared across all your pi sessions.
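The idle self-exit described above can be modelled as a deadline that every request pushes back. The sketch below is a hypothetical IdleTracker, not code from pi-simple-voice; it only assumes that the server calls touch() on each incoming request and reads idleMs from the config (900000 ms, i.e. 15 min, by default).

```typescript
// Hypothetical helper, not part of pi-simple-voice: tracks the time of the
// last request and reports when the server has been idle for idleMs.
class IdleTracker {
  private readonly idleMs: number;
  private lastActivity: number;

  constructor(idleMs: number, now: number = Date.now()) {
    this.idleMs = idleMs;
    this.lastActivity = now;
  }

  // Record activity (e.g. a /tts or /health request) at time `now`.
  touch(now: number = Date.now()): void {
    this.lastActivity = now;
  }

  // True once no request has arrived for at least idleMs.
  isIdle(now: number = Date.now()): boolean {
    return now - this.lastActivity >= this.idleMs;
  }

  // Milliseconds left before the idle deadline (never negative).
  remaining(now: number = Date.now()): number {
    return Math.max(0, this.lastActivity + this.idleMs - now);
  }
}

// Re-check when the deadline is due; exit only if nothing touched the
// tracker in the meantime, otherwise wait for the new deadline.
function scheduleIdleExit(tracker: IdleTracker, onIdle: () => void): void {
  const check = (): void => {
    if (tracker.isIdle()) {
      onIdle(); // e.g. unload the model and process.exit(0)
    } else {
      setTimeout(check, tracker.remaining());
    }
  };
  setTimeout(check, tracker.remaining());
}
```

Rescheduling from the remaining time, rather than resetting a timer on every request, keeps the per-request cost to a single timestamp write.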
What's different from upstream
| | upstream pi-voice | this fork |
|---|---|---|
| Speech text | LLM summarizes each response | verbatim: speaks what the agent writes |
| Timing | one utterance at agent_end | streaming, on sentence boundaries as tokens arrive |
| Reasoning | — | thinking/reasoning content is never voiced |
| Interrupt | agent_end (could cut the final sentence) | turn_start / abort (clean) |
| tts tool | agent can call it | removed: speech is a side-effect of output, not a tool |
| Server lifetime | started/stopped via CLI | extension-managed; self-exits when idle, re-spawned on demand |
| Runtime | @mariozechner/* | @earendil-works/* |
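The streaming, verbatim, reasoning-free and interrupt rows of the table can be illustrated together with a small sentence splitter. This is a sketch under assumptions: the Delta shape is a hypothetical simplification rather than pi's real event types, and the sentence rule (., ! or ? followed by whitespace) is a naive stand-in for whatever boundary detection the extension actually uses.

```typescript
// Hypothetical, simplified stream delta: "text" is spoken, "thinking" is not.
type Delta = { kind: "text" | "thinking"; text: string };

class SentenceStream {
  private buffer = "";
  private readonly speak: (sentence: string) => void;

  constructor(speak: (sentence: string) => void) {
    this.speak = speak;
  }

  // Feed one streamed delta and speak every sentence it completes.
  push(delta: Delta): void {
    if (delta.kind !== "text") return; // reasoning is never voiced
    this.buffer += delta.text;
    // A sentence ends at ., ! or ? followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence) this.speak(sentence);
      start = end;
    }
    this.buffer = this.buffer.slice(start); // keep the unfinished tail
  }

  // Message finished normally: speak whatever is left in the buffer.
  flush(): void {
    const rest = this.buffer.trim();
    this.buffer = "";
    if (rest) this.speak(rest);
  }

  // New turn or abort: drop unspoken text instead of speaking it.
  interrupt(): void {
    this.buffer = "";
  }
}
```

Waiting for whitespace after the punctuation means "3." followed by "14" in the next token is not split, at the cost of holding a final sentence until flush(). Abbreviations such as "e.g. " would still be split by this naive rule.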
Installation
pi install npm:pi-simple-voice
The extension auto-spawns the server and downloads the default q4 model (~291 MB) the first time speech is enabled. The pi-simple-voice CLI is also available for manual control.
Requires
bun on PATH: the extension spawns the server under bun.
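The on-demand spawn can be sketched as "probe, spawn if down, then poll until healthy". This is a hypothetical ensureServer, not the extension's code: the probe and spawn steps are injected so the sketch makes no assumptions about the server's entry point or the exact /health response, and the retry count and delay are made-up defaults.

```typescript
// Hypothetical lazy-start helper. `probe` reports whether the server answers;
// `spawnServer` starts it (e.g. under bun) without waiting for it to be ready.
async function ensureServer(
  probe: () => Promise<boolean>,
  spawnServer: () => void,
  attempts = 50, // assumed: up to ~5 s of polling with the default delay
  delayMs = 100,
): Promise<void> {
  if (await probe()) return; // already running: reuse the shared server
  spawnServer();
  for (let i = 0; i < attempts; i++) {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    if (await probe()) return;
  }
  throw new Error("voice server did not become healthy in time");
}
```

In practice, probe might GET http://127.0.0.1:8181/health and resolve to the response's ok flag (false on a connection error), and spawnServer might launch the server with bun via Node's child_process.spawn; the server script path is not documented here, so it is left out.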
Usage
/voice command
Open the interactive settings UI inside Pi:
| Setting | Controls | Keys |
|---|---|---|
| TTS | Enable/disable speech | ← → |
| Voice | Speaker voice (with language/gender hints) | ← → |
| Speed | Speech rate (0.5×–3.0×) | ← → |
| Model | Quantization dtype | ← → (loads on close) |
Navigate with ↑ ↓, change a value with ← →, Enter plays a sample, s saves the current selection as the default, r resets, Esc closes. Toggle speech quickly with alt+v. The ♪ status bar shows live download/load progress (e.g. ♪ ↓ q4 25%). The Voice row appears once a model is loaded and its voices are fetched.
Settings persist in ~/.pi/voice/config.json.
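The ← → behaviour of the Speed and Voice rows can be sketched as two small stepping functions. Only the 0.5×–3.0× range comes from the table above; the 0.1× step and the wrap-around voice list are assumptions for illustration, not the package's actual UI code.

```typescript
const SPEED_MIN = 0.5;
const SPEED_MAX = 3.0;
const SPEED_STEP = 0.1; // hypothetical step size

// Move speed one notch left (-1) or right (+1), clamped to the allowed range.
function stepSpeed(speed: number, dir: -1 | 1): number {
  // Round to one decimal so repeated steps do not accumulate float error.
  const next = Math.round((speed + dir * SPEED_STEP) * 10) / 10;
  return Math.min(SPEED_MAX, Math.max(SPEED_MIN, next));
}

// Cycle through the voices fetched from the server, wrapping at both ends.
function stepVoice(voices: string[], current: string, dir: -1 | 1): string {
  if (voices.length === 0) return current;
  const i = voices.indexOf(current);
  if (i === -1) return voices[0]; // unknown voice: start from the first one
  return voices[(i + dir + voices.length) % voices.length];
}
```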
Configuration
{
"enabled": true,
"voice": "af_heart",
"speed": 1.0,
"host": "127.0.0.1",
"port": 8181,
"dtype": "q4",
"idleMs": 900000
}
idleMs is how long the server sits idle before self-exiting (default 15 min). There is no summarization model and no per-event prompt config — speech is always the assistant's verbatim output.
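A minimal sketch of reading this file with fallbacks, assuming the example values above are used whenever a key is missing. The port, host, dtype and idleMs fallbacks match the documented defaults; treating enabled: true and voice: "af_heart" as defaults is an assumption. Parsing is split from file I/O so it can be checked without touching disk.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

interface VoiceConfig {
  enabled: boolean;
  voice: string;
  speed: number;
  host: string;
  port: number;
  dtype: "q4" | "q4f16" | "q8" | "fp16" | "fp32";
  idleMs: number;
}

// Fallbacks taken from the example config (enabled/voice are assumed defaults).
const FALLBACKS: VoiceConfig = {
  enabled: true,
  voice: "af_heart",
  speed: 1.0,
  host: "127.0.0.1",
  port: 8181,
  dtype: "q4",
  idleMs: 900000, // 15 min
};

const CONFIG_PATH = join(homedir(), ".pi", "voice", "config.json");

// Merge user JSON over the fallbacks; malformed JSON is ignored.
function parseConfig(text: string | null): VoiceConfig {
  let raw: Partial<VoiceConfig> = {};
  if (text !== null) {
    try {
      raw = JSON.parse(text) as Partial<VoiceConfig>;
    } catch {
      // Keep the fallbacks.
    }
  }
  const cfg: VoiceConfig = { ...FALLBACKS, ...raw };
  cfg.speed = Math.min(3.0, Math.max(0.5, cfg.speed)); // /voice range
  return cfg;
}

// Read the config file, treating a missing file as an empty config.
function loadConfig(path: string = CONFIG_PATH): VoiceConfig {
  let text: string | null = null;
  try {
    text = readFileSync(path, "utf8");
  } catch {
    // No config file yet.
  }
  return parseConfig(text);
}
```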
CLI Reference
pi-simple-voice server status # show server status
pi-simple-voice server start # start server, load default model
pi-simple-voice server stop # stop server (exits the process)
pi-simple-voice server restart # restart
pi-simple-voice model list # list dtypes + download status
pi-simple-voice model load <dtype> # load (downloads if needed)
pi-simple-voice model unload # unload the active model
pi-simple-voice model download <dtype> # download without loading
pi-simple-voice model remove <dtype> # unload (if active) + delete cached files
Options: --host <host> --port <port> (defaults 127.0.0.1:8181).
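The --host and --port options simply select which server the CLI talks to. As an illustration only (a hypothetical resolveBaseUrl, not the CLI's real argument parser), resolving them into a base URL with the documented defaults looks like this:

```typescript
// Hypothetical helper: turn --host/--port flags into the server's base URL,
// falling back to the documented defaults 127.0.0.1:8181.
function resolveBaseUrl(argv: string[]): string {
  let host = "127.0.0.1";
  let port = 8181;
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === "--host" && value !== undefined) {
      host = value;
      i++; // skip the consumed value
    } else if (flag === "--port" && value !== undefined) {
      const n = Number(value);
      if (!Number.isInteger(n) || n < 1 || n > 65535) {
        throw new Error(`invalid --port: ${value}`);
      }
      port = n;
      i++;
    }
  }
  return `http://${host}:${port}`;
}
```

For example, resolveBaseUrl(["server", "status", "--port", "9000"]) yields http://127.0.0.1:9000. IPv6 hosts would additionally need square brackets, which this sketch does not add.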
Model dtypes
| Dtype | Size | Quality | Notes |
|---|---|---|---|
| q4 | ~291 MB | Good | 4-bit matmul; recommended default |
| q4f16 | ~147 MB | Good | 4-bit matmul + fp16 weights |
| q8 | ~88 MB | Great | 8-bit quantized; best quality/size ratio |
| fp16 | ~156 MB | Excellent | Half-precision floats |
| fp32 | ~310 MB | Best | Full-precision floats |
Only one model is loaded at a time. Files are cached at ~/.pi/voice/cache/.
API
The server exposes HTTP endpoints at http://127.0.0.1:8181:
| Method | Path | Description |
|---|---|---|
| GET | /health | Status, active + last dtype, model loaded, download/load progress |
| GET | /voices | Available voice names |
| GET | /models | All dtypes with download status |
| POST | /models/download | Download (+ optionally activate) a dtype |
| POST | /models/delete | Delete cached model files |
| POST | /models/activate | Load a downloaded model |
| POST | /models/unload | Unload model, free memory |
| POST | /tts | Synthesize text → WAV audio |
| POST | /shutdown | Graceful shutdown |
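A sketch of a client for POST /tts, assuming a JSON body with text, voice and speed fields; that request schema is a guess, so check the server source before relying on it. The fetch function is injected through a minimal FetchLike type, so the code needs no DOM typings and can be exercised with a test double.

```typescript
// The slice of the fetch API this sketch uses; the global fetch in
// Node 18+ and Bun is compatible with it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Assumed request body; not confirmed by the API table.
interface TtsRequest {
  text: string;
  voice: string;
  speed: number;
}

// POST text to /tts and return the WAV bytes.
async function synthesize(
  fetchImpl: FetchLike,
  req: TtsRequest,
  baseUrl = "http://127.0.0.1:8181",
): Promise<Uint8Array> {
  const res = await fetchImpl(`${baseUrl}/tts`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`/tts failed with HTTP ${res.status}`);
  const bytes = new Uint8Array(await res.arrayBuffer());
  // A WAV file starts with the ASCII tag "RIFF".
  const tag = Array.from(bytes.subarray(0, 4), (b) => String.fromCharCode(b)).join("");
  if (tag !== "RIFF") throw new Error("response is not a WAV file");
  return bytes;
}
```

With the global fetch, a call looks like synthesize(fetch, { text: "Hello.", voice: "af_heart", speed: 1.0 }).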
Events
The extension emits voice:config on the pi event bus (pi.events) whenever a setting changes via /voice:
pi.events.on("voice:config", ({ enabled, voice, speed }) => {
// update status bar, toggle features, etc.
});
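Building on that handler, a subscriber could keep a status label in sync with the latest settings. In this sketch, EventBusLike is a hypothetical stand-in that only mirrors the on(name, handler) call shown above, and the label format is made up; only the payload fields (enabled, voice, speed) come from this README.

```typescript
// Payload fields documented for voice:config.
interface VoiceConfigEvent {
  enabled: boolean;
  voice: string;
  speed: number;
}

// Hypothetical minimal view of pi.events: just on(name, handler).
interface EventBusLike {
  on(name: string, handler: (payload: unknown) => void): void;
}

// Compact label for a status bar (format is illustrative only).
function statusLabel(cfg: VoiceConfigEvent): string {
  return cfg.enabled ? `♪ ${cfg.voice} ${cfg.speed.toFixed(1)}×` : "♪ off";
}

// Re-render the label every time the settings change via /voice.
function watchVoiceConfig(events: EventBusLike, render: (label: string) => void): void {
  events.on("voice:config", (payload) => {
    render(statusLabel(payload as VoiceConfigEvent));
  });
}
```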
License
MIT — same as upstream. Forked from s1m0n38/pi-voice.