pi-whisper-voice

Minimal hold-SPACE voice input for Pi using an OpenAI-compatible Whisper/STT endpoint.

Package details


Install pi-whisper-voice from npm and Pi will load the resources declared by the package manifest.


$ pi install npm:pi-whisper-voice

Package: pi-whisper-voice
Version: 0.2.0
Published: Apr 27, 2026
Downloads: 269/mo · 269/wk
Author: kengbailey
License: MIT
Types: extension
Size: 59.9 KB
Dependencies: 0 dependencies · 2 peers

Pi manifest JSON

{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

Hold SPACE to record and release it to transcribe; the transcript is then inserted into Pi's editor for review. The message is not sent automatically: edit the text and submit it manually when ready.

Features

Hold SPACE push-to-talk inside Pi
Local microphone capture via ffmpeg
OpenAI-compatible STT endpoint: POST /v1/audio/transcriptions
In-TUI settings for server URL, model, and token
Transcript inserted into the editor for review/editing
Persistent footer state: 🎤 ready, 🎤 recording, 🎤 transcribing
No cloud-provider lock-in
No fallback shortcut or global daemon

Usage

Start Pi. If the terminal supports the Kitty keyboard protocol, the footer should show:

🎤 ready

Then:

Hold SPACE until recording starts.
Speak.
Release SPACE.
Wait for 🎤 transcribing to finish.
Review/edit the transcript inserted in the editor.
Send manually when ready.
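
The steps above map onto the three footer states listed under Features. As a rough illustration only (this is not taken from the extension's source, and the type, event, and function names are all hypothetical), the flow could be modeled as a small state machine:

```typescript
// Hypothetical names throughout; this is not the extension's actual source.
type VoiceState = "ready" | "recording" | "transcribing";
type VoiceEvent = "spaceHeld" | "spaceReleased" | "transcriptInserted" | "failed";

// Footer labels as listed in the README.
const footerLabel: Record<VoiceState, string> = {
  ready: "🎤 ready",
  recording: "🎤 recording",
  transcribing: "🎤 transcribing",
};

// Return the next state, or the current one when the event does not apply.
function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (state) {
    case "ready":
      return event === "spaceHeld" ? "recording" : state;
    case "recording":
      if (event === "spaceReleased") return "transcribing";
      return event === "failed" ? "ready" : state;
    case "transcribing":
      return event === "transcriptInserted" || event === "failed" ? "ready" : state;
  }
}
```

Note that no transition sends the message: in this sketch the cycle ends by returning to ready once the transcript is in the editor, which matches the manual-submit behavior described above.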

Toggle voice input:

/voice

Configure the STT server URL, model name, and token:

/voice-settings

Alias:

/voice settings

Show the active configuration:

/voice status

Settings are saved under the piWhisperVoice key in Pi's global settings file (~/.pi/agent/settings.json). Environment variables override the saved values:

PI_VOICE_STT_BASE_URL
PI_VOICE_STT_MODEL
PI_VOICE_STT_TOKEN

Project-local voice settings are ignored for safety so a repository cannot redirect microphone audio or supply a token.
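
The precedence described here (environment variables over saved global values, project-local settings never consulted) can be sketched as follows. The field names baseUrl, model, and token are hypothetical, since the README does not document the keys stored under piWhisperVoice, and treating an empty environment variable as unset is an assumption of this sketch.

```typescript
// Hypothetical field names; the README does not document the keys under piWhisperVoice.
interface VoiceConfig {
  baseUrl?: string;
  model?: string;
  token?: string;
}

type Env = Record<string, string | undefined>;

// Environment variables win over saved global values.
// Assumption: an empty environment variable counts as unset.
// Project-local settings are deliberately not an input to this function.
function resolveVoiceConfig(savedGlobal: VoiceConfig, env: Env): VoiceConfig {
  const pick = (envValue: string | undefined, savedValue: string | undefined) =>
    envValue !== undefined && envValue !== "" ? envValue : savedValue;
  return {
    baseUrl: pick(env["PI_VOICE_STT_BASE_URL"], savedGlobal.baseUrl),
    model: pick(env["PI_VOICE_STT_MODEL"], savedGlobal.model),
    token: pick(env["PI_VOICE_STT_TOKEN"], savedGlobal.token),
  };
}
```

In a Node-based process this would typically be called with process.env as the second argument.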

Current requirements

Pi coding agent
A terminal/session that supports the Kitty keyboard protocol, including key-release events
ffmpeg installed and microphone permission granted
An OpenAI-compatible transcription server
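
Key-release support matters because hold-to-talk needs to know when SPACE is let go. In the Kitty keyboard protocol, key events arrive as CSI ... u escape sequences, and an event-type field of 3 marks a release. The parser below is a minimal sketch of recognizing a SPACE release in such a sequence; the function and type names are hypothetical, it ignores how the terminal is switched into this mode, and it is not the extension's own input handling.

```typescript
// Hypothetical helper names; a sketch of parsing Kitty "CSI ... u" key events.
type KeyEventType = "press" | "repeat" | "release";

interface KittyKeyEvent {
  codepoint: number; // Unicode code point of the key (32 for SPACE)
  modifiers: number; // encoded modifier value; 1 means no modifiers
  type: KeyEventType;
}

// ESC [ key[:alternates] [; modifiers[:event-type] [; text]] u
const CSI_U = /^\x1b\[(\d+)(?::\d*)*(?:;(\d*)(?::(\d+))?(?:;[\d:]*)?)?u$/;

function parseKittyKey(seq: string): KittyKeyEvent | null {
  const m = CSI_U.exec(seq);
  if (m === null) return null;
  // A missing event-type field means a press (1); 2 is repeat, 3 is release.
  const eventCode = m[3] ? Number(m[3]) : 1;
  const type: KeyEventType =
    eventCode === 3 ? "release" : eventCode === 2 ? "repeat" : "press";
  return {
    codepoint: Number(m[1]),
    modifiers: m[2] ? Number(m[2]) : 1,
    type,
  };
}

function isSpaceRelease(seq: string): boolean {
  const ev = parseKittyKey(seq);
  return ev !== null && ev.codepoint === 32 && ev.type === "release";
}
```

A terminal that never emits release events gives the extension no way to tell when recording should stop, which is presumably why this is listed as a requirement.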

Example STT endpoint shape:

POST http://localhost:8000/v1/audio/transcriptions
Authorization: Bearer dummy
Content-Type: multipart/form-data

Response:

{ "text": "transcribed text" }
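
A request of this shape could be made as in the sketch below. It assumes a runtime with global fetch, FormData, and Blob (for example Node 18 or later), assumes the multipart field names file and model used by the OpenAI transcription API, and assumes the configured server URL is the server root without the /v1 path. The function names and the recording.wav filename are hypothetical and not taken from the extension.

```typescript
// A sketch only; names and the audio filename are illustrative assumptions.
interface TranscriptionOptions {
  baseUrl: string; // assumed to be the server root, e.g. http://localhost:8000
  model: string;
  token?: string;
}

// Join the base URL and the endpoint path, tolerating trailing slashes.
function transcriptionUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/v1/audio/transcriptions";
}

async function transcribe(audio: Blob, opts: TranscriptionOptions): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "recording.wav"); // field name per the OpenAI API
  form.append("model", opts.model);

  const headers: Record<string, string> = {};
  if (opts.token) headers["Authorization"] = `Bearer ${opts.token}`;

  // fetch derives the multipart Content-Type (with boundary) from the FormData body.
  const res = await fetch(transcriptionUrl(opts.baseUrl), {
    method: "POST",
    headers,
    body: form,
  });
  if (!res.ok) throw new Error(`STT request failed: ${res.status} ${res.statusText}`);

  const data = (await res.json()) as { text?: unknown };
  if (typeof data.text !== "string") throw new Error("STT response has no text field");
  return data.text;
}
```

Content-Type is not set by hand here, because a manually written multipart/form-data header would lack the boundary parameter that the server needs to split the body.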

Install

Install from npm:

pi install npm:pi-whisper-voice

Or test without installing:

pi -e npm:pi-whisper-voice

Install from GitHub:

pi install git:github.com/kengbailey/pi-whisper-voice

Local development install

This repository can also be loaded directly from disk:

pi -e /path/to/pi-whisper-voice

For global auto-discovery during local development, place it at:

~/.pi/agent/extensions/pi-whisper-voice/

License

MIT