pi-extension-stt

Pi extension package that adds local microphone speech-to-text via faster-whisper.

Package details

Install pi-extension-stt from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-extension-stt

Package: pi-extension-stt
Version: 0.1.0-beta.1
Published: Mar 28, 2026
Downloads: 37/mo · 10/wk
Author: zerone0x
License: MIT
Types: extension
Size: 118.9 KB
Dependencies: 0 dependencies · 1 peer

Pi manifest JSON

{
  "extensions": [
    "./dist/extension/index.js"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-extension-stt

Local-first, privacy-first speech-to-text for Pi.

This package adds a small set of slash commands that let you capture microphone audio, transcribe it locally with faster-whisper, and insert the final text into Pi's input editor.

This is intentionally not a desktop dictation app. v1 stays inside Pi:

voice -> text -> insert into Pi editor

The npm package installs the Pi extension itself. It does not install your local Python runtime, ffmpeg, PortAudio, Python modules, or microphone permissions for you.

The current beta path is optimized for macOS, especially Homebrew + Python venv.

Status

What works today:

local microphone capture through a Python bridge
local transcription through faster-whisper
Pi slash commands for start, stop, cancel, and status
transcript insertion into Pi's input editor
preflight checks for Python, ffmpeg, and Python module availability

What is intentionally out of scope in v1:

TTS
cloud STT providers
system-wide paste into other apps
global hotkeys
desktop UI outside Pi
multi-backend support
automatic message sending after transcription

Beta Quick Start

This is the shortest setup path on macOS with Homebrew installed:

Install the Pi extension:

pi install npm:pi-extension-stt

Launch Pi:

pi

Inside Pi, run:

/stt-bootstrap full

This creates or refreshes a dedicated bridge virtualenv for you, installs ffmpeg and portaudio via Homebrew, and then prints a relaunch command with PI_STT_PYTHON=....

If ffmpeg and portaudio are already available, /stt-bootstrap without full is enough.

Relaunch Pi with the generated command. A typical result looks like this:

PI_STT_PYTHON=$HOME/.venvs/pi-stt/bin/python \
PI_STT_MODEL=tiny \
pi

Inside the relaunched Pi session, run:

/stt-setup

After setup succeeds, press the shortcut to start recording, speak, then press it again to stop and insert the transcript:

Ctrl+Alt+M

When /stt-setup offers a choice, pick the quick path to skip model preparation, or the full path to prepare the model ahead of time for the smoothest first recording.

Requirements

You need these local dependencies:

python3
ffmpeg
Python packages from src/bridge/requirements.txt

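On macOS, a typical setup is shown below. portaudio is the audio library that the sounddevice package uses for microphone capture: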

brew install ffmpeg portaudio
python3 -m pip install \
  "faster-whisper>=1.0.0" \
  "numpy>=1.26.0" \
  "sounddevice>=0.4.7" \
  "socksio>=1.0.0"

If you prefer an isolated Python environment for the bridge:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -r src/bridge/requirements.txt
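Before launching Pi, you can check the requirements yourself. The following is a minimal sketch, not the extension's own preflight code: the function name preflight and the default lists are illustrative, and the module names are the import names of the packages in the pip command above. Run it with the same interpreter you plan to pass as PI_STT_PYTHON, because module availability is checked in the interpreter that executes the script.

```python
import importlib.util
import shutil

# Illustrative defaults: commands expected on PATH and importable modules.
DEFAULT_COMMANDS = ("python3", "ffmpeg")
DEFAULT_MODULES = ("faster_whisper", "numpy", "sounddevice")


def preflight(commands=DEFAULT_COMMANDS, modules=DEFAULT_MODULES):
    """Return the commands and top-level modules that could not be found."""
    missing_commands = [c for c in commands if shutil.which(c) is None]
    missing_modules = [
        m for m in modules if importlib.util.find_spec(m) is None
    ]
    return {"commands": missing_commands, "modules": missing_modules}


if __name__ == "__main__":
    report = preflight()
    if report["commands"] or report["modules"]:
        print("Missing:", report)
    else:
        print("All local requirements found.")
```

An empty list under both keys means the commands are on PATH and the modules are importable. It does not test microphone permissions or model downloads.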

Install

For npm users, the normal extension install flow is:

pi install npm:pi-extension-stt

That only installs the Pi package. You still need the local dependencies from the sections above. On macOS beta setups, the normal first-run path is /stt-bootstrap, then a Pi relaunch, then /stt-setup.

To install from a local checkout, run these from the package directory:

npm install
npm run build
pi install .

Or load it directly while developing:

npm run build
pi -e ./dist/extension/index.js

Commands

After the extension is loaded, these commands are available:

/stt-toggle
/stt-bootstrap
/stt-launch-command
/stt-setup
/stt-start
/stt-stop
/stt-cancel
/stt-status
/stt-prepare
/stt-devices
/stt-device

Recommended First Run

For the smoothest setup, use this order:

Run /stt-bootstrap.
Relaunch Pi with the generated PI_STT_PYTHON=... pi command.
Run /stt-setup.
Pick Quick check or Full setup.
If needed, run /stt-device afterward to pin a specific microphone.
Press Ctrl+Alt+M or run /stt-toggle to start, speak, then press it again to stop.

Shortcut

Ctrl+Alt+M
- idle or error: start STT
- starting: cancel startup
- listening: stop and insert transcript

If you prefer commands, /stt-toggle follows the same behavior.
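The shortcut behavior amounts to a small state-to-action mapping. The sketch below illustrates that mapping only. The SttState enum and toggle_action function are hypothetical names, the extension may track more states than the four listed here, and mapping "cancel startup" to /stt-cancel is an assumption based on the command descriptions.

```python
from enum import Enum


class SttState(Enum):
    # Only the states named in the shortcut table; others may exist.
    IDLE = "idle"
    ERROR = "error"
    STARTING = "starting"
    LISTENING = "listening"


def toggle_action(state: SttState) -> str:
    """Return the command that a toggle press corresponds to."""
    if state in (SttState.IDLE, SttState.ERROR):
        return "/stt-start"
    if state is SttState.STARTING:
        return "/stt-cancel"  # assumed equivalent of "cancel startup"
    if state is SttState.LISTENING:
        return "/stt-stop"
    raise ValueError(f"unhandled state: {state}")
```

For example, toggle_action(SttState.LISTENING) returns "/stt-stop", which matches "stop and insert transcript".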

`/stt-start`

Runs preflight checks, starts the local bridge, and begins listening on the default microphone.

`/stt-stop`

Stops listening, finalizes queued transcription, and inserts the transcript into Pi's current input editor. It does not auto-send the message.

`/stt-cancel`

Stops listening and discards the current transcript buffer.

`/stt-status`

Shows the current STT state, model, language, selected device, and transcript summary.

`/stt-toggle`

Single-command STT toggle. It starts recording when idle, stops and inserts the transcript when listening, and cancels startup while the bridge is still preparing.

`/stt-setup`

Guided first-run setup. This is the normal user-facing entry point.

quick
- checks Python, ffmpeg, microphones, and the selected input path
- skips model download
full
- does the same checks
- also prepares the local model cache
device
- opens the microphone picker directly

Examples:

/stt-setup
/stt-setup quick
/stt-setup full
/stt-setup device

`/stt-bootstrap`

Guided local bootstrap for beta users on macOS.

python
- creates or refreshes a dedicated virtualenv
- installs bridge Python dependencies from src/bridge/requirements.txt
- leaves Homebrew packages unchanged
full
- does the same Python bootstrap
- also runs brew install ffmpeg portaudio

Examples:

/stt-bootstrap
/stt-bootstrap python
/stt-bootstrap full
/stt-bootstrap ~/.venvs/pi-stt

Notes:

this command does not mutate the current Pi process environment in-place
full assumes Homebrew is already installed
if the generated Python path differs from the current PI_STT_PYTHON, you need to relaunch Pi with the printed command
after relaunch, run /stt-setup

`/stt-launch-command`

Shows the most recent relaunch command generated by /stt-bootstrap.

Use this when you missed the original success notification but still need the exact PI_STT_PYTHON=... pi command.
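If you need to rebuild the relaunch command by hand, it can be assembled from the virtualenv path. This is a sketch under assumptions: venv_python and relaunch_command are hypothetical helpers, the bin/python layout is the standard POSIX virtualenv layout, and the output format of the real /stt-bootstrap may differ.

```python
import shlex
from pathlib import Path
from typing import Optional


def venv_python(venv_dir: str) -> Path:
    """Path of the interpreter inside a POSIX virtualenv."""
    return Path(venv_dir).expanduser() / "bin" / "python"


def relaunch_command(venv_dir: str, model: Optional[str] = None) -> str:
    """Build a one-line shell command that starts Pi with the bridge Python."""
    # shlex.quote keeps paths with spaces intact when pasted into a shell.
    parts = [f"PI_STT_PYTHON={shlex.quote(str(venv_python(venv_dir)))}"]
    if model:
        parts.append(f"PI_STT_MODEL={shlex.quote(model)}")
    parts.append("pi")
    return " ".join(parts)


if __name__ == "__main__":
    print(relaunch_command("~/.venvs/pi-stt", model="tiny"))
```

The helper expands ~ itself, so the resulting command contains an absolute path and does not rely on the shell expanding $HOME.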

`/stt-prepare`

Pre-downloads and validates the configured faster-whisper model before you start recording. This is the best way to avoid a long first /stt-start.

`/stt-devices`

Lists the currently detected input devices and marks the default microphone.

`/stt-device`

Opens an interactive picker for the active STT input device. You can also pass an explicit device id or partial device name. The extension verifies that the selected device can actually be opened before saving it:

/stt-device 0
/stt-device MacBook
/stt-device clear
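The three argument forms (numeric id, partial name, clear) can be pictured as a small lookup. The sketch below is an assumption about the matching rules, not the extension's code: resolve_device, the (id, name) tuple shape, case-insensitive substring matching, and the device names in the usage example are all hypothetical. It also skips the step where the extension verifies that the device can be opened.

```python
from typing import List, Optional, Tuple

Device = Tuple[int, str]  # hypothetical shape: (device id, device name)


def resolve_device(query: str, devices: List[Device]) -> Optional[Device]:
    """Return the single matching device, or None when the query is 'clear'."""
    query = query.strip()
    if query.lower() == "clear":
        return None  # fall back to the system default microphone
    if query.isdigit():
        matches = [d for d in devices if d[0] == int(query)]
    else:
        matches = [d for d in devices if query.lower() in d[1].lower()]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one match for {query!r}, found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    example = [(0, "MacBook Pro Microphone"), (1, "USB Headset")]
    print(resolve_device("MacBook", example))
```

Rejecting ambiguous partial names, rather than picking the first match, is a design choice of this sketch. It avoids silently pinning the wrong microphone.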

Environment Variables

Minimal configuration is done through environment variables:

PI_STT_PYTHON
PI_STT_BOOTSTRAP_VENV
PI_STT_MODEL
PI_STT_LANGUAGE
PI_STT_DEVICE
PI_STT_SILENCE_MS
PI_STT_BLOCK_MS
PI_STT_MIN_SPEECH_MS
PI_STT_PRE_ROLL_MS
PI_STT_STOP_GRACE_MS
PI_STT_SPEECH_THRESHOLD
PI_STT_PREPARE_TIMEOUT_MS
PI_STT_START_TIMEOUT_MS

Defaults:

Python: python3
model: base
language: auto-detect
silence window: 1200ms
block size: 200ms
minimum speech window: 200ms
pre-roll: 300ms
stop grace window: 300ms
speech threshold: 0.006
prepare timeout: 300000ms
startup timeout: 30000ms
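To see how the variables and defaults fit together, here is a minimal sketch of a loader. The variable names and default values come from the lists above. The SttConfig dataclass, the load_config function, and the rule that an empty value counts as unset are assumptions for illustration. PI_STT_BOOTSTRAP_VENV is left out because no default is documented for it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SttConfig:
    python: str = "python3"
    model: str = "base"
    language: Optional[str] = None  # None means auto-detect
    device: Optional[str] = None    # None means system default microphone
    silence_ms: int = 1200
    block_ms: int = 200
    min_speech_ms: int = 200
    pre_roll_ms: int = 300
    stop_grace_ms: int = 300
    speech_threshold: float = 0.006
    prepare_timeout_ms: int = 300_000
    start_timeout_ms: int = 30_000


def load_config(env: Mapping[str, str] = os.environ) -> SttConfig:
    """Build a config from PI_STT_* variables, falling back to defaults."""
    d = SttConfig()

    def read(name, default, cast=str):
        raw = env.get(name, "").strip()
        return cast(raw) if raw else default  # empty is treated as unset

    return SttConfig(
        python=read("PI_STT_PYTHON", d.python),
        model=read("PI_STT_MODEL", d.model),
        language=read("PI_STT_LANGUAGE", d.language),
        device=read("PI_STT_DEVICE", d.device),
        silence_ms=read("PI_STT_SILENCE_MS", d.silence_ms, int),
        block_ms=read("PI_STT_BLOCK_MS", d.block_ms, int),
        min_speech_ms=read("PI_STT_MIN_SPEECH_MS", d.min_speech_ms, int),
        pre_roll_ms=read("PI_STT_PRE_ROLL_MS", d.pre_roll_ms, int),
        stop_grace_ms=read("PI_STT_STOP_GRACE_MS", d.stop_grace_ms, int),
        speech_threshold=read("PI_STT_SPEECH_THRESHOLD", d.speech_threshold, float),
        prepare_timeout_ms=read("PI_STT_PREPARE_TIMEOUT_MS", d.prepare_timeout_ms, int),
        start_timeout_ms=read("PI_STT_START_TIMEOUT_MS", d.start_timeout_ms, int),
    )
```

Calling load_config({}) yields the documented defaults, and load_config({"PI_STT_MODEL": "tiny"}) changes only the model.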

Notes:

PI_STT_MODEL can be a faster-whisper model name such as tiny or base, or a path to a local model directory
PI_STT_PYTHON is useful when the bridge runs from a project-local virtualenv, for example PI_STT_PYTHON=/path/to/pi-extension-stt/.venv/bin/python
PI_STT_BOOTSTRAP_VENV changes the default target path used by /stt-bootstrap
the recommended beta path is a dedicated virtualenv, for example PI_STT_PYTHON=$HOME/.venvs/pi-stt/bin/python
if direct access to Hugging Face is blocked, you can set HF_ENDPOINT=https://hf-mirror.com before running Pi
on proxy-heavy setups, model downloads may work better after unsetting http_proxy, https_proxy, and all_proxy
if STT hears the microphone but still misses speech, try a slightly lower gate, for example PI_STT_SPEECH_THRESHOLD=0.004
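The mirror and proxy notes above come down to adjusting the environment Pi starts with. The sketch below builds such an environment as a plain dictionary. launch_env is a hypothetical helper, and only the lowercase proxy variables named above are removed.

```python
from typing import Dict, Mapping, Optional

# The proxy variables named in the notes above.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy")


def launch_env(
    base: Mapping[str, str],
    python_path: str,
    hf_endpoint: Optional[str] = None,
    drop_proxies: bool = False,
) -> Dict[str, str]:
    """Return a copy of base with the STT-related variables applied."""
    env = dict(base)
    env["PI_STT_PYTHON"] = python_path
    if hf_endpoint:
        env["HF_ENDPOINT"] = hf_endpoint
    if drop_proxies:
        for name in PROXY_VARS:
            env.pop(name, None)
    return env


# Usage, assuming the pi command is on PATH:
#   import os, subprocess
#   env = launch_env(os.environ, "/path/to/venv/bin/python",
#                    hf_endpoint="https://hf-mirror.com", drop_proxies=True)
#   subprocess.run(["pi"], env=env)
```

The function copies its input instead of editing os.environ, so the calling shell session keeps its own proxy settings.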

Failure Modes

Common failures and how they surface:

missing python3: /stt-start reports that Python is unavailable
missing ffmpeg: /stt-start reports that ffmpeg is unavailable
missing Python modules: /stt-start points you to pip install -r src/bridge/requirements.txt
unsure where to start: /stt-setup walks you through a guided path and prints clear next steps
first-time local environment setup: /stt-bootstrap can create the bridge virtualenv and print the exact relaunch command for Pi
missing Homebrew in /stt-bootstrap full: install Homebrew first, or use /stt-bootstrap plus manual system dependencies
slow or blocked model download: /stt-prepare or /stt-start tells you to try HF_ENDPOINT=https://hf-mirror.com or a local model path
microphone permission or device problems: the bridge reports a startup error
bad device choice: /stt-devices shows available inputs and /stt-device clear resets back to the system default microphone
quiet speech or quick stop: the bridge shows the observed mic level and suggested next steps; if needed, lower PI_STT_SPEECH_THRESHOLD or wait a moment longer before stopping
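To build intuition for why a lower threshold helps with quiet speech, here is a sketch of a simple energy gate. This is an assumption about how such a gate can work, not the bridge's actual algorithm: rms and find_segments are hypothetical names, levels are per-block RMS values of samples in the range -1.0 to 1.0, and pre-roll and the stop grace window are left out.

```python
import math
from typing import List, Sequence, Tuple


def rms(block: Sequence[float]) -> float:
    """Root-mean-square level of one audio block."""
    if not block:
        return 0.0
    return math.sqrt(sum(s * s for s in block) / len(block))


def find_segments(
    levels: Sequence[float],
    threshold: float = 0.006,
    block_ms: int = 200,
    min_speech_ms: int = 200,
    silence_ms: int = 1200,
) -> List[Tuple[int, int]]:
    """Return (first_block, last_block) index pairs of detected speech."""
    min_blocks = max(1, math.ceil(min_speech_ms / block_ms))
    silence_blocks = max(1, math.ceil(silence_ms / block_ms))
    segments = []
    start = None      # first loud block of the open segment
    last_loud = None  # most recent loud block of the open segment
    loud_run = 0      # consecutive loud blocks seen while no segment is open
    for i, level in enumerate(levels):
        loud = level >= threshold
        if start is None:
            loud_run = loud_run + 1 if loud else 0
            if loud_run >= min_blocks:
                start, last_loud = i - loud_run + 1, i
        elif loud:
            last_loud = i
        elif i - last_loud >= silence_blocks:
            segments.append((start, last_loud))  # silence window elapsed
            start, loud_run = None, 0
    if start is not None:
        segments.append((start, last_loud))
    return segments
```

With three blocks at level 0.005, the default threshold of 0.006 opens no segment, while a threshold of 0.004 captures all three blocks. That is the situation where lowering PI_STT_SPEECH_THRESHOLD is suggested.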

UX Notes

/stt-bootstrap is the normal first-run automation path on macOS beta setups.
/stt-setup is the guided diagnostic and model-prep path after bootstrap.
/stt-launch-command replays the exact relaunch command if you missed the original bootstrap notification.
The extension shows a small widget above the Pi editor. While idle, it offers a compact “start dictation” affordance, plus bootstrap/setup guidance if the local environment is still incomplete; while STT is starting or listening, it shows the current state and next-step hints.
While listening, the widget also shows a live mic/gate hint so you can tell whether the bridge is hearing enough signal to start a segment.
/stt-stop inserts the final transcript into the current Pi editor buffer and never auto-sends the message.
/stt-cancel can abort both active listening and long startup/model-prepare steps.

Development

Type-check and build:

npm run check
npm run build

Run the lightweight tests:

npm test