pi-extension-stt

Pi extension package that adds local microphone speech-to-text via faster-whisper.

Package details


Install pi-extension-stt from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-extension-stt
Package: pi-extension-stt
Version: 0.1.0-beta.1
Published: Mar 28, 2026
Downloads: 37/mo · 10/wk
Author: zerone0x
License: MIT
Types: extension
Size: 118.9 KB
Dependencies: 0 dependencies · 1 peer
Pi manifest JSON
{
  "extensions": [
    "./dist/extension/index.js"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-extension-stt

Local-first, privacy-first speech-to-text for Pi.

This package adds a small set of slash commands that let you capture microphone audio, transcribe it locally with faster-whisper, and insert the final text into Pi's input editor.

This is intentionally not a desktop dictation app. v1 stays inside Pi:

voice -> text -> insert into Pi editor

The npm package installs the Pi extension itself. It does not install your local Python runtime, ffmpeg, PortAudio, Python modules, or microphone permissions for you.

The current beta is optimized for macOS, particularly setups that use Homebrew and a Python virtualenv.

Status

What works today:

  • local microphone capture through a Python bridge
  • local transcription through faster-whisper
  • Pi slash commands for start, stop, cancel, and status
  • transcript insertion into Pi's input editor
  • preflight checks for Python, ffmpeg, and Python module availability

What is intentionally out of scope in v1:

  • TTS
  • cloud STT providers
  • system-wide paste into other apps
  • global hotkeys
  • desktop UI outside Pi
  • multi-backend support
  • automatic message sending after transcription

Beta Quick Start

This is the shortest path for a macOS user with Homebrew available:

  1. Install the Pi extension:
pi install npm:pi-extension-stt
  2. Launch Pi:
pi
  3. Inside Pi, run:
/stt-bootstrap full

This creates or refreshes a dedicated bridge virtualenv, installs ffmpeg and portaudio via Homebrew, and then prints a relaunch command that sets PI_STT_PYTHON.

If ffmpeg and portaudio are already available, /stt-bootstrap without full is enough.

  4. Relaunch Pi with the generated command. A typical result looks like this:
PI_STT_PYTHON=$HOME/.venvs/pi-stt/bin/python \
PI_STT_MODEL=tiny \
pi
  5. Inside the relaunched Pi session, run:
/stt-setup
  6. After setup succeeds, press this shortcut to start dictation, and press it again to stop:
Ctrl+Alt+M

When /stt-setup asks, pick Quick check if you prefer not to prepare the model yet, or Full setup if you want the smoothest first recording.

Requirements

You need these local dependencies:

  • python3
  • ffmpeg
  • PortAudio (used by the sounddevice Python module for microphone capture)
  • Python packages from src/bridge/requirements.txt

On macOS, a typical setup is:

brew install ffmpeg portaudio
python3 -m pip install \
  "faster-whisper>=1.0.0" \
  "numpy>=1.26.0" \
  "sounddevice>=0.4.7" \
  "socksio>=1.0.0"

If you prefer an isolated Python environment for the bridge:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -r src/bridge/requirements.txt

Install

For npm users, the normal extension install flow is:

pi install npm:pi-extension-stt

That only installs the Pi package. You still need the local dependencies from the sections above. On macOS beta setups, the normal first-run path is /stt-bootstrap, then a Pi relaunch, then /stt-setup.

To build and install from a local checkout, run these commands in the package directory:

npm install
npm run build
pi install .

Or load it directly while developing:

npm run build
pi -e ./dist/extension/index.js

Commands

After the extension is loaded, these commands are available:

  • /stt-toggle
  • /stt-bootstrap
  • /stt-launch-command
  • /stt-setup
  • /stt-start
  • /stt-stop
  • /stt-cancel
  • /stt-status
  • /stt-prepare
  • /stt-devices
  • /stt-device

Recommended First Run

For the smoothest setup, use this order:

  1. Run /stt-bootstrap.
  2. Relaunch Pi with the generated PI_STT_PYTHON=... pi command.
  3. Run /stt-setup.
  4. Pick Quick check or Full setup.
  5. If needed, run /stt-device afterward to pin a specific microphone.
  6. Press Ctrl+Alt+M or run /stt-toggle to start, speak, then press it again to stop.

Shortcut

  • Ctrl+Alt+M
    • idle or error: start STT
    • starting: cancel startup
    • listening: stop and insert transcript

If you prefer commands, /stt-toggle follows the same behavior.
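The toggle rules above boil down to a small mapping from the current STT state to the next action. The sketch below models that mapping. The state names come from the Shortcut list, but the type and function names (SttState, ToggleAction, toggleAction) are hypothetical and are not taken from the extension's source.

```typescript
// Hypothetical model of the Ctrl+Alt+M / /stt-toggle rules; names are illustrative.
type SttState = "idle" | "starting" | "listening" | "error";
type ToggleAction = "start" | "cancelStartup" | "stopAndInsert";

function toggleAction(state: SttState): ToggleAction {
  switch (state) {
    case "idle":
    case "error":
      // Start STT: run preflight checks and launch the local bridge.
      return "start";
    case "starting":
      // The bridge is still preparing, so a second press cancels startup.
      return "cancelStartup";
    case "listening":
      // Stop, finalize, and insert the transcript; the message is never auto-sent.
      return "stopAndInsert";
  }
}
```

Because the mapping is a pure function, the toggle behavior can be tested without a microphone or a running bridge.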

/stt-start

Runs preflight checks, starts the local bridge, and begins listening on the selected input device (the system default microphone unless you pinned one with /stt-device).

/stt-stop

Stops listening, finalizes queued transcription, and inserts the transcript into Pi's current input editor. It does not auto-send the message.

/stt-cancel

Stops listening and discards the current transcript buffer.

/stt-status

Shows the current STT state, model, language, selected device, and transcript summary.

/stt-toggle

Single-command STT toggle. It starts recording when idle, stops and inserts the transcript when listening, and cancels startup while the bridge is still preparing.

/stt-setup

Guided first-run setup. This is the normal user-facing entry point.

  • quick
    • checks Python, ffmpeg, microphones, and the selected input path
    • skips model download
  • full
    • does the same checks
    • also prepares the local model cache
  • device
    • opens the microphone picker directly

Examples:

/stt-setup
/stt-setup quick
/stt-setup full
/stt-setup device
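You can think of each mode as a different list of steps. The following is a minimal sketch, assuming that a bare /stt-setup prompts you to pick Quick check or Full setup, as described under Recommended First Run. All identifiers here are hypothetical, and the README does not document how unknown arguments are handled.

```typescript
// Hypothetical sketch of the /stt-setup modes; step names are illustrative.
type SetupMode = "quick" | "full" | "device";
type SetupStep =
  | "checkPython"
  | "checkFfmpeg"
  | "checkMicrophones"
  | "checkInputPath"
  | "prepareModel"
  | "pickDevice";

const SETUP_MODES: SetupMode[] = ["quick", "full", "device"];
const CHECKS: SetupStep[] = ["checkPython", "checkFfmpeg", "checkMicrophones", "checkInputPath"];

function isSetupMode(value: string): value is SetupMode {
  return (SETUP_MODES as string[]).indexOf(value) !== -1;
}

// A bare /stt-setup returns "prompt" so the user can choose a mode.
// Unknown arguments return undefined (real handling is not documented).
function parseSetupArg(raw: string): SetupMode | "prompt" | undefined {
  const arg = raw.trim().toLowerCase();
  if (arg === "") return "prompt";
  return isSetupMode(arg) ? arg : undefined;
}

function setupSteps(mode: SetupMode): SetupStep[] {
  switch (mode) {
    case "quick":
      return CHECKS.slice(); // checks only, no model download
    case "full":
      return [...CHECKS, "prepareModel"]; // same checks plus the local model cache
    case "device":
      return ["pickDevice"]; // open the microphone picker directly
  }
}
```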

/stt-bootstrap

Guided local bootstrap for beta users on macOS.

  • python
    • creates or refreshes a dedicated virtualenv
    • installs bridge Python dependencies from src/bridge/requirements.txt
    • leaves Homebrew packages unchanged
  • full
    • does the same Python bootstrap
    • also runs brew install ffmpeg portaudio

Examples:

/stt-bootstrap
/stt-bootstrap python
/stt-bootstrap full
/stt-bootstrap ~/.venvs/pi-stt
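As a rough illustration, here is how each /stt-bootstrap argument might map to shell commands. Several details are assumptions rather than documented behavior:

  • the default venv path mirrors the Quick Start example
  • the optional override stands in for PI_STT_BOOTSTRAP_VENV
  • a bare /stt-bootstrap behaves like python (inferred from the Quick Start)
  • the command order and the relative requirements path are guesses

planBootstrap is a hypothetical name. The function only builds command strings and runs nothing.

```typescript
// Hypothetical sketch: the shell commands /stt-bootstrap might run for each argument.
// The default path, command order, and requirements path are assumptions.
interface BootstrapPlan {
  venvPath: string;
  commands: string[];
}

const DEFAULT_VENV = "$HOME/.venvs/pi-stt"; // mirrors the Quick Start example

function planBootstrap(arg: string, venvOverride?: string): BootstrapPlan {
  const a = arg.trim();
  const full = a === "full";
  // Anything other than "", "python", or "full" is treated as a target venv path.
  const isPath = a !== "" && a !== "python" && !full;
  // venvOverride stands in for PI_STT_BOOTSTRAP_VENV.
  const venvPath = isPath ? a : venvOverride ?? DEFAULT_VENV;
  const python = `${venvPath}/bin/python`;

  const commands: string[] = [];
  // Only "full" touches Homebrew; the other modes leave system packages alone.
  if (full) commands.push("brew install ffmpeg portaudio");
  // Create the virtualenv at the target path, then install the bridge dependencies into it.
  commands.push(`python3 -m venv ${venvPath}`);
  commands.push(`${python} -m pip install -r src/bridge/requirements.txt`);
  return { venvPath, commands };
}
```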

Notes:

  • this command does not mutate the current Pi process environment in place
  • full assumes Homebrew is already installed
  • if the generated Python path differs from the current PI_STT_PYTHON, you need to relaunch Pi with the printed command
  • after relaunch, run /stt-setup

/stt-launch-command

Shows the most recent relaunch command generated by /stt-bootstrap.

Use this when you missed the original success notification but still need the exact PI_STT_PYTHON=... pi command.
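The relaunch command is a normal shell invocation with environment-variable prefixes. Below is a hypothetical helper, formatLaunchCommand, that renders one in the multi-line style shown in the Beta Quick Start. It applies no shell quoting, so values that contain spaces or quotes would need extra handling.

```typescript
// Hypothetical helper: render a relaunch command in the Quick Start's multi-line style.
// Values are inserted verbatim (no shell quoting), so they must not contain spaces.
function formatLaunchCommand(vars: Array<[string, string]>, command = "pi"): string {
  // Each NAME=value prefix ends with " \" so the shell continues onto the next line.
  const lines = vars.map(([name, value]) => `${name}=${value} \\`);
  return lines.concat(command).join("\n");
}
```

For example, passing PI_STT_PYTHON and PI_STT_MODEL pairs reproduces the three-line command from the Beta Quick Start.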

/stt-prepare

Pre-downloads and validates the configured faster-whisper model before you start recording. This is the best way to avoid a long wait on your first /stt-start.

/stt-devices

Lists the currently detected input devices and marks the default microphone.

/stt-device

Opens an interactive picker for the active STT input device. You can also pass an explicit device id or a partial device name, or pass clear to go back to the system default microphone. The extension verifies that the selected device can actually be opened before saving it:

/stt-device 0
/stt-device MacBook
/stt-device clear
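Here is one way to resolve the argument forms above against the detected device list. It is a sketch under assumptions:

  • DeviceInfo is a hypothetical shape
  • a numeric argument is treated as an exact id, and anything else as a case-insensitive name substring
  • several matches are reported as ambiguous

The "can actually be opened" verification is not shown because it needs the bridge.

```typescript
// Hypothetical sketch of resolving a /stt-device argument; the matching rules
// (exact numeric id, else case-insensitive name substring) are assumptions.
interface DeviceInfo {
  id: number;
  name: string;
}

type DeviceChoice =
  | { kind: "picker" } // no argument: open the interactive picker
  | { kind: "systemDefault" } // "clear": fall back to the system default microphone
  | { kind: "device"; device: DeviceInfo }
  | { kind: "ambiguous"; matches: DeviceInfo[] }
  | { kind: "notFound" };

function resolveDevice(arg: string, devices: DeviceInfo[]): DeviceChoice {
  const query = arg.trim();
  if (query === "") return { kind: "picker" };
  if (query.toLowerCase() === "clear") return { kind: "systemDefault" };

  if (/^\d+$/.test(query)) {
    // Numeric argument: treat it as a device id.
    const id = Number(query);
    const byId = devices.filter((d) => d.id === id);
    return byId.length === 1 ? { kind: "device", device: byId[0] } : { kind: "notFound" };
  }

  // Otherwise match a partial, case-insensitive device name.
  const needle = query.toLowerCase();
  const matches = devices.filter((d) => d.name.toLowerCase().indexOf(needle) !== -1);
  if (matches.length === 1) return { kind: "device", device: matches[0] };
  return matches.length === 0 ? { kind: "notFound" } : { kind: "ambiguous", matches };
}
```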

Environment Variables

Configuration is kept minimal and is done through environment variables:

  • PI_STT_PYTHON
  • PI_STT_BOOTSTRAP_VENV
  • PI_STT_MODEL
  • PI_STT_LANGUAGE
  • PI_STT_DEVICE
  • PI_STT_SILENCE_MS
  • PI_STT_BLOCK_MS
  • PI_STT_MIN_SPEECH_MS
  • PI_STT_PRE_ROLL_MS
  • PI_STT_STOP_GRACE_MS
  • PI_STT_SPEECH_THRESHOLD
  • PI_STT_PREPARE_TIMEOUT_MS
  • PI_STT_START_TIMEOUT_MS

Defaults:

  • Python: python3
  • model: base
  • language: auto-detect
  • silence window: 1200ms
  • block size: 200ms
  • minimum speech window: 200ms
  • pre-roll: 300ms
  • stop grace window: 300ms
  • speech threshold: 0.006
  • prepare timeout: 300000ms
  • startup timeout: 30000ms
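The variables and defaults above can be collected into a single config object. The following is a minimal sketch, assuming that empty, invalid, or negative numbers fall back to the default (the README does not say how bad values are treated). readSttConfig and SttConfig are hypothetical names. PI_STT_BOOTSTRAP_VENV is omitted because no default is listed for it. In a Node-based extension the call would typically be readSttConfig(process.env).

```typescript
// Hypothetical sketch: read the documented variables with the documented defaults.
type Env = Record<string, string | undefined>;

interface SttConfig {
  python: string;
  model: string;
  language: string | undefined; // undefined = auto-detect
  device: string | undefined; // undefined = system default microphone
  silenceMs: number;
  blockMs: number;
  minSpeechMs: number;
  preRollMs: number;
  stopGraceMs: number;
  speechThreshold: number;
  prepareTimeoutMs: number;
  startTimeoutMs: number;
}

// Trimmed string value, or undefined when unset or blank.
function str(env: Env, key: string): string | undefined {
  const raw = env[key]?.trim();
  return raw ? raw : undefined;
}

// Numeric value; assumption: blank, non-numeric, or negative input uses the default.
function num(env: Env, key: string, fallback: number): number {
  const raw = str(env, key);
  if (raw === undefined) return fallback;
  const value = Number(raw);
  return isFinite(value) && value >= 0 ? value : fallback;
}

function readSttConfig(env: Env): SttConfig {
  return {
    python: str(env, "PI_STT_PYTHON") ?? "python3",
    model: str(env, "PI_STT_MODEL") ?? "base",
    language: str(env, "PI_STT_LANGUAGE"),
    device: str(env, "PI_STT_DEVICE"),
    silenceMs: num(env, "PI_STT_SILENCE_MS", 1200),
    blockMs: num(env, "PI_STT_BLOCK_MS", 200),
    minSpeechMs: num(env, "PI_STT_MIN_SPEECH_MS", 200),
    preRollMs: num(env, "PI_STT_PRE_ROLL_MS", 300),
    stopGraceMs: num(env, "PI_STT_STOP_GRACE_MS", 300),
    speechThreshold: num(env, "PI_STT_SPEECH_THRESHOLD", 0.006),
    prepareTimeoutMs: num(env, "PI_STT_PREPARE_TIMEOUT_MS", 300000),
    startTimeoutMs: num(env, "PI_STT_START_TIMEOUT_MS", 30000),
  };
}
```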

Notes:

  • PI_STT_MODEL can be a faster-whisper model name like tiny, base, or a local model directory path
  • PI_STT_PYTHON is useful when the bridge runs from a project-local virtualenv, for example PI_STT_PYTHON=/path/to/pi-extension-stt/.venv/bin/python
  • PI_STT_BOOTSTRAP_VENV changes the default target path used by /stt-bootstrap
  • the recommended beta path is a dedicated virtualenv, for example PI_STT_PYTHON=$HOME/.venvs/pi-stt/bin/python
  • if direct access to Hugging Face is blocked, you can set HF_ENDPOINT=https://hf-mirror.com before running Pi
  • on proxy-heavy setups, model downloads may work better after unsetting http_proxy, https_proxy, and all_proxy
  • if STT hears the microphone but still misses speech, try a slightly lower gate, for example PI_STT_SPEECH_THRESHOLD=0.004
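To see how the timing variables and PI_STT_SPEECH_THRESHOLD could interact, here is a simplified energy gate over per-block RMS levels. This is an assumed model for illustration only:

  • the README does not describe the bridge's actual speech-detection algorithm
  • the stop grace window is not modeled
  • all names are hypothetical

It does show why lowering the threshold (for example from 0.006 to 0.004) lets quieter blocks open a segment.

```typescript
// Simplified energy gate: an assumed model of how the documented knobs could
// interact, not the bridge's actual speech detection.
interface GateOptions {
  blockMs: number; // PI_STT_BLOCK_MS
  silenceMs: number; // PI_STT_SILENCE_MS
  minSpeechMs: number; // PI_STT_MIN_SPEECH_MS
  preRollMs: number; // PI_STT_PRE_ROLL_MS
  speechThreshold: number; // PI_STT_SPEECH_THRESHOLD
}

// A segment covers blocks [startBlock, endBlock).
interface Segment {
  startBlock: number;
  endBlock: number;
}

// Root-mean-square level of one block of samples in the range [-1, 1].
function rms(samples: number[]): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

function segmentBlocks(levels: number[], o: GateOptions): Segment[] {
  const toBlocks = (ms: number) => Math.ceil(ms / o.blockMs);
  const silenceBlocks = toBlocks(o.silenceMs);
  const minSpeechBlocks = toBlocks(o.minSpeechMs);
  const preRollBlocks = toBlocks(o.preRollMs);

  const segments: Segment[] = [];
  let start = -1; // first block of the open segment, or -1 when the gate is closed
  let voiced = 0; // blocks at or above the threshold in the open segment
  let quiet = 0; // consecutive blocks below the threshold

  const close = (endBlock: number): void => {
    // Drop segments with too little speech to be worth transcribing.
    if (voiced >= minSpeechBlocks) segments.push({ startBlock: start, endBlock });
    start = -1;
    voiced = 0;
    quiet = 0;
  };

  for (let i = 0; i < levels.length; i++) {
    const loud = levels[i] >= o.speechThreshold;
    if (start < 0) {
      if (loud) {
        // Open a segment, reaching back over the pre-roll blocks.
        start = Math.max(0, i - preRollBlocks);
        voiced = 1;
      }
      continue;
    }
    if (loud) {
      voiced++;
      quiet = 0;
    } else if (++quiet >= silenceBlocks) {
      // Enough silence: end the segment where the quiet run began.
      close(i + 1 - quiet);
    }
  }
  if (start >= 0) close(levels.length - quiet); // flush an open segment at stop
  return segments;
}
```

With the defaults, 200ms blocks give a six-block silence window. In this sketch the 300ms pre-roll rounds up to two blocks.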

Failure Modes

Common failures and how they surface:

  • missing python3: /stt-start reports that Python is unavailable
  • missing ffmpeg: /stt-start reports that ffmpeg is unavailable
  • missing Python modules: /stt-start points you to pip install -r src/bridge/requirements.txt
  • unclear first-time setup: /stt-setup gives a guided path and clear next steps
  • first-time local environment setup: /stt-bootstrap can create the bridge virtualenv and print the exact relaunch command for Pi
  • missing Homebrew in /stt-bootstrap full: install Homebrew first, or use /stt-bootstrap plus manual system dependencies
  • slow or blocked model download: /stt-prepare or /stt-start tells you to try HF_ENDPOINT=https://hf-mirror.com or a local model path
  • microphone permission or device problems: the bridge reports a startup error
  • bad device choice: /stt-devices shows available inputs and /stt-device clear resets back to the system default microphone
  • quiet speech or stopping too soon: the bridge shows the observed mic level and suggested next steps; if needed, lower PI_STT_SPEECH_THRESHOLD or wait a moment longer before stopping
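The failure list reads like an ordered set of preflight checks, each with its own hint. The sketch below assumes the checks run in that order and stop at the first failure. runPreflight, sttChecks, and the hint wording are hypothetical. The probes themselves (running python3, locating ffmpeg, importing the Python modules) are left out and passed in as precomputed booleans.

```typescript
// Hypothetical preflight runner: checks run in the order of the failure list above
// and the first failing check decides the message. Names and wording are illustrative.
interface PreflightCheck {
  name: string;
  ok: () => boolean;
  hint: string;
}

type PreflightResult = { ok: true } | { ok: false; failed: string; hint: string };

function runPreflight(checks: PreflightCheck[]): PreflightResult {
  for (const check of checks) {
    if (!check.ok()) return { ok: false, failed: check.name, hint: check.hint };
  }
  return { ok: true };
}

// Example wiring with precomputed probe results; how each probe runs is not shown.
function sttChecks(probe: { python: boolean; ffmpeg: boolean; modules: boolean }): PreflightCheck[] {
  return [
    { name: "python", ok: () => probe.python, hint: "Python is unavailable; install python3 or set PI_STT_PYTHON." },
    { name: "ffmpeg", ok: () => probe.ffmpeg, hint: "ffmpeg is unavailable; on macOS, brew install ffmpeg." },
    {
      name: "modules",
      ok: () => probe.modules,
      hint: "Missing Python modules; run pip install -r src/bridge/requirements.txt.",
    },
  ];
}
```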

UX Notes

  • /stt-bootstrap is the normal first-run automation path on macOS beta setups.
  • /stt-setup is the guided diagnostic and model-prep path after bootstrap.
  • /stt-launch-command replays the exact relaunch command if you missed the original bootstrap notification.
  • While STT is starting or listening, the extension adds a small widget above the Pi editor with current state and next-step hints.
  • The widget also stays visible while idle, so you always have a compact “start dictation” affordance in-session, plus bootstrap/setup guidance if the local environment is still incomplete.
  • While listening, the widget shows a live mic/gate hint so you can tell whether the bridge is hearing enough signal to start a segment.
  • /stt-stop inserts the final transcript into the current Pi editor buffer and never auto-sends the message.
  • /stt-cancel can abort both active listening and long startup/model-prepare steps.

Development

Type-check and build:

npm run check
npm run build

Run the lightweight tests:

npm test