@juicesharp/rpiv-voice

Pi extension. Voice dictation via /voice — local on-device STT with sherpa-onnx Whisper (base multilingual int8), microphone capture via decibri.

Packages

Package details

extension

Install @juicesharp/rpiv-voice from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@juicesharp/rpiv-voice
Package
@juicesharp/rpiv-voice
Version
1.12.0
Published
May 21, 2026
Downloads
2,639/mo · 2,110/wk
Author
juicesharp
License
MIT
Types
extension
Size
781.9 KB
Dependencies
3 dependencies · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

rpiv-voice

Talk to Pi Agent instead of typing. rpiv-voice adds the /voice slash command — open the overlay, speak, hit Enter, and your transcript drops straight into Pi's editor. Speech-to-text runs entirely on your machine via sherpa-onnx Whisper (base multilingual int8). No cloud, no API keys, no telemetry.

Voice dictation overlay above the Pi editor

Features

  • 100% on-device — audio never leaves your laptop. No accounts, no API keys, no network calls after the first model download.
  • ~99 languages, autodetected — Whisper base multilingual handles the full Whisper language set with per-utterance autodetection. Switch languages mid-session without changing settings.
  • Live transcript — committed lines render as you finish phrases, with a dim rolling partial showing the still-active utterance in real time. What you see is what gets pasted (no waiting for a "proper" final).
  • VAD-driven chunking — Silero voice-activity detection breaks long monologues at natural pauses, so latency stays bounded even on a 5-minute rant.
  • Settings screen built-inTab flips to a settings panel showing your active mic, detected language, and a hallucination filter toggle. Ctrl-S to save, Esc or Tab to return to dictation.
  • Whisper hallucination filter — strips spurious "Thanks for watching", "[Music]", and repeating-token loops that Whisper sometimes emits on silence. Toggle off if you're dictating short single words.
  • Pause / resume — hit Space to mute the mic without closing the overlay; great for stepping aside mid-thought.
  • Localized UI — overlay, status bar, and settings render in German, English, Spanish, French, Portuguese (European + Brazilian), Russian, and Ukrainian when @juicesharp/rpiv-i18n is installed. Falls back to English when it isn't.
  • Honest first-run UX — the splash overlay shows download progress (percent + bytes), then Extracting…, Verifying…, Loading engine…, Initializing mic… before the dictation overlay opens. Half-loaded states never reach you.
  • Configurable cancel keybinding — bind cancel to whatever your fingers prefer; no longer hardcoded to Esc.
  • Errors persisted, not swallowed — recognition failures land in ~/.config/rpiv-voice/errors.log so you can see why a phrase didn't transcribe.

Install

rpiv-voice is opt-in — it's not part of /rpiv-setup because the native deps (sherpa-onnx, decibri) are heavyweight. Install it directly:

pi install npm:@juicesharp/rpiv-voice

Then restart your Pi session.

Optional: localized UI

Install @juicesharp/rpiv-i18n alongside it to flip the overlay, status bar, and settings strings to your active locale:

pi install npm:@juicesharp/rpiv-i18n

/languages switches the locale live — no restart.

Usage

Type /voice in Pi's input — the overlay opens with a recording glyph, a session timer, and Listening….

Key Action
(speak) Equalizer animates; transcript fills in live as Whisper decodes
Enter Close overlay, paste transcript into the Pi editor
Esc Close overlay, paste nothing (configurable — see below)
Space Pause / resume the mic
Tab Flip between dictation and settings screens
Ctrl-S (in Settings) Save settings to disk

The dim trailing text after the committed transcript is the rolling partial — it's already part of what will paste, so you can hit Enter the moment you're done.

First run

The first time you run /voice, the splash overlay downloads the Whisper base multilingual model (~198 MB compressed, ~157 MB on disk) into ~/.pi/models/whisper-base/. Subsequent runs load directly from disk in under a second. If a previous download was interrupted, the stale model directory is detected and re-downloaded automatically.

Configuration

rpiv-voice works without any config file. To customize, drop a JSON file at ~/.config/rpiv-voice/voice.json:

{
  "hallucinationFilterEnabled": false
}
Field Default Effect
hallucinationFilterEnabled true When false, keeps Whisper's "Thanks for watching" / "[Music]" / repeating-token loops. Useful when dictating short single words that the filter might mistake for noise.

You can also flip the toggle interactively from the Settings screen (Tab from dictation, Ctrl-S to save).

The microphone is the OS default input — rpiv-voice does not expose device selection. The bundled Whisper base multilingual model is loaded from ~/.pi/models/whisper-base/; alternative models aren't supported today.

Privacy

  • No cloud STT. Audio is decoded on your CPU via sherpa-onnx; nothing leaves the machine.
  • No telemetry. No usage events, no crash reports, no install pings. Errors are written to a local log only.
  • No API keys. Nothing to provision, nothing to revoke.
  • Network only on first run — to download the model. After that, /voice works offline.

Requirements

  • Pi Agent CLI
  • A working microphone reachable by decibri (mic permission granted to your terminal on macOS)
  • ~200 MB free disk under ~/.pi/models/whisper-base/
  • Network access on first run only

Troubleshooting

  • "Microphone init failed" on the splash — grant your terminal app microphone access (System Settings → Privacy & Security → Microphone on macOS), then re-run /voice.
  • Transcript looks like "Thanks for watching" — Whisper hallucinated on near-silence; either speak louder/closer, or leave the hallucination filter on (the default).
  • /voice not found — restart your Pi session after install. If it's still missing, confirm the entry exists in ~/.pi/agent/settings.json.
  • Recognition errors — check ~/.config/rpiv-voice/errors.log for the underlying sherpa-onnx error.

Related packages

License

npm version License: MIT

MIT — see LICENSE.