@khimaros/pi-omni
realtime voice chat with pi.dev
Package details
Install @khimaros/pi-omni from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@khimaros/pi-omni
- Package: @khimaros/pi-omni
- Version: 0.17.0
- Published: May 21, 2026
- Downloads: 2,094/mo · 2,094/wk
- Author: khimaros
- License: GPL-3.0-or-later
- Types: extension
- Size: 1.7 MB
- Dependencies: 6 dependencies · 1 peer
Pi manifest JSON
{
  "extensions": [
    "dist/extension/index.js"
  ]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-omni
push-to-talk voice extension for pi.dev: wires a local OpenAI-compatible STT/LLM/TTS stack (e.g. llama-swap serving whisper.cpp + llama.cpp + a TTS model) into a pi agent session, plus an optional browser UI.
getting started
prerequisites:
- node.js 20+
- a working pi installation
- arecord (alsa-utils) and a wav-on-stdin speaker (aplay, paplay, ffplay -nodisp -autoexit -, …)
- an OpenAI-compatible endpoint exposing STT, LLM, and TTS
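to make the last prerequisite concrete, here is a minimal sketch of the three routes an OpenAI-compatible server is usually expected to expose. the paths follow the public OpenAI API; whether pi-omni calls them exactly this way is an assumption, and `voiceEndpoints` and `ttsRequestBody` are hypothetical helper names, not part of the package.

```typescript
// Sketch only: route names follow the public OpenAI API. That pi-omni uses
// exactly these paths and this request body is an assumption.
interface VoiceEndpoints {
  stt: string; // speech-to-text: upload of recorded audio
  llm: string; // chat completion
  tts: string; // text-to-speech: returns audio bytes
}

function voiceEndpoints(baseUrl: string): VoiceEndpoints {
  // Strip trailing slashes so ".../v1/" and ".../v1" behave the same.
  const base = baseUrl.replace(/\/+$/, "");
  return {
    stt: `${base}/audio/transcriptions`,
    llm: `${base}/chat/completions`,
    tts: `${base}/audio/speech`,
  };
}

// JSON body for a TTS request, using the README's default model and voice.
function ttsRequestBody(text: string, model = "tts-1", voice = "alloy"): string {
  return JSON.stringify({ model, voice, input: text, response_format: "wav" });
}

const endpoints = voiceEndpoints("http://localhost:8080/v1/");
console.log(endpoints.stt);
console.log(endpoints.llm);
console.log(endpoints.tts);
console.log(ttsRequestBody("hello from pi"));
```

if any of the three routes is missing on your server, the corresponding stage of the record → STT → LLM → TTS loop has nothing to talk to.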
install as a pi extension:
pi install npm:@khimaros/pi-omni
control voice mode from the pi tui:
> /omni # push-to-talk: tap to record, VAD or re-tap to stop
> /omni-live # continuous conversation: record → STT → LLM → TTS loop
> /omni-cancel # cancel any active recording / TTS / chat loop
> /omni-setup # configure endpoint, models, mic, speaker (re-run anytime)
> /omni-test [text] # TTS round-trip diagnostic
control the web UI from the pi tui:
> /omni-web start # start the web server
> /omni-web status # view server status
> /omni-web open # open the web UI in browser
> /omni-web stop # stop the web server
or auto-start when pi launches (terminated when pi exits):
pi --omni-live # continuous voice on launch
pi --omni-web # web server on launch
run the web server standalone (no pi tui):
PI_VOICE_LLM_MODEL=qwen3-32b npx @khimaros/pi-omni
or install globally:
npm install -g @khimaros/pi-omni
PI_VOICE_LLM_MODEL=qwen3-32b pi-omni-web
then open http://127.0.0.1:4962.
once loaded, the web UI automatically tracks active sessions. use the glassmorphic sessions menu in the top-right corner to list recent sessions, switch between them, or start a new one. reconnecting after a WebSocket disconnect or reloading the page resumes the active session based on the URL hash.
pwa installation
the web UI is a Progressive Web App (PWA). you can "install" it to your home screen or desktop for a native-like experience:
- open the URL in a supported browser (Chrome, Safari, Edge).
- look for the "Install" icon in the address bar or select "Add to Home Screen" from the browser menu.
- the app will appear on your device with a waveform icon.
from a source checkout
make # install deps + build
make test # run tests
make wasm # rebuild wasm/apm (after touching wasm/apm/src/)
configuration
first run of /omni triggers /omni-setup automatically — it walks
through endpoint, models, mic, speaker, and an end-to-end round-trip test.
saved to ~/.pi/extensions/omni.json. re-run /omni-setup anytime to
reconfigure.
env vars override the saved file:
| variable | default | purpose |
|---|---|---|
| PI_VOICE_BASE_URL | http://localhost:8080/v1 | OpenAI-compatible endpoint |
| PI_VOICE_API_KEY | sk-no-key | llama-swap usually ignores it |
| PI_VOICE_STT_MODEL | whisper-1 | as exposed by your server |
| PI_VOICE_TTS_MODEL | tts-1 | |
| PI_VOICE_TTS_VOICE | alloy | |
| PI_VOICE_LLM_MODEL | (none) | required for standalone pi-omni-web |
| PI_VOICE_MIC_DEVICE | (default ALSA) | passed to arecord -D |
| PI_VOICE_SPEAKER_CMD | aplay -q ... | reads WAV from stdin |
| PI_VOICE_AEC_ENABLED | false | acoustic echo cancellation (WebRTC AEC3 WASM) |
| PI_VOICE_AEC_DELAY_MS | 200 | expected speaker→mic round-trip |
| PI_VOICE_BARGE_IN | false | keep mic open during TTS, cut in on speech |
| PI_VOICE_BARGE_IN_MIN_MS | 300 | minimum speech duration to count as barge-in |
| PI_VOICE_WEB_HOST | 127.0.0.1 | http bind address for the web server |
| PI_VOICE_WEB_PORT | 4962 | http port for the web server |
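the "env vars override the saved file" rule can be sketched as a three-layer merge: built-in defaults, then the saved file, then the environment. this is an illustration under assumptions, not the package's config.ts. the `VoiceConfig` shape, the `resolveConfig` name, and the way booleans and numbers are parsed from strings are all hypothetical, and only a handful of the variables are modelled.

```typescript
// Hypothetical config shape covering a subset of the documented settings.
interface VoiceConfig {
  baseUrl: string;
  apiKey: string;
  ttsVoice: string;
  aecEnabled: boolean;
  bargeInMinMs: number;
}

type Env = Record<string, string | undefined>;

// Defaults copied from the table above.
const DEFAULTS: VoiceConfig = {
  baseUrl: "http://localhost:8080/v1",
  apiKey: "sk-no-key",
  ttsVoice: "alloy",
  aecEnabled: false,
  bargeInMinMs: 300,
};

// Precedence: env var > saved file > default.
function resolveConfig(saved: Partial<VoiceConfig>, env: Env): VoiceConfig {
  const base: VoiceConfig = { ...DEFAULTS, ...saved };
  const aec = env["PI_VOICE_AEC_ENABLED"];
  const minMs = Number(env["PI_VOICE_BARGE_IN_MIN_MS"]); // NaN when unset or invalid
  return {
    baseUrl: env["PI_VOICE_BASE_URL"] ?? base.baseUrl,
    apiKey: env["PI_VOICE_API_KEY"] ?? base.apiKey,
    ttsVoice: env["PI_VOICE_TTS_VOICE"] ?? base.ttsVoice,
    // Assumption: only the literal string "true" enables a boolean setting.
    aecEnabled: aec === undefined ? base.aecEnabled : aec === "true",
    bargeInMinMs: Number.isFinite(minMs) ? minMs : base.bargeInMinMs,
  };
}

// Illustrative values: the saved file picks one voice, the environment another.
const cfg = resolveConfig(
  { ttsVoice: "nova", aecEnabled: true },
  { PI_VOICE_TTS_VOICE: "echo", PI_VOICE_BARGE_IN_MIN_MS: "500" },
);
console.log(cfg.ttsVoice, cfg.aecEnabled, cfg.bargeInMinMs); // echo true 500
```

taking the environment as a parameter (pass `process.env` in real use) keeps the merge a pure function that is easy to test.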
cli flags for the pi extension:
| flag | env var | config key | effect |
|---|---|---|---|
| --omni-live | PI_OMNI_AUTO_LIVE=true | autoStartLive | start continuous voice on launch |
| --omni-web | PI_OMNI_AUTO_WEB=true | autoStartWeb | start web server on launch |
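the table lists three equivalent ways to request each auto-start behaviour. a minimal sketch of how they could be combined follows. it assumes that any one of the three is enough, which the README does not state explicitly, and `autoStart` and `AutoStartConfig` are hypothetical names.

```typescript
// Hypothetical subset of the saved config (keys taken from the table above).
interface AutoStartConfig {
  autoStartLive?: boolean;
  autoStartWeb?: boolean;
}

interface AutoStartPlan {
  live: boolean;
  web: boolean;
}

// Assumption: flag, env var, and config key are OR-ed together.
function autoStart(
  argv: string[],
  env: Record<string, string | undefined>,
  config: AutoStartConfig,
): AutoStartPlan {
  return {
    live:
      argv.includes("--omni-live") ||
      env["PI_OMNI_AUTO_LIVE"] === "true" ||
      config.autoStartLive === true,
    web:
      argv.includes("--omni-web") ||
      env["PI_OMNI_AUTO_WEB"] === "true" ||
      config.autoStartWeb === true,
  };
}

// The flag requests live voice, the env var requests the web server.
const plan = autoStart(["--omni-live"], { PI_OMNI_AUTO_WEB: "true" }, {});
console.log(plan.live, plan.web); // true true
```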
cli flags for standalone pi-omni-web:
| flag | purpose |
|---|---|
| --listen <host:port> | http bind address; takes precedence over env vars |
| -h, --help | usage |
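a sketch of the documented precedence for the bind address: `--listen` first, then PI_VOICE_WEB_HOST / PI_VOICE_WEB_PORT, then the defaults 127.0.0.1 and 4962. `parseListen` and `resolveBind` are hypothetical names. the sketch assumes the space-separated form `--listen host:port` and does not handle bracketed IPv6 addresses.

```typescript
interface BindAddress {
  host: string;
  port: number;
}

// Parse "host:port", splitting on the last colon.
function parseListen(value: string): BindAddress {
  const idx = value.lastIndexOf(":");
  if (idx <= 0) throw new Error(`expected <host:port>, got "${value}"`);
  const host = value.slice(0, idx);
  const port = Number(value.slice(idx + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in "${value}"`);
  }
  return { host, port };
}

// Precedence: --listen flag > env vars > documented defaults.
function resolveBind(
  argv: string[],
  env: Record<string, string | undefined>,
): BindAddress {
  const i = argv.indexOf("--listen");
  const flagValue: string | undefined = i !== -1 ? argv[i + 1] : undefined;
  if (flagValue !== undefined) return parseListen(flagValue);
  const host = env["PI_VOICE_WEB_HOST"] ?? "127.0.0.1";
  const port = env["PI_VOICE_WEB_PORT"] ?? "4962";
  return parseListen(`${host}:${port}`); // reuse the same validation
}

const fromFlag = resolveBind(["--listen", "0.0.0.0:8000"], { PI_VOICE_WEB_PORT: "5000" });
const fromEnv = resolveBind([], { PI_VOICE_WEB_PORT: "5000" });
console.log(fromFlag.host, fromFlag.port); // 0.0.0.0 8000
console.log(fromEnv.host, fromEnv.port); // 127.0.0.1 5000
```

binding to anything other than 127.0.0.1 exposes the web UI to the network, so a non-loopback `--listen` value is worth a deliberate decision.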
echo cancellation & barge-in
set aecEnabled: true and bargeInEnabled: true (via /omni-setup or env)
to keep the mic open during TTS so you can interrupt by speaking. without
AEC, only enable barge-in on headphones — speaker output will feed back into
the mic and the bot will interrupt itself.
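the minimum-duration rule behind barge-in (PI_VOICE_BARGE_IN_MIN_MS, default 300) can be illustrated with a small detector over per-frame VAD decisions. this is a sketch, not the package's implementation. it assumes fixed-length frames and that the speech must be continuous, and `bargeInFrameIndex` is a hypothetical name.

```typescript
// vadDecisions[i] is true when frame i was classified as speech.
// Returns the index of the frame at which barge-in would fire, or -1.
function bargeInFrameIndex(
  vadDecisions: boolean[],
  frameMs: number,
  minMs = 300,
): number {
  let runMs = 0; // length of the current uninterrupted speech run
  for (let i = 0; i < vadDecisions.length; i++) {
    runMs = vadDecisions[i] ? runMs + frameMs : 0;
    if (runMs >= minMs) return i;
  }
  return -1;
}

const repeat = (n: number, value: boolean): boolean[] =>
  Array.from({ length: n }, () => value);

// 20 ms frames: a 100 ms blip is ignored, sustained speech triggers.
const vadFrames = [
  ...repeat(3, false),
  ...repeat(5, true), // 100 ms: too short
  ...repeat(3, false),
  ...repeat(20, true), // 400 ms: crosses 300 ms on its 15th frame
];
console.log(bargeInFrameIndex(vadFrames, 20)); // 25
```

a higher threshold makes the bot harder to interrupt by accident (coughs, residual echo) at the cost of reacting later to real speech.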
the AEC is a Rust port of WebRTC AEC3 compiled to WASM, depended on as a
file: package at wasm/apm/pkg/. rebuild after touching wasm/apm/src/:
make wasm
# or directly:
cd wasm/apm && wasm-pack build --target nodejs --release
build deps: rustup (e.g. via mise use -g rust@latest) with the
wasm32-unknown-unknown target, plus wasm-pack.
roadmap
see ROADMAP.md for implemented and planned features.
architecture
src/
  extension/   pi extension entry (commands, shortcuts, event handlers)
  server/      HTTP + WS server hosting the browser client
  bin/         standalone executables (pi-omni-web)
  audio/       mic, STT, TTS, VAD, AEC, sentence chunker, sanitizer
  config.ts    shared config + env-var overrides
public/        browser client (no build step)
wasm/apm/      WebRTC AEC3 → WASM (rust)
test/          node --test files
development
make # install deps + build (tsc)
make test # build then run node --test
make lint # type-check (tsc --noEmit)
make precommit # lint + test
make install # install globally from this checkout
make update # npm update
make wasm # rebuild wasm/apm
make pack # npm pack into build/
make publish # npm publish --access public
make clean # rm -rf dist build
known limits
- sentence chunking is naive (split on .!?\n); abbreviations like "e.g." will split early.
- manual barge-in via /omni re-tap works without AEC; automatic barge-in needs AEC enabled or headphones.
- if the pi extension bus doesn't forward message_update, TTS waits for turn_end; it still works, just less interactive.
- barge-in cuts off TTS instantly but the LLM keeps generating in the background until it finishes; its output is discarded.
- standalone pi-omni-web requires PI_VOICE_LLM_MODEL; the pi extension path doesn't (pi owns the LLM).
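the first limit is easy to reproduce. below is a minimal sketch of a chunker that splits on `.`, `!`, `?`, and newline, as described above. it is an illustration of the behaviour, not the package's actual chunker, and `chunkSentences` is a hypothetical name.

```typescript
// Naive sentence chunker: every '.', '!', '?' or newline ends a chunk.
function chunkSentences(text: string): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const ch of text) {
    current += ch;
    if (ch === "." || ch === "!" || ch === "?" || ch === "\n") {
      const trimmed = current.trim();
      if (trimmed.length > 0) chunks.push(trimmed);
      current = "";
    }
  }
  const tail = current.trim(); // text after the last terminator
  if (tail.length > 0) chunks.push(tail);
  return chunks;
}

const chunks = chunkSentences("Use a local server, e.g. llama-swap. It works!");
console.log(chunks);
// The abbreviation is cut in two:
// ["Use a local server, e.", "g.", "llama-swap.", "It works!"]
```

each chunk would be sent to TTS on its own, so the early split shows up as an audible pause in the middle of "e.g.".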