@khimaros/pi-omni

realtime voice chat with pi.dev

Install @khimaros/pi-omni from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@khimaros/pi-omni

Package:       @khimaros/pi-omni
Version:       0.21.0
Published:     Jun 18, 2026
Downloads:     2,795/mo · 170/wk
Author:        khimaros
License:       GPL-3.0-or-later
Types:         extension
Size:          1.6 MB
Dependencies:  6 dependencies · 1 peer

Pi manifest JSON
{
  "extensions": [
    "src/extension/index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-omni

standalone web voice server for pi.dev: a browser-based voice chat UI backed by a real pi-coding-agent session, wired to a local OpenAI-compatible STT/LLM/TTS stack (e.g. llama-swap serving whisper.cpp + llama.cpp + a TTS model).

speak in the browser; pi-omni transcribes (STT), runs the turn through pi's agent loop and tools (LLM), and speaks the reply back (TTS). each browser tab gets its own pi session.
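
the turn described above can be sketched as a small pipeline. this is only an illustration, not pi-omni's actual code: the `TurnDeps` interface, `runVoiceTurn`, and `splitSentences` are hypothetical names, and the sentence splitter is deliberately naive (it will also split on a decimal point).

```typescript
// Hypothetical interfaces: pi-omni's real module boundaries may differ.
interface TurnDeps {
  transcribe(audio: Uint8Array): Promise<string>; // STT
  runAgent(prompt: string): Promise<string>; // LLM turn through pi's agent loop
  synthesize(sentence: string): Promise<Uint8Array>; // TTS
}

// Naive sentence splitter so TTS can work on one sentence at a time.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// One voice turn: audio in, a list of synthesized audio clips out.
async function runVoiceTurn(deps: TurnDeps, audio: Uint8Array): Promise<Uint8Array[]> {
  const transcript = await deps.transcribe(audio);
  if (transcript.trim() === "") return []; // nothing recognised, skip the turn
  const reply = await deps.runAgent(transcript);
  const clips: Uint8Array[] = [];
  for (const sentence of splitSentences(reply)) {
    clips.push(await deps.synthesize(sentence));
  }
  return clips;
}
```

splitting the reply into sentences before synthesis means the first clip can be played while later ones are still being generated, which is presumably why the audio module carries a sentence chunker.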

in-TUI voice (push-to-talk and continuous voice rendered inside the pi terminal, no browser) lives in the separate pi-live package.

getting started

prerequisites:

  • node.js 20+ (the server compiles to dist/ via tsc; the extension stays .ts)
  • a working pi installation
  • an OpenAI-compatible endpoint exposing STT and TTS (the LLM runs through pi's own configured providers)

install as a pi extension:

pi install npm:@khimaros/pi-omni

control the web voice server from the pi tui:

> /omni              # interactive picker (start / status / stop / open)
> /omni start        # start the server (optionally: /omni start <host:port>)
> /omni status       # show the listen address
> /omni stop         # stop the server
> /omni open         # open the web voice app in a browser

or auto-start when pi launches (owned by pi, terminated when pi exits):

pi --omni                       # start on launch
pi --omni-listen 0.0.0.0:4962   # custom bind address (implies --omni)

run the server standalone (no pi tui; the server outlives pi). pi-coding-agent is an optional peer dependency (the pi tui already provides it), so the standalone path needs it installed alongside pi-omni. from a source checkout, make install puts the pi-omni command on PATH; from npm:

npm install -g @earendil-works/pi-coding-agent @khimaros/pi-omni
PI_VOICE_LLM_MODEL=qwen3-32b pi-omni --listen 127.0.0.1:4962

the server prints base_url=http://host:port once listening; open that URL.
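
a caller that spawns the server can wait for that line before connecting. a minimal sketch, assuming the line starts with `base_url=` and carries nothing but the URL; `parseBaseUrl` and `findBaseUrl` are hypothetical helpers, not part of pi-omni's API:

```typescript
// Return the URL from a "base_url=..." line, or null for any other line.
// Assumption: the announcement starts at the beginning of the line.
function parseBaseUrl(line: string): string | null {
  const match = /^base_url=(\S+)/.exec(line.trim());
  return match?.[1] ?? null;
}

// Scan buffered stdout text line by line and return the first announced URL.
function findBaseUrl(output: string): string | null {
  for (const line of output.split(/\r?\n/)) {
    const url = parseBaseUrl(line);
    if (url !== null) return url;
  }
  return null;
}
```

in practice the output would be accumulated from the child process's stdout and `findBaseUrl` called on each new chunk until it returns a non-null value.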

pwa installation

the web UI is a Progressive Web App (PWA). open the URL in a supported browser and choose "Install" / "Add to Home Screen" for a native-like experience.

from a source checkout

make            # install deps + compile the server to dist/ (tsc)
make start      # run the server
make test       # build + run tests
make test-integration  # black-box python e2e against fake-openai
make wasm       # rebuild wasm/apm (after touching wasm/apm/src/)

configuration

the OpenAI-compatible endpoint and the STT/TTS models are set through env vars, which override the values saved in ~/.pi/extensions/omni.json:

variable            default                   purpose
PI_VOICE_BASE_URL   http://localhost:8080/v1  OpenAI-compatible endpoint (STT + TTS)
PI_VOICE_API_KEY    sk-no-key                 llama-swap usually ignores it
PI_VOICE_STT_MODEL  whisper-1                 as exposed by your server
PI_VOICE_TTS_MODEL  tts-1
PI_VOICE_TTS_VOICE  alloy
PI_VOICE_LLM_MODEL  (none)                    default model for the standalone server
PI_OMNI_LISTEN      127.0.0.1:4962            http bind address (host:port, :port, or port)
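
the precedence (env var, then saved file, then default) can be sketched as follows. the env var names and defaults come from the table; the `VoiceSettings` field names, the shape of the saved file, and the choice to treat an empty string as unset are assumptions made for illustration.

```typescript
// Hypothetical settings shape; the real omni.json keys may be named differently.
interface VoiceSettings {
  baseUrl: string;
  apiKey: string;
  sttModel: string;
  ttsModel: string;
  ttsVoice: string;
}

// Defaults as listed in the table.
const DEFAULTS: VoiceSettings = {
  baseUrl: "http://localhost:8080/v1",
  apiKey: "sk-no-key",
  sttModel: "whisper-1",
  ttsModel: "tts-1",
  ttsVoice: "alloy",
};

type Env = Record<string, string | undefined>;

// First value that is defined and non-empty (empty-as-unset is an assumption).
function firstSet(...values: Array<string | undefined>): string | undefined {
  return values.find((v) => v !== undefined && v !== "");
}

// Env vars win over saved values, which win over defaults.
function resolveVoiceSettings(env: Env, saved: Partial<VoiceSettings> = {}): VoiceSettings {
  return {
    baseUrl: firstSet(env["PI_VOICE_BASE_URL"], saved.baseUrl) ?? DEFAULTS.baseUrl,
    apiKey: firstSet(env["PI_VOICE_API_KEY"], saved.apiKey) ?? DEFAULTS.apiKey,
    sttModel: firstSet(env["PI_VOICE_STT_MODEL"], saved.sttModel) ?? DEFAULTS.sttModel,
    ttsModel: firstSet(env["PI_VOICE_TTS_MODEL"], saved.ttsModel) ?? DEFAULTS.ttsModel,
    ttsVoice: firstSet(env["PI_VOICE_TTS_VOICE"], saved.ttsVoice) ?? DEFAULTS.ttsVoice,
  };
}
```

passing the environment in as an argument (rather than reading `process.env` inside the function) keeps the resolution logic easy to test.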

the LLM runs through pi-coding-agent (and any installed pi extensions), so it uses pi's configured providers/models; the OpenAI-compatible endpoint above is used only for STT and TTS.
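
for orientation, the two calls an OpenAI-compatible server is expected to answer look roughly like this. the sketch assumes the standard OpenAI audio routes (`/audio/speech` and `/audio/transcriptions` under the base URL); whether pi-omni builds its requests exactly this way is not confirmed here, and `EndpointConfig`, `planSpeechRequest`, and `planTranscriptionRequest` are hypothetical names.

```typescript
// Hypothetical config shape, mirroring the PI_VOICE_* variables.
interface EndpointConfig {
  baseUrl: string;
  apiKey: string;
  sttModel: string;
  ttsModel: string;
  ttsVoice: string;
}

interface RequestPlan {
  url: string;
  method: "POST";
  headers: Record<string, string>;
}

// Join base URL and path without producing a double slash.
function endpointUrl(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + path;
}

// TTS: JSON body with model, voice, and the text to speak.
function planSpeechRequest(cfg: EndpointConfig, text: string): RequestPlan & { body: string } {
  return {
    url: endpointUrl(cfg.baseUrl, "/audio/speech"),
    method: "POST",
    headers: {
      Authorization: `Bearer ${cfg.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: cfg.ttsModel, voice: cfg.ttsVoice, input: text }),
  };
}

// STT: multipart form; the audio goes in the "file" field next to "model".
function planTranscriptionRequest(
  cfg: EndpointConfig,
): RequestPlan & { fields: Record<string, string>; fileField: string } {
  return {
    url: endpointUrl(cfg.baseUrl, "/audio/transcriptions"),
    method: "POST",
    // no Content-Type here: the HTTP client sets the multipart boundary itself
    headers: { Authorization: `Bearer ${cfg.apiKey}` },
    fields: { model: cfg.sttModel },
    fileField: "file",
  };
}
```

these functions only describe the requests; sending them (for example with `fetch` and a `FormData` body for the transcription call) is left to the caller.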

cli flags:

flag                  purpose
--listen <host:port>  bind address; takes precedence over PI_OMNI_LISTEN
-h, --help            usage
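
the three accepted address forms (host:port, :port, port) and the flag-over-env precedence can be sketched like this. it assumes that a missing host falls back to 127.0.0.1 (the host of the documented default) and it does not handle IPv6 literals; `parseListen` and `resolveListen` are hypothetical names.

```typescript
interface ListenAddress {
  host: string;
  port: number;
}

// Documented default bind address.
const DEFAULT_LISTEN: ListenAddress = { host: "127.0.0.1", port: 4962 };

// Parse "host:port", ":port", or "port".
// Assumption: a missing host means the default host.
function parseListen(spec: string): ListenAddress {
  const trimmed = spec.trim();
  const idx = trimmed.lastIndexOf(":");
  const hostPart = idx === -1 ? "" : trimmed.slice(0, idx);
  const portPart = idx === -1 ? trimmed : trimmed.slice(idx + 1);
  if (!/^\d+$/.test(portPart)) throw new Error(`invalid port in "${spec}"`);
  const port = Number(portPart);
  if (port > 65535) throw new Error(`port out of range in "${spec}"`);
  return { host: hostPart === "" ? DEFAULT_LISTEN.host : hostPart, port };
}

// --listen wins over PI_OMNI_LISTEN, which wins over the default.
function resolveListen(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): ListenAddress {
  const spec = flag ?? env["PI_OMNI_LISTEN"];
  if (spec === undefined || spec.trim() === "") return DEFAULT_LISTEN;
  return parseListen(spec);
}
```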

echo cancellation & barge-in

the browser client can keep the mic open during TTS playback and interrupt the reply when the user starts speaking (barge-in). this relies on an acoustic echo canceller: a Rust port of WebRTC AEC3 compiled to WASM and consumed as an npm file: dependency from wasm/apm/pkg/. rebuild it after touching wasm/apm/src/:

make wasm
# or directly:
cd wasm/apm && wasm-pack build --target nodejs --release

build deps: rustup (e.g. via mise use -g rust@latest) with the wasm32-unknown-unknown target, plus wasm-pack.
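
to show why echo cancellation matters for barge-in, here is a simple energy gate over echo-cancelled mic frames. this is a stand-in technique, not WebRTC AEC3 and not pi-omni's VAD; `BargeInGate`, the 0.05 threshold, and the three-frame requirement are all made-up illustrative values.

```typescript
// Root-mean-square level of one frame of samples in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  frame.forEach((s) => {
    sum += s * s;
  });
  return Math.sqrt(sum / frame.length);
}

// Signals barge-in once enough consecutive frames are loud.
// Assumption: frames have already been through the echo canceller, so the
// remaining energy is mostly the user's voice rather than the TTS playback.
class BargeInGate {
  private loudFrames = 0;
  private readonly threshold: number;
  private readonly framesNeeded: number;

  constructor(threshold = 0.05, framesNeeded = 3) {
    this.threshold = threshold;
    this.framesNeeded = framesNeeded;
  }

  // Feed one frame; returns true when playback should be interrupted.
  push(frame: Float32Array): boolean {
    this.loudFrames = rms(frame) >= this.threshold ? this.loudFrames + 1 : 0;
    return this.loudFrames >= this.framesNeeded;
  }
}
```

without echo cancellation the speaker output leaking into the mic would trip a gate like this on every reply; requiring several consecutive loud frames also filters out short clicks.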

architecture

src/
  extension/   pi extension entry -- spawns the bin, handshakes on base_url=; .ts
  server/      HTTP + WS voice server hosting the pi sdk runtime, compiled to dist/
                 index.ts (standalone entrypoint), http.ts, session.ts,
                 turn-lifecycle.ts
  audio/       STT, TTS, VAD, AEC, sentence chunker, sanitizer (shared with
                 pi-live, which keeps its own copy)
  config.ts    shared config + env-var overrides
public/        browser client (vanilla js, no build step)
wasm/apm/      WebRTC AEC3 -> WASM (rust)
test/          node --test files (.mjs) + python e2e (fake_openai_test.py)

the server compiles to dist/ (the bin runs under plain node, which will not type-strip .ts under node_modules/); the extension stays .ts and is loaded by pi's jiti loader.

development

make            # install deps + compile the server to dist/ (tsc)
make start      # run the server
make test       # build + run tests
make test-integration  # black-box python e2e against fake-openai
make lint       # type-check (tsc --noEmit)
make precommit  # lint + test + test-integration
make install    # install the pi-omni command globally (onto PATH)
make wasm       # rebuild wasm/apm
make pack       # npm pack into build/
make publish    # npm publish --access public
make clean      # rm -rf build