@wenjinnn/pi-mimo-voice

Voice input and output for pi powered by Xiaomi MiMo V2.5 TTS

Packages

Package details

extension

Install @wenjinnn/pi-mimo-voice from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@wenjinnn/pi-mimo-voice
Package
@wenjinnn/pi-mimo-voice
Version
1.1.1
Published
May 31, 2026
Downloads
not available
Author
wenjinnn
License
MIT
Types
extension
Size
47 KB
Dependencies
0 dependencies · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./index.ts"
  ],
  "image": "https://github.com/wenjinnn/pi-mimo-voice/releases/download/v1.0.5/screenshot.png",
  "video": "https://github.com/wenjinnn/pi-mimo-voice/releases/download/v1.0.5/video.mp4"
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-mimo-voice

🇨🇳 中文文档

Voice input (STT) and output (TTS) for pi powered by Xiaomi MiMo V2.5 API.

npm version GitHub

pi-mimo-voice screenshot

🔊 Please turn on sound for the demo below

https://github.com/user-attachments/assets/9a9f9984-9476-4c4e-bbeb-ae088a7d875c

Features

  • 🎤 Speech-to-Text (STT) — Record audio from microphone and transcribe
  • 🔊 Text-to-Speech (TTS) — Speak text aloud through speakers
  • 🗣️ Auto-speak — Automatically read all assistant replies
  • 🎙️ Live mode — Continuous voice conversation loop
  • 🎛️ Interactive config — Voice, model, engine, and region settings

Installation

# Install from npm
pi install npm:@wenjinnn/pi-mimo-voice

# Or install from GitHub
pi install git:github.com/wenjinnn/pi-mimo-voice

# Or clone manually
cd ~/.pi/agent/extensions
git clone https://github.com/wenjinnn/pi-mimo-voice.git

Quick Start

# 1. Set API key (via environment variable or pi /login)
export XIAOMI_TOKEN_PLAN_CN_API_KEY="your-key"  # China region
# Or: export XIAOMI_API_KEY="your-key"          # Global

# 2. Restart pi or /reload

# 3. Try it
/speak Hello from MiMo!
/listen
/listen stop

Commands

Text-to-Speech

Command Description
/speak <text> Speak text aloud using MiMo TTS

Speech-to-Text

Command Description
/listen Start recording (manual stop with /listen stop)
/listen stop Stop recording and transcribe
/listen auto-stop N Record for N seconds then transcribe
/listen N Record for N seconds (max 60) then transcribe

Auto-speak & Live Mode

Command Description
/auto-speak Toggle auto-speak (read all assistant replies). Use /auto-speak on|off for explicit control
/live Start/stop live voice mode (auto-speak ON, begins recording)
/live reply Stop recording, transcribe, and send to LLM

Live mode flow:

  1. /live → recording starts
  2. Speak when ready
  3. /live reply → transcribes and sends
  4. AI responds → auto-speak reads it → next recording starts
  5. Repeat from step 2
  6. /live to stop

Configuration

Command Description
/voice-config Interactive settings: voice, TTS model, STT engine, API region

LLM Tools

When installed, the LLM can call these tools directly:

mimo_tts

Convert text to speech. Parameters:

  • text (required) — Text to speak
  • style (optional) — Style instruction (e.g., "excited", "calm", "东北话")

mimo_stt

Record and transcribe speech. Parameters:

  • duration (optional) — Recording duration in seconds (default: 10, max: 60)
  • auto_stop (optional) — Auto-stop when silence detected (default: false)

Voices

Preset Voices

Voice ID Language Gender
Default mimo_default auto auto
冰糖 冰糖 zh female
茉莉 茉莉 zh female
苏打 苏打 zh male
白桦 白桦 zh male
Mia Mia en female
Chloe Chloe en female
Milo Milo en male
Dean Dean en male

TTS Models

Model Description
mimo-v2.5-tts Preset voices (default)
mimo-v2.5-tts-voicedesign Custom voice via text description
mimo-v2.5-tts-voiceclone Clone voice from audio sample

API Configuration

The extension auto-detects the API region from your pi auth.json provider:

Provider auth.json Key Environment Variable Region
Xiaomi MiMo xiaomi XIAOMI_API_KEY Global
Token Plan CN xiaomi-token-plan-cn XIAOMI_TOKEN_PLAN_CN_API_KEY China
Token Plan AMS xiaomi-token-plan-ams XIAOMI_TOKEN_PLAN_AMS_API_KEY Amsterdam
Token Plan SGP xiaomi-token-plan-sgp XIAOMI_TOKEN_PLAN_SGP_API_KEY Singapore

Setup Options

Option 1: Environment Variable

export XIAOMI_TOKEN_PLAN_CN_API_KEY="your-key"

Option 2: pi /login

/login  → Select provider → Enter API key

Option 3: auth.json

{
  "xiaomi-token-plan-cn": { "type": "api_key", "key": "your-key" }
}

See pi providers docs for more details.

Requirements

  • Node.js ≥ 18
  • MiMo API key — configured via pi /login or environment variable (see API Configuration)
  • ffmpeg — for audio recording and playback (cross-platform)

Platform-Specific Audio Tools

Platform Recording Playback (priority order)
Linux parecord (PulseAudio) → ffmpeg paplayaplayffplaympv
macOS ffmpeg + avfoundation afplay (built-in) → ffplaympv
Windows ffmpeg + dshow ffplaympv

Linux: PulseAudio is recommended (parecord/paplay). ALSA (aplay) also works for playback.

macOS: No extra tools needed — afplay is built-in, ffmpeg handles recording via avfoundation.

Windows: Install ffmpeg and ensure it's in PATH. Default recording device is audio=麦克风, override with MIC_DEVICE env var.

⚠️ Cross-platform note: macOS and Windows support is based on ffmpeg's platform-specific audio backends (avfoundation/dshow) but has not been fully tested on real hardware. Linux is the primary tested platform. If you encounter issues on macOS or Windows, please open an issue — feedback and contributions are very welcome!

Environment Variables

Variable Description
MIC_DEVICE Override audio recording device (all platforms)
WHISPER_MODEL Path to whisper.cpp model file

npm Dependencies

The following dependencies are provided by pi automatically:

  • @earendil-works/pi-coding-agent — pi extension API
  • @earendil-works/pi-tui — pi TUI components
  • typebox — type validation

Optional

  • whisper.cpp — for local STT (faster, no API calls)
    • Set path: /voice-config → Whisper.cpp Path
    • Download model: whisper-cpp-download-ggml-model base

How It Works

TTS Flow

Text → MiMo TTS API → WAV audio → paplay/aplay/ffplay/mpv

STT Flow

Microphone → parecord/ffmpeg → WAV file → whisper.cpp or MiMo API → Text

Live Mode Flow

/live → Start recording
User speaks
/live reply → Stop recording → Transcribe → Send to LLM
LLM responds → Auto-speak reads response
Auto-start next recording

Feedback & Contributing

This project is actively maintained. If you have questions, bug reports, or feature requests:

Contributions of any kind are welcome!

License

MIT