@wenjinnn/pi-mimo-voice
Voice input and output for pi powered by Xiaomi MiMo V2.5 TTS
Package details
Install @wenjinnn/pi-mimo-voice from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@wenjinnn/pi-mimo-voice- Package
@wenjinnn/pi-mimo-voice- Version
1.1.1- Published
- May 31, 2026
- Downloads
- not available
- Author
- wenjinnn
- License
- MIT
- Types
- extension
- Size
- 47 KB
- Dependencies
- 0 dependencies · 3 peers
Pi manifest JSON
{
"extensions": [
"./index.ts"
],
"image": "https://github.com/wenjinnn/pi-mimo-voice/releases/download/v1.0.5/screenshot.png",
"video": "https://github.com/wenjinnn/pi-mimo-voice/releases/download/v1.0.5/video.mp4"
}Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-mimo-voice
🇨🇳 中文文档
Voice input (STT) and output (TTS) for pi powered by Xiaomi MiMo V2.5 API.

🔊 Please turn on sound for the demo below
https://github.com/user-attachments/assets/9a9f9984-9476-4c4e-bbeb-ae088a7d875c
Features
- 🎤 Speech-to-Text (STT) — Record audio from microphone and transcribe
- 🔊 Text-to-Speech (TTS) — Speak text aloud through speakers
- 🗣️ Auto-speak — Automatically read all assistant replies
- 🎙️ Live mode — Continuous voice conversation loop
- 🎛️ Interactive config — Voice, model, engine, and region settings
Installation
# Install from npm
pi install npm:@wenjinnn/pi-mimo-voice
# Or install from GitHub
pi install git:github.com/wenjinnn/pi-mimo-voice
# Or clone manually
cd ~/.pi/agent/extensions
git clone https://github.com/wenjinnn/pi-mimo-voice.git
Quick Start
# 1. Set API key (via environment variable or pi /login)
export XIAOMI_TOKEN_PLAN_CN_API_KEY="your-key" # China region
# Or: export XIAOMI_API_KEY="your-key" # Global
# 2. Restart pi or /reload
# 3. Try it
/speak Hello from MiMo!
/listen
/listen stop
Commands
Text-to-Speech
| Command | Description |
|---|---|
/speak <text> |
Speak text aloud using MiMo TTS |
Speech-to-Text
| Command | Description |
|---|---|
/listen |
Start recording (manual stop with /listen stop) |
/listen stop |
Stop recording and transcribe |
/listen auto-stop N |
Record for N seconds then transcribe |
/listen N |
Record for N seconds (max 60) then transcribe |
Auto-speak & Live Mode
| Command | Description |
|---|---|
/auto-speak |
Toggle auto-speak (read all assistant replies). Use /auto-speak on|off for explicit control |
/live |
Start/stop live voice mode (auto-speak ON, begins recording) |
/live reply |
Stop recording, transcribe, and send to LLM |
Live mode flow:
/live→ recording starts- Speak when ready
/live reply→ transcribes and sends- AI responds → auto-speak reads it → next recording starts
- Repeat from step 2
/liveto stop
Configuration
| Command | Description |
|---|---|
/voice-config |
Interactive settings: voice, TTS model, STT engine, API region |
LLM Tools
When installed, the LLM can call these tools directly:
mimo_tts
Convert text to speech. Parameters:
text(required) — Text to speakstyle(optional) — Style instruction (e.g., "excited", "calm", "东北话")
mimo_stt
Record and transcribe speech. Parameters:
duration(optional) — Recording duration in seconds (default: 10, max: 60)auto_stop(optional) — Auto-stop when silence detected (default: false)
Voices
Preset Voices
| Voice | ID | Language | Gender |
|---|---|---|---|
| Default | mimo_default |
auto | auto |
| 冰糖 | 冰糖 |
zh | female |
| 茉莉 | 茉莉 |
zh | female |
| 苏打 | 苏打 |
zh | male |
| 白桦 | 白桦 |
zh | male |
| Mia | Mia |
en | female |
| Chloe | Chloe |
en | female |
| Milo | Milo |
en | male |
| Dean | Dean |
en | male |
TTS Models
| Model | Description |
|---|---|
mimo-v2.5-tts |
Preset voices (default) |
mimo-v2.5-tts-voicedesign |
Custom voice via text description |
mimo-v2.5-tts-voiceclone |
Clone voice from audio sample |
API Configuration
The extension auto-detects the API region from your pi auth.json provider:
| Provider | auth.json Key | Environment Variable | Region |
|---|---|---|---|
| Xiaomi MiMo | xiaomi |
XIAOMI_API_KEY |
Global |
| Token Plan CN | xiaomi-token-plan-cn |
XIAOMI_TOKEN_PLAN_CN_API_KEY |
China |
| Token Plan AMS | xiaomi-token-plan-ams |
XIAOMI_TOKEN_PLAN_AMS_API_KEY |
Amsterdam |
| Token Plan SGP | xiaomi-token-plan-sgp |
XIAOMI_TOKEN_PLAN_SGP_API_KEY |
Singapore |
Setup Options
Option 1: Environment Variable
export XIAOMI_TOKEN_PLAN_CN_API_KEY="your-key"
Option 2: pi /login
/login → Select provider → Enter API key
Option 3: auth.json
{
"xiaomi-token-plan-cn": { "type": "api_key", "key": "your-key" }
}
See pi providers docs for more details.
Requirements
- Node.js ≥ 18
- MiMo API key — configured via
pi /loginor environment variable (see API Configuration) - ffmpeg — for audio recording and playback (cross-platform)
Platform-Specific Audio Tools
| Platform | Recording | Playback (priority order) |
|---|---|---|
| Linux | parecord (PulseAudio) → ffmpeg |
paplay → aplay → ffplay → mpv |
| macOS | ffmpeg + avfoundation | afplay (built-in) → ffplay → mpv |
| Windows | ffmpeg + dshow | ffplay → mpv |
Linux: PulseAudio is recommended (parecord/paplay). ALSA (aplay) also works for playback.
macOS: No extra tools needed — afplay is built-in, ffmpeg handles recording via avfoundation.
Windows: Install ffmpeg and ensure it's in PATH. Default recording device is audio=麦克风, override with MIC_DEVICE env var.
⚠️ Cross-platform note: macOS and Windows support is based on ffmpeg's platform-specific audio backends (avfoundation/dshow) but has not been fully tested on real hardware. Linux is the primary tested platform. If you encounter issues on macOS or Windows, please open an issue — feedback and contributions are very welcome!
Environment Variables
| Variable | Description |
|---|---|
MIC_DEVICE |
Override audio recording device (all platforms) |
WHISPER_MODEL |
Path to whisper.cpp model file |
npm Dependencies
The following dependencies are provided by pi automatically:
@earendil-works/pi-coding-agent— pi extension API@earendil-works/pi-tui— pi TUI componentstypebox— type validation
Optional
- whisper.cpp — for local STT (faster, no API calls)
- Set path:
/voice-config→ Whisper.cpp Path - Download model:
whisper-cpp-download-ggml-model base
- Set path:
How It Works
TTS Flow
Text → MiMo TTS API → WAV audio → paplay/aplay/ffplay/mpv
STT Flow
Microphone → parecord/ffmpeg → WAV file → whisper.cpp or MiMo API → Text
Live Mode Flow
/live → Start recording
User speaks
/live reply → Stop recording → Transcribe → Send to LLM
LLM responds → Auto-speak reads response
Auto-start next recording
Feedback & Contributing
This project is actively maintained. If you have questions, bug reports, or feature requests:
- 🐛 Open an issue
- 🔀 Submit a pull request
- 💬 Share your experience and suggestions
Contributions of any kind are welcome!
License
MIT