@lalalic/local-vision-audio
Local vision (describe images), STT (speech-to-text), and TTS (text-to-speech). macOS uses MLX (Apple Silicon GPU). Windows/Linux use whisper, edge-tts, and transformers.
Package details
Install @lalalic/local-vision-audio from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@lalalic/local-vision-audio
- Package: @lalalic/local-vision-audio
- Version: 1.0.0
- Published: Jun 17, 2026
- Downloads: not available
- Author: lalalic
- License: MIT
- Types: extension, skill
- Size: 28.5 KB
- Dependencies: 0 dependencies · 2 peers
Pi manifest JSON
{
"extensions": [
"./local-vision-audio"
],
"skills": [
"./skills"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
@lalalic/local-vision-audio
Local vision, STT, and TTS tools for pi. Cross-platform: macOS (Apple Silicon MLX) and Windows/Linux (whisper, edge-tts, transformers).
pi install npm:@lalalic/local-vision-audio
Tools
describe_image(path, prompt?, model?)
| Platform | Engine | Details |
|---|---|---|
| macOS | mlx-vlm | GPU-accelerated, models up to 72B |
| Windows/Linux | transformers + torch | BLIP image captioning |
transcribe_audio(path, model?, language?)
| Platform | Engine | Details |
|---|---|---|
| macOS | mlx-audio STT | Whisper, VibeVoice, Qwen3-ASR |
| Windows/Linux | openai-whisper | Standard whisper models |
text_to_speech(text, voice?, ref_audio?, instruct?, ...)
| Platform | Engine | Details |
|---|---|---|
| macOS | mlx-audio TTS | Voice cloning + design, VoxCPM2/Kokoro |
| Windows/Linux | edge-tts | Microsoft Edge TTS engine |
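The platform split in the three tables above can be pictured as a small lookup. The following is a minimal sketch, not the package's actual code: the names `ENGINES`, `pick_backend`, and `engine_for` are hypothetical, and treating Intel Macs as the non-MLX case is an assumption, since the README only mentions Apple Silicon.

```python
import platform

# Engine names are copied from the tables above. The dictionary layout and
# function names are hypothetical and only illustrate the platform split.
ENGINES = {
    "mlx": {
        "describe_image": "mlx-vlm",
        "transcribe_audio": "mlx-audio STT",
        "text_to_speech": "mlx-audio TTS",
    },
    "generic": {
        "describe_image": "transformers + torch",
        "transcribe_audio": "openai-whisper",
        "text_to_speech": "edge-tts",
    },
}


def pick_backend(system=None, machine=None):
    """Return "mlx" on Apple Silicon macOS, otherwise "generic".

    Assumption: any platform that is not macOS on arm64 (including Intel
    Macs) falls back to the Windows/Linux engines.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "generic"


def engine_for(tool, system=None, machine=None):
    """Look up the engine name a tool would use on the given platform."""
    return ENGINES[pick_backend(system, machine)][tool]


if __name__ == "__main__":
    # Print the engine each tool would use on the current machine.
    for tool in ENGINES["generic"]:
        print(f"{tool}: {engine_for(tool)}")
```

Passing `system` and `machine` explicitly makes the selection easy to check without access to each platform.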
Self-healing errors
When a CLI or Python module is missing, the tool prints platform-specific install commands instead of a cryptic error:
mlx_vlm.generate not found.
Install mlx-vlm:
uv tool install mlx-vlm
Or: pip install mlx-vlm
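One way such a check could work is to look for the dependency before running it and build the hint from a list of install commands. This is a sketch using only the Python standard library. The helper names `is_available`, `missing_dependency_message`, and `check` are hypothetical, and the package's real detection logic may differ.

```python
import importlib.util
import shutil


def is_available(name):
    """True if `name` is an executable on PATH or an importable module."""
    if shutil.which(name) is not None:
        return True
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when the parent package of a dotted name is missing.
        return False


def missing_dependency_message(name, package, commands):
    """Format an install hint shaped like the sample output above."""
    lines = [f"{name} not found.", f"Install {package}:", commands[0]]
    lines += [f"Or: {cmd}" for cmd in commands[1:]]
    return "\n".join(lines)


def check(name, package, commands):
    """Return None when the dependency is present, else an install hint."""
    if is_available(name):
        return None
    return missing_dependency_message(name, package, commands)


if __name__ == "__main__":
    # Commands are the ones shown in the sample message above.
    hint = check(
        "mlx_vlm.generate",
        "mlx-vlm",
        ["uv tool install mlx-vlm", "pip install mlx-vlm"],
    )
    print(hint or "mlx_vlm.generate is available")
```

Returning the message rather than raising lets the caller hand the hint back to the agent as ordinary tool output.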
Requirements
macOS (Apple Silicon)
uv tool install mlx-vlm # Vision
uv tool install mlx-audio # STT + TTS
Windows / Linux
pip install openai-whisper # STT
pip install edge-tts # TTS
pip install transformers torch torchvision Pillow # Vision (CPU/GPU)
License
MIT