@lalalic/local-vision-audio

Local vision (describe images), STT (speech-to-text), and TTS (text-to-speech). macOS uses MLX (Apple Silicon GPU). Windows/Linux use whisper, edge-tts, and transformers.

Install @lalalic/local-vision-audio from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:@lalalic/local-vision-audio
Package: @lalalic/local-vision-audio
Version: 1.0.0
Published: Jun 17, 2026
Downloads: not available
Author: lalalic
License: MIT
Types: extension, skill
Size: 28.5 KB
Dependencies: 0 dependencies · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./local-vision-audio"
  ],
  "skills": [
    "./skills"
  ]
}
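
To make the manifest's content concrete, the following minimal sketch parses the JSON above and lists the declared resource paths by kind. The helper name `list_resources` is hypothetical, and the sketch does not attempt to show how Pi itself locates or loads these paths.

```python
import json

# Manifest copied from the listing above.
MANIFEST = """
{
  "extensions": ["./local-vision-audio"],
  "skills": ["./skills"]
}
"""


def list_resources(manifest_text):
    """Return (kind, relative_path) pairs declared in a manifest string.

    Hypothetical helper for illustration; not part of Pi or this package.
    """
    manifest = json.loads(manifest_text)
    pairs = []
    for kind in ("extensions", "skills"):
        # A missing key is treated as "nothing declared for this kind".
        for path in manifest.get(kind, []):
            pairs.append((kind, path))
    return pairs


if __name__ == "__main__":
    for kind, path in list_resources(MANIFEST):
        print(f"{kind}: {path}")
```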

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

@lalalic/local-vision-audio

Local vision, STT, and TTS tools for pi. Cross-platform: macOS (Apple Silicon MLX) and Windows/Linux (whisper, edge-tts, transformers).

pi install npm:@lalalic/local-vision-audio

Tools

describe_image(path, prompt?, model?)

Platform | Engine | Details
macOS | mlx-vlm | GPU-accelerated, models up to 72B
Windows/Linux | transformers + torch | BLIP image captioning

transcribe_audio(path, model?, language?)

Platform | Engine | Details
macOS | mlx-audio STT | Whisper, VibeVoice, Qwen3-ASR
Windows/Linux | openai-whisper | Standard whisper models

text_to_speech(text, voice?, ref_audio?, instruct?, ...)

Platform | Engine | Details
macOS | mlx-audio TTS | Voice cloning + design, VoxCPM2/Kokoro
Windows/Linux | edge-tts | Microsoft Edge TTS engine
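
The three tables above amount to a per-platform dispatch: each tool maps to one engine on macOS and another on Windows/Linux. A minimal sketch of that lookup follows. The names `ENGINES`, `platform_family`, and `select_engine` are hypothetical and not taken from the package source, and the sketch simplifies by treating any Darwin system as macOS even though the MLX engines target Apple Silicon.

```python
import platform
from typing import Optional

# Engine per tool and platform family, transcribed from the tables above.
ENGINES = {
    "describe_image": {"macos": "mlx-vlm", "other": "transformers + torch"},
    "transcribe_audio": {"macos": "mlx-audio STT", "other": "openai-whisper"},
    "text_to_speech": {"macos": "mlx-audio TTS", "other": "edge-tts"},
}


def platform_family(system: Optional[str] = None) -> str:
    """Map a platform.system() value to one of the two table rows.

    Assumption: every Darwin host counts as "macos"; a stricter check
    could also require platform.machine() == "arm64".
    """
    system = system or platform.system()
    return "macos" if system == "Darwin" else "other"


def select_engine(tool: str, system: Optional[str] = None) -> str:
    """Return the engine name listed for this tool on this platform."""
    return ENGINES[tool][platform_family(system)]


if __name__ == "__main__":
    for tool in ENGINES:
        print(tool, "->", select_engine(tool))
```

Passing `system` explicitly, for example `select_engine("text_to_speech", "Linux")`, makes the lookup easy to exercise without running on each platform.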

Self-healing errors

When a CLI or Python module is missing, the tool prints platform-specific install commands instead of a cryptic error:

mlx_vlm.generate not found.
Install mlx-vlm:
  uv tool install mlx-vlm
  Or: pip install mlx-vlm
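
One way such a message could be produced is sketched below: look for the executable on PATH and, if it is absent, format the install commands in the style of the sample above. This is an illustration under assumptions, not the package's actual implementation; `install_hint` and `check_cli` are hypothetical names.

```python
import shutil


def install_hint(missing, package, commands):
    """Format a missing-dependency message like the sample above.

    The first command is the preferred one; the rest are listed as
    alternatives prefixed with "Or:".
    """
    lines = [f"{missing} not found.", f"Install {package}:"]
    lines.append(f"  {commands[0]}")
    for alternative in commands[1:]:
        lines.append(f"  Or: {alternative}")
    return "\n".join(lines)


def check_cli(executable, package, commands):
    """Return None if the executable is on PATH, else an install hint."""
    if shutil.which(executable) is not None:
        return None
    return install_hint(executable, package, commands)


if __name__ == "__main__":
    hint = check_cli(
        "mlx_vlm.generate",
        "mlx-vlm",
        ["uv tool install mlx-vlm", "pip install mlx-vlm"],
    )
    print(hint if hint else "mlx_vlm.generate is available")
```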

Requirements

macOS (Apple Silicon)

uv tool install mlx-vlm     # Vision
uv tool install mlx-audio   # STT + TTS

Windows / Linux

pip install openai-whisper          # STT
pip install edge-tts                # TTS
pip install transformers torch torchvision Pillow  # Vision (CPU/GPU)
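
Before calling the tools on Windows or Linux, it can help to confirm that the Python packages listed above are importable. The sketch below does this with `importlib.util.find_spec`. The mapping from pip package names to import names reflects the usual names for these projects and is an assumption here, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip package name -> import name (assumed: the usual names for each project).
WINDOWS_LINUX_MODULES = {
    "openai-whisper": "whisper",
    "edge-tts": "edge_tts",
    "transformers": "transformers",
    "torch": "torch",
    "torchvision": "torchvision",
    "Pillow": "PIL",
}


def missing_packages(modules):
    """Return the pip package names whose import name cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(WINDOWS_LINUX_MODULES)
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All Windows/Linux requirements are importable.")
```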

License

MIT