pi-video-transcribe
Video transcription with speaker diarization for Pi. Transcribe videos with per-speaker labels, summaries, chapters, and sentiment analysis via AssemblyAI.
Package details
Install pi-video-transcribe from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-video-transcribe
- Package: pi-video-transcribe
- Version: 1.0.0
- Published: Apr 30, 2026
- Downloads: not available
- Author: ngsoftware
- License: MIT
- Types: extension, skill
- Size: 31.6 KB
- Dependencies: 1 dependency · 2 peers
Pi manifest JSON
{
"extensions": [
"./extensions"
],
"skills": [
"./skills"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-video-transcribe
Video transcription with speaker diarization for pi. Transcribe any video or audio file and get per-speaker labeled text, summaries, and chapter breakdowns.
Powered by AssemblyAI (Universal-3 Pro model) with summaries and chapters via LLM Gateway.
Features
- Speaker diarization — "who said what" with timestamps
- Summary — bullet-point summary via LLM Gateway (Claude Sonnet)
- Chapter breakdown — logical chapters with headlines and summaries via LLM Gateway
- 99+ languages — auto-detected or specify manually
- URLs & local files — HTTP(S) URLs or local file paths
- 30+ video/audio formats — mp4, mkv, avi, mov, webm, mp3, wav, m4a, flac, ogg, and more
- Skill included — guides pi to build custom AssemblyAI integrations (streaming, voice agents, etc.)
Prerequisites
1. AssemblyAI API Key
Step-by-step:
- Sign up at assemblyai.com/dashboard/signup
- Copy your API key from the dashboard (shown right after login under "Your API Key")
- Set it as an environment variable:
Bash / ZSH (add to ~/.bashrc or ~/.zshrc):
export ASSEMBLYAI_API_KEY=your_api_key_here
PowerShell (run once, persists across restarts):
[Environment]::SetEnvironmentVariable("ASSEMBLYAI_API_KEY", "your_api_key_here", "User")
.env file (if you use dotenv in your project):
ASSEMBLYAI_API_KEY=your_api_key_here
Free tier: 5 hours of transcription per month — enough to get started. No credit card required.
2. ffmpeg (for local video files)
Required only when transcribing local video files (mp4, mkv, avi, etc.). Not needed for URLs or for audio-only files such as mp3 and wav, which AssemblyAI handles directly.
If ffmpeg is missing, the tool will print platform-specific install instructions.
| Platform | Install |
|---|---|
| Windows | winget install Gyan.FFmpeg |
| macOS | brew install ffmpeg |
| Linux (Debian/Ubuntu) | sudo apt install ffmpeg |
| Linux (Fedora) | sudo dnf install ffmpeg |
| Linux (Arch) | sudo pacman -S ffmpeg |
Verify: ffmpeg -version
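The rule above (ffmpeg only for local video files) can be sketched as a small check. This is an illustration only: `needsFfmpeg`, `isUrl`, and the `VIDEO_EXTENSIONS` list are hypothetical names, and the extension list is a sample, not the package's actual list.

```typescript
// Hypothetical helper, not part of the package's documented API.
// Sample list only; the package supports more video formats than shown here.
const VIDEO_EXTENSIONS: string[] = ["mp4", "mkv", "avi", "mov", "webm"];

// True for HTTP(S) sources, which are handed to AssemblyAI without local processing.
function isUrl(source: string): boolean {
  return /^https?:\/\//i.test(source);
}

// Returns true only for local files whose extension marks them as video.
function needsFfmpeg(source: string): boolean {
  if (isUrl(source)) return false;
  const dot = source.lastIndexOf(".");
  if (dot === -1) return false; // no extension: treated as audio in this sketch
  const ext = source.slice(dot + 1).toLowerCase();
  return VIDEO_EXTENSIONS.includes(ext);
}
```

Under these assumptions, `needsFfmpeg("./meeting.mp4")` is true, while a URL or a local `.wav` file returns false.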
Installation
# From local path
pi install ./path/to/pi-video-transcribe
# From npm
pi install npm:pi-video-transcribe
# From git
pi install git:github.com/YOUR_USERNAME/pi-video-transcribe
After installation, the transcribe_video tool is available to the LLM automatically.
Usage
Just ask pi to transcribe a video:
> Transcribe ./meeting.mp4 and tell me what each person said
> What are the main topics discussed in this video? ./interview.mp4
> Summarize ./presentation.mp4 per speaker
> Transcribe https://storage.example.com/recording.mp3 with speaker labels
How it works
- pi recognizes the intent and calls the transcribe_video tool
- For local video files, audio is extracted via ffmpeg
- Audio is sent to AssemblyAI for transcription + diarization
- You get back a structured result with per-speaker transcript, summary, chapters
Parameters (LLM uses these automatically)
| Parameter | Default | Description |
|---|---|---|
| source | (required) | Local file path or URL |
| speakers_expected | auto | Exact speaker count (only if certain) |
| speaker_min / speaker_max | auto | Speaker count range |
| include_summary | true | Generate bullet-point summary via LLM Gateway |
| include_chapters | true | Generate chapter breakdown via LLM Gateway |
| language_code | auto | BCP-47 code (de, en, fr, ...) |
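As a rough illustration of how some of these parameters might translate into an AssemblyAI transcript request, here is a minimal sketch. The field names `audio_url`, `speaker_labels`, `speakers_expected`, `language_code`, and `language_detection` are AssemblyAI request fields; the `buildTranscriptRequest` function and the mapping itself are assumptions, not the package's confirmed implementation. The speaker range and the LLM Gateway options are left out.

```typescript
// Subset of the tool parameters from the table above.
interface ToolParams {
  source: string;
  speakers_expected?: number;
  language_code?: string;
}

// Subset of AssemblyAI transcript request fields used in this sketch.
interface TranscriptRequest {
  audio_url: string;
  speaker_labels: boolean;
  speakers_expected?: number;
  language_code?: string;
  language_detection?: boolean;
}

// Hypothetical mapping: diarization is always on, language falls back to auto-detection.
function buildTranscriptRequest(params: ToolParams, audioUrl: string): TranscriptRequest {
  const request: TranscriptRequest = { audio_url: audioUrl, speaker_labels: true };
  if (params.speakers_expected !== undefined) {
    request.speakers_expected = params.speakers_expected;
  }
  if (params.language_code) {
    request.language_code = params.language_code;
  } else {
    request.language_detection = true; // "auto" in the table above
  }
  return request;
}
```

Here `audioUrl` stands for either the original URL or the URL of the uploaded audio, depending on the source type.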
Example output
# Video Transcription
- **Duration:** 15:42
- **Language:** de
## Speakers
1. **Speaker A**
2. **Speaker B**
## Transcript
**Speaker A** [0:00 – 2:15]:
> Willkommen zur heutigen Diskussion über...
**Speaker B** [2:15 – 5:30]:
> Vielen Dank für die Einladung...
## Summary
• Die Diskussion behandelt die Einführung neuer Richtlinien
• Speaker A betont die Bedeutung von Transparenz
• Speaker B schlägt einen schrittweisen Ansatz vor
...
## Chapters
### Einleitung und Begrüßung [0:00 – 2:15]
Die Moderatorin eröffnet die Diskussion...
### Hauptthema: Neue Richtlinien [2:15 – 10:45]
Ausführliche Diskussion der vorgeschlagenen Änderungen...
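The transcript lines in the example output can be produced from diarized utterances with a small formatter. The sketch below assumes utterances carry a speaker label plus start and end times in milliseconds, as AssemblyAI reports them; `formatTimestamp` and `renderUtterance` are hypothetical names, not functions exported by the package.

```typescript
// Shape of a diarized utterance as assumed in this sketch (times in milliseconds).
interface Utterance {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Converts milliseconds to m:ss, e.g. 135000 becomes "2:15".
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Renders one utterance in the markdown layout shown in the example output.
function renderUtterance(u: Utterance): string {
  const range = `${formatTimestamp(u.start)} – ${formatTimestamp(u.end)}`;
  return `**Speaker ${u.speaker}** [${range}]:\n> ${u.text}`;
}
```

Minutes are not wrapped into hours in this sketch, so a 75-minute mark renders as `75:00`.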
Pricing
AssemblyAI charges per minute of audio. LLM Gateway usage is billed separately by token.
| Feature | Cost |
|---|---|
| Transcription (Universal-3 Pro) | ~$0.0065/min |
| Speaker Diarization | included |
| Summary via LLM Gateway | token-based |
| Chapters via LLM Gateway | token-based |
A 20-minute video with transcription and diarization costs roughly $0.13. LLM Gateway adds a few cents for summary and chapters.
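The transcription estimate above is a simple multiplication, sketched here for other durations. The rate constant is the approximate figure from the table, and `estimateTranscriptionCost` is a hypothetical helper; LLM Gateway token costs are not included.

```typescript
// Approximate per-minute rate from the pricing table above (an estimate, not a quote).
const TRANSCRIPTION_USD_PER_MINUTE = 0.0065;

// Estimates the transcription cost in USD, rounded to four decimal places.
function estimateTranscriptionCost(durationMinutes: number): number {
  if (durationMinutes < 0) {
    throw new RangeError("durationMinutes must be non-negative");
  }
  const raw = durationMinutes * TRANSCRIPTION_USD_PER_MINUTE;
  return Math.round(raw * 10000) / 10000;
}
```

For a 20-minute video this gives 20 × 0.0065 = 0.13 USD, matching the figure above.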
Troubleshooting
| Issue | Fix |
|---|---|
| "Set ASSEMBLYAI_API_KEY" warning | Export the env variable and restart pi |
| ffmpeg not found | Install ffmpeg (see table above) and restart your terminal |
| File not found | Use absolute or relative-to-CWD paths |
| Transcription failed | Check your API key and AssemblyAI dashboard quota |
| Summary/Chapters failed | LLM Gateway error -- check AssemblyAI dashboard for LLM Gateway access |
| EU data residency | Set ASSEMBLYAI_EU=true env var to use EU endpoints |
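The two environment variables mentioned in this table could be resolved along the following lines. This is a sketch under assumptions: `resolveConfig` is a hypothetical function, the exact way the package parses `ASSEMBLYAI_EU` is not documented here, and the base URLs are AssemblyAI's public US and EU API hosts.

```typescript
// Environment passed in explicitly so the function stays easy to test.
type Env = Record<string, string | undefined>;

interface AssemblyConfig {
  apiKey: string;
  baseUrl: string;
}

// Hypothetical resolver: fails early without a key, switches host when ASSEMBLYAI_EU is "true".
function resolveConfig(env: Env): AssemblyConfig {
  const apiKey = env["ASSEMBLYAI_API_KEY"];
  if (!apiKey) {
    throw new Error("Set ASSEMBLYAI_API_KEY before transcribing");
  }
  const useEu = env["ASSEMBLYAI_EU"] === "true";
  return {
    apiKey,
    baseUrl: useEu ? "https://api.eu.assemblyai.com" : "https://api.assemblyai.com",
  };
}
```

In a Node-based extension this would typically be called as `resolveConfig(process.env)`.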
License
MIT