pi-video-transcribe

Video transcription with speaker diarization for Pi. Transcribe videos with per-speaker labels, summaries, chapters, and sentiment analysis via AssemblyAI.

Package details

Install pi-video-transcribe from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-video-transcribe
Package: pi-video-transcribe
Version: 1.0.0
Published: Apr 30, 2026
Downloads: not available
Author: ngsoftware
License: MIT
Types: extension, skill
Size: 31.6 KB
Dependencies: 1 dependency · 2 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions"
  ],
  "skills": [
    "./skills"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

Video transcription with speaker diarization for pi. Transcribe any video or audio file and get per-speaker labeled text, summaries, and chapter breakdowns.

Powered by AssemblyAI (Universal-3 Pro model), with summaries and chapters generated via AssemblyAI's LLM Gateway.

Features

  • Speaker diarization: "who said what", with timestamps
  • Summary: bullet-point summary via LLM Gateway (Claude Sonnet)
  • Chapter breakdown: logical chapters with headlines and summaries via LLM Gateway
  • 99+ languages: auto-detected or specified manually
  • URLs and local files: accepts HTTP(S) URLs or local file paths
  • 30+ video/audio formats: mp4, mkv, avi, mov, webm, mp3, wav, m4a, flac, ogg, and more
  • Skill included: guides pi in building custom AssemblyAI integrations (streaming, voice agents, etc.)

Prerequisites

1. AssemblyAI API Key

Step-by-step:

  1. Sign up at assemblyai.com/dashboard/signup
  2. Copy your API key from the dashboard (shown right after login under "Your API Key")
  3. Set it as an environment variable:

Bash / Zsh (add to ~/.bashrc or ~/.zshrc):

export ASSEMBLYAI_API_KEY=your_api_key_here

PowerShell (run once; the value persists for your user account and is picked up by newly opened terminals):

[Environment]::SetEnvironmentVariable("ASSEMBLYAI_API_KEY", "your_api_key_here", "User")

.env file (if you use dotenv in your project):

ASSEMBLYAI_API_KEY=your_api_key_here

Free tier: 5 hours of transcription per month — enough to get started. No credit card required.
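To illustrate how the key is consumed, here is a minimal TypeScript sketch, not the package's actual code. The helper names requireApiKey and assemblyAiHeaders are hypothetical; the one detail taken from AssemblyAI's REST API is that the key goes into the authorization header as-is, without a "Bearer" prefix.

```typescript
// Hypothetical helpers (not the package's code): resolve the AssemblyAI key
// from an environment map passed in explicitly, e.g. process.env in Node.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env): string {
  const key = env.ASSEMBLYAI_API_KEY?.trim();
  if (!key) {
    // Mirrors the "Set ASSEMBLYAI_API_KEY" warning listed under Troubleshooting.
    throw new Error("Set ASSEMBLYAI_API_KEY to your AssemblyAI API key");
  }
  return key;
}

// AssemblyAI expects the raw key in the `authorization` header (no "Bearer " prefix).
function assemblyAiHeaders(env: Env): Record<string, string> {
  return {
    authorization: requireApiKey(env),
    "content-type": "application/json",
  };
}
```

In Node you would call assemblyAiHeaders(process.env) once and reuse the result for every request.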

2. ffmpeg (for local video files)

Required only when transcribing local video files (mp4, mkv, avi, etc.). Not needed for URLs or local audio files such as mp3 and wav, which AssemblyAI accepts directly.
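The rule above can be sketched as a small decision function plus the ffmpeg arguments for stripping the video stream. This is an illustration under assumptions: needsAudioExtraction and ffmpegExtractArgs are hypothetical names, the extension list is a sample rather than the package's full list, and mono 16 kHz output is a common choice for speech, not a documented setting of this package.

```typescript
// Hypothetical: a sample of video extensions that trigger extraction.
const VIDEO_EXTENSIONS = new Set(["mp4", "mkv", "avi", "mov", "webm"]);

function isUrl(source: string): boolean {
  return /^https?:\/\//i.test(source);
}

// True only for local files whose extension looks like a video container.
function needsAudioExtraction(source: string): boolean {
  if (isUrl(source)) return false; // URLs are handed to AssemblyAI unchanged
  const dot = source.lastIndexOf(".");
  const ext = dot >= 0 ? source.slice(dot + 1).toLowerCase() : "";
  return VIDEO_EXTENSIONS.has(ext); // audio files (mp3, wav, ...) are uploaded directly
}

// Standard ffmpeg options: -y overwrite output, -vn drop video,
// -ac 1 mono, -ar 16000 resample to 16 kHz; the format follows the output extension.
function ffmpegExtractArgs(input: string, output: string): string[] {
  return ["-y", "-i", input, "-vn", "-ac", "1", "-ar", "16000", output];
}
```

The argument array can be passed to Node's child_process.spawn("ffmpeg", args).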

If ffmpeg is missing, the tool will print platform-specific install instructions.

Install commands by platform:

  • Windows: winget install Gyan.FFmpeg
  • macOS: brew install ffmpeg
  • Linux (Debian/Ubuntu): sudo apt install ffmpeg
  • Linux (Fedora): sudo dnf install ffmpeg
  • Linux (Arch): sudo pacman -S ffmpeg

Verify: ffmpeg -version
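A sketch of how the platform-specific hint could be chosen from Node's process.platform value (win32, darwin, linux). ffmpegInstallHint is a hypothetical name and the fallback message is an assumption; the commands are the ones listed above.

```typescript
// Hypothetical helper: map a process.platform value to install instructions.
// process.platform cannot tell Linux distributions apart, so all are listed.
function ffmpegInstallHint(platform: string): string {
  switch (platform) {
    case "win32":
      return "winget install Gyan.FFmpeg";
    case "darwin":
      return "brew install ffmpeg";
    case "linux":
      return [
        "Debian/Ubuntu: sudo apt install ffmpeg",
        "Fedora: sudo dnf install ffmpeg",
        "Arch: sudo pacman -S ffmpeg",
      ].join("\n");
    default:
      // Assumed fallback for other platforms.
      return "Install ffmpeg from https://ffmpeg.org/download.html";
  }
}
```

Usage: ffmpegInstallHint(process.platform).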

Installation

# From local path
pi install ./path/to/pi-video-transcribe

# From npm
pi install npm:pi-video-transcribe

# From git
pi install git:github.com/YOUR_USERNAME/pi-video-transcribe

After installation, the transcribe_video tool is available to the LLM automatically.

Usage

Just ask pi to transcribe a video:

> Transcribe ./meeting.mp4 and tell me what each person said

> What are the main topics discussed in this video? ./interview.mp4

> Summarize ./presentation.mp4 per speaker

> Transcribe https://storage.example.com/recording.mp3 with speaker labels

How it works

  1. pi recognizes the intent and calls the transcribe_video tool
  2. For local video files, audio is extracted via ffmpeg
  3. Audio is sent to AssemblyAI for transcription and speaker diarization; the summary and chapters (if enabled) are generated via LLM Gateway
  4. You get back a structured result with a per-speaker transcript, summary, and chapters
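Step 3 is asynchronous on AssemblyAI's side: a transcript is submitted, then polled until it finishes. The sketch below assumes AssemblyAI's documented transcript status values (queued, processing, completed, error); waitForTranscript, the 3-second interval, and the attempt limit are hypothetical choices, and the fetch function is injected so the loop can be exercised without network access.

```typescript
// Status values as reported by AssemblyAI's transcript endpoint.
type TranscriptStatus = "queued" | "processing" | "completed" | "error";

interface TranscriptPoll {
  id: string;
  status: TranscriptStatus;
  error?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical polling loop: resolves on "completed", rejects on "error"
// or when the attempt limit is reached.
async function waitForTranscript(
  fetchTranscript: (id: string) => Promise<TranscriptPoll>,
  id: string,
  intervalMs = 3000,
  maxAttempts = 600,
): Promise<TranscriptPoll> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const t = await fetchTranscript(id);
    if (t.status === "completed") return t;
    if (t.status === "error") {
      throw new Error(`Transcription failed: ${t.error ?? "unknown error"}`);
    }
    await sleep(intervalMs); // still queued or processing
  }
  throw new Error(`Transcript ${id} did not finish after ${maxAttempts} polls`);
}
```

With real HTTP, the injected function would issue a GET to /v2/transcript/{id} with the authorization header and return the parsed JSON.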

Parameters (LLM uses these automatically)

  • source (required): Local file path or URL
  • speakers_expected (default: auto): Exact speaker count (only if certain)
  • speaker_min / speaker_max (default: auto): Speaker count range
  • include_summary (default: true): Generate a bullet-point summary via LLM Gateway
  • include_chapters (default: true): Generate a chapter breakdown via LLM Gateway
  • language_code (default: auto): BCP-47 code (de, en, fr, ...)
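For illustration, here is one way these parameters could map onto an AssemblyAI transcript request. The field names (audio_url, speaker_labels, speakers_expected, speaker_options with min_speakers_expected / max_speakers_expected, language_code, language_detection) come from AssemblyAI's public transcript API; the function buildTranscriptRequest, the rule that an exact count overrides a range, and treating a missing language_code as automatic detection are assumptions about how this package behaves.

```typescript
// Tool parameters as listed above.
interface TranscribeVideoParams {
  source: string;
  speakers_expected?: number;
  speaker_min?: number;
  speaker_max?: number;
  include_summary?: boolean; // drives a separate LLM Gateway call, not sent here
  include_chapters?: boolean; // drives a separate LLM Gateway call, not sent here
  language_code?: string;
}

// Subset of the POST /v2/transcript body relevant to these parameters.
interface TranscriptRequest {
  audio_url: string;
  speaker_labels: true;
  speakers_expected?: number;
  speaker_options?: { min_speakers_expected?: number; max_speakers_expected?: number };
  language_code?: string;
  language_detection?: boolean;
}

// Hypothetical mapping; audioUrl is the URL source or the upload URL of a local file.
function buildTranscriptRequest(p: TranscribeVideoParams, audioUrl: string): TranscriptRequest {
  const body: TranscriptRequest = { audio_url: audioUrl, speaker_labels: true };
  if (p.speakers_expected !== undefined) {
    body.speakers_expected = p.speakers_expected; // assumed: exact count wins over a range
  } else if (p.speaker_min !== undefined || p.speaker_max !== undefined) {
    body.speaker_options = {
      min_speakers_expected: p.speaker_min,
      max_speakers_expected: p.speaker_max,
    };
  }
  if (p.language_code) body.language_code = p.language_code;
  else body.language_detection = true; // "auto": let AssemblyAI detect the language
  return body;
}
```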

Example output

# Video Transcription

- **Duration:** 15:42
- **Language:** de

## Speakers

1. **Speaker A**
2. **Speaker B**

## Transcript

**Speaker A** [0:00 – 2:15]:
> Willkommen zur heutigen Diskussion über...

**Speaker B** [2:15 – 5:30]:
> Vielen Dank für die Einladung...

## Summary

• Die Diskussion behandelt die Einführung neuer Richtlinien
• Speaker A betont die Bedeutung von Transparenz
• Speaker B schlägt einen schrittweisen Ansatz vor
...

## Chapters

### Einleitung und Begrüßung [0:00 – 2:15]
Die Moderatorin eröffnet die Diskussion...

### Hauptthema: Neue Richtlinien [2:15 – 10:45]
Ausführliche Diskussion der vorgeschlagenen Änderungen...
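The Transcript section above can be produced from AssemblyAI's utterance list, where each utterance carries a speaker label and start/end offsets in milliseconds. A minimal sketch; formatTimestamp and renderTranscript are hypothetical names, and folding hours into minutes (e.g. 75:00) is a simplification.

```typescript
// Shape of an AssemblyAI utterance (only the fields used here).
interface Utterance {
  speaker: string; // e.g. "A"
  start: number; // milliseconds
  end: number; // milliseconds
  text: string;
}

// Milliseconds -> "m:ss"; hours are folded into the minute count.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Renders one labeled, time-stamped block per utterance, separated by blank lines.
function renderTranscript(utterances: Utterance[]): string {
  const blocks = utterances.map(
    (u) =>
      `**Speaker ${u.speaker}** [${formatTimestamp(u.start)} – ${formatTimestamp(u.end)}]:\n> ${u.text}`,
  );
  return ["## Transcript", ...blocks].join("\n\n");
}
```

For example, formatTimestamp(135000) returns "2:15", matching the first speaker change in the sample output.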

Pricing

AssemblyAI charges per minute of audio. LLM Gateway usage is billed separately by token.

  • Transcription (Universal-3 Pro): ~$0.0065/min
  • Speaker diarization: included
  • Summary via LLM Gateway: token-based
  • Chapters via LLM Gateway: token-based

A 20-minute video with transcription and diarization costs roughly $0.13. LLM Gateway adds a few cents for summary and chapters.
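The estimate above is simple per-minute arithmetic: 20 min × $0.0065/min = $0.13. As a sketch (the function name is hypothetical, and the default is the approximate rate quoted above, so check current AssemblyAI pricing before relying on it):

```typescript
// Hypothetical estimator for transcription + diarization cost in USD.
// LLM Gateway (summary, chapters) is billed per token and is not included.
function estimateTranscriptionCostUsd(minutes: number, ratePerMinute = 0.0065): number {
  // Round to whole cents so floating-point noise does not leak into the result.
  return Math.round(minutes * ratePerMinute * 100) / 100;
}
```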

Troubleshooting

  • "Set ASSEMBLYAI_API_KEY" warning: export the environment variable and restart pi.
  • ffmpeg not found: install ffmpeg (see the install commands above) and restart your terminal.
  • File not found: use an absolute path or a path relative to the current working directory.
  • Transcription failed: check your API key and your quota on the AssemblyAI dashboard.
  • Summary/chapters failed: this is an LLM Gateway error; check the AssemblyAI dashboard to confirm your account has LLM Gateway access.
  • EU data residency: set the environment variable ASSEMBLYAI_EU=true to use the EU endpoints.
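A sketch of how the ASSEMBLYAI_EU switch might select the API host. It assumes AssemblyAI's EU host https://api.eu.assemblyai.com; which values of ASSEMBLYAI_EU the package accepts is not documented here, so accepting "true" or "1" (case-insensitive) is an assumption, as is the name assemblyAiBaseUrl.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical resolver: EU host when ASSEMBLYAI_EU is "true" or "1" (assumed values),
// otherwise AssemblyAI's default host.
function assemblyAiBaseUrl(env: Env): string {
  const eu = (env.ASSEMBLYAI_EU ?? "").trim().toLowerCase();
  return eu === "true" || eu === "1"
    ? "https://api.eu.assemblyai.com"
    : "https://api.assemblyai.com";
}
```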

License

MIT