pi-video-transcribe

Video transcription with speaker diarization for Pi. Transcribe videos with per-speaker labels, summaries, chapters, and sentiment analysis via AssemblyAI.

Package details

extension, skill

Install pi-video-transcribe from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-video-transcribe

Package: pi-video-transcribe
Version: 1.0.0
Published: Apr 30, 2026
Downloads: not available
Author: ngsoftware
License: MIT
Types: extension, skill
Size: 31.6 KB
Dependencies: 1 dependency · 2 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions"
  ],
  "skills": [
    "./skills"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-video-transcribe

Video transcription with speaker diarization for pi. Transcribe any video or audio file and get per-speaker labeled text, summaries, and chapter breakdowns.

Features

Speaker diarization — "who said what" with timestamps
Summary — bullet-point summary via LLM Gateway (Claude Sonnet)
Chapter breakdown — logical chapters with headlines and summaries via LLM Gateway
99+ languages — auto-detected or specify manually
URLs & local files — HTTP(S) URLs or local file paths
30+ video/audio formats — mp4, mkv, avi, mov, webm, mp3, wav, m4a, flac, ogg, and more
Skill included — guides pi to build custom AssemblyAI integrations (streaming, voice agents, etc.)

Prerequisites

1. AssemblyAI API Key

Step-by-step:

Sign up at assemblyai.com/dashboard/signup
Copy your API key from the dashboard (shown right after login under "Your API Key")
Set it as an environment variable:

Bash / ZSH (add to ~/.bashrc or ~/.zshrc):

export ASSEMBLYAI_API_KEY=your_api_key_here

PowerShell (run once, persists across restarts):

[Environment]::SetEnvironmentVariable("ASSEMBLYAI_API_KEY", "your_api_key_here", "User")

.env file (if you use dotenv in your project):

ASSEMBLYAI_API_KEY=your_api_key_here

Free tier: 5 hours of transcription per month — enough to get started. No credit card required.
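To check that the key is visible to a process before running a transcription, a small lookup helper can be useful. The following is a minimal sketch, not the package's own code; the function name `get_api_key` and the error message are assumptions made for illustration.

```python
import os


def get_api_key(env=None):
    """Return the AssemblyAI API key from the environment.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    The function name and error text are hypothetical, not taken from the package.
    """
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set. Export it in your shell profile "
            "and restart the terminal."
        )
    return key
```

Calling `get_api_key()` with no arguments reads the real environment, so it fails fast if the variable was exported in a different shell session.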

2. ffmpeg (for local video files)

Required only when transcribing local video files (mp4, mkv, avi, etc.). Not needed for URLs or audio-only files (mp3, wav — AssemblyAI handles those directly).

If ffmpeg is missing, the tool will print platform-specific install instructions.

Platform	Install
Windows	`winget install Gyan.FFmpeg`
macOS	`brew install ffmpeg`
Linux (Debian/Ubuntu)	`sudo apt install ffmpeg`
Linux (Fedora)	`sudo dnf install ffmpeg`
Linux (Arch)	`sudo pacman -S ffmpeg`

Verify: ffmpeg -version
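The rule above (ffmpeg only for local video files) can be expressed as a short check. This is a sketch under stated assumptions: the extension set is an illustrative subset of the formats named in this README, and the helper names `needs_ffmpeg` and `ffmpeg_available` are hypothetical. How the package itself classifies files is not documented here.

```python
import os
import shutil
from urllib.parse import urlparse

# Illustrative subset of video containers; the package's real list may differ.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}


def is_url(source):
    """True for HTTP(S) URLs, which are passed to AssemblyAI without ffmpeg."""
    return urlparse(source).scheme in ("http", "https")


def needs_ffmpeg(source):
    """True only for local files whose extension marks them as video."""
    if is_url(source):
        return False
    extension = os.path.splitext(source)[1].lower()
    return extension in VIDEO_EXTENSIONS


def ffmpeg_available():
    """True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None
```

With these helpers, a caller would only need to report a missing ffmpeg when `needs_ffmpeg(source)` is true and `ffmpeg_available()` is false.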

Installation

# From local path
pi install ./path/to/pi-video-transcribe

# From npm
pi install npm:pi-video-transcribe

# From git
pi install git:github.com/YOUR_USERNAME/pi-video-transcribe

After installation, the transcribe_video tool is available to the LLM automatically.

Usage

Just ask pi to transcribe a video:

> Transcribe ./meeting.mp4 and tell me what each person said

> What are the main topics discussed in this video? ./interview.mp4

> Summarize ./presentation.mp4 per speaker

> Transcribe https://storage.example.com/recording.mp3 with speaker labels

How it works

pi recognizes the intent and calls the transcribe_video tool
For local video files, audio is extracted via ffmpeg
Audio is sent to AssemblyAI for transcription + diarization
You get back a structured result with a per-speaker transcript, a summary, and chapters
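The audio extraction step can be sketched as an ffmpeg command built in Python. The flags used are standard ffmpeg options (`-vn` drops the video stream, `-ac 1` downmixes to mono, `-ar 16000` resamples to 16 kHz, `-y` overwrites existing output), but the choice of WAV, mono, and 16 kHz is an assumption for illustration; the package's actual extraction settings are not documented in this README. The function name `build_extract_command` is hypothetical.

```python
from pathlib import Path


def build_extract_command(video_path, out_dir):
    """Build an ffmpeg command that extracts the audio track of a video.

    Returns (command, output_path). Output format and sample rate are
    illustrative assumptions, not the package's documented behavior.
    """
    output_path = Path(out_dir) / (Path(video_path).stem + ".wav")
    command = [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", str(video_path),
        "-vn",                # drop the video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        str(output_path),
    ]
    return command, output_path
```

The returned list can be passed to `subprocess.run(command, check=True)`; building the command as a list avoids shell quoting problems with paths that contain spaces.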

Parameters (LLM uses these automatically)

Parameter	Default	Description
`source`	(required)	Local file path or URL
`speakers_expected`	auto	Exact speaker count (set only if known for certain)
`speaker_min` / `speaker_max`	auto	Speaker count range
`include_summary`	`true`	Generate bullet-point summary via LLM Gateway
`include_chapters`	`true`	Generate chapter breakdown via LLM Gateway
`language_code`	auto	BCP-47 code (`de`, `en`, `fr`, ...)
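The defaults in the table can be illustrated with a small normalization function. This is a sketch, not the tool's implementation: `normalize_params` is a hypothetical name, `None` stands in for "auto", and the two consistency checks (no exact count combined with a range, minimum not above maximum) are assumptions about sensible behavior rather than documented rules.

```python
def normalize_params(args):
    """Apply the documented defaults to a dict of transcribe_video arguments."""
    if not args.get("source"):
        raise ValueError("source is required (local file path or URL)")

    params = {
        "source": args["source"],
        "speakers_expected": args.get("speakers_expected"),  # None = auto
        "speaker_min": args.get("speaker_min"),              # None = auto
        "speaker_max": args.get("speaker_max"),              # None = auto
        "include_summary": args.get("include_summary", True),
        "include_chapters": args.get("include_chapters", True),
        "language_code": args.get("language_code"),          # None = auto-detect
    }

    has_range = params["speaker_min"] is not None or params["speaker_max"] is not None
    # Assumption: an exact count and a range should not be combined.
    if params["speakers_expected"] is not None and has_range:
        raise ValueError("use speakers_expected or speaker_min/speaker_max, not both")
    # Assumption: a range must be ordered.
    if (
        params["speaker_min"] is not None
        and params["speaker_max"] is not None
        and params["speaker_min"] > params["speaker_max"]
    ):
        raise ValueError("speaker_min must not exceed speaker_max")
    return params
```

For example, `normalize_params({"source": "./meeting.mp4"})` yields summary and chapters enabled with every speaker and language setting left on auto.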

Example output

# Video Transcription

- **Duration:** 15:42
- **Language:** de

## Speakers

1. **Speaker A**
2. **Speaker B**

## Transcript

**Speaker A** [0:00 – 2:15]:
> Willkommen zu heutiger Diskussion über...

**Speaker B** [2:15 – 5:30]:
> Vielen Dank für die Einladung...

## Summary

• Die Diskussion behandelt die Einführung neuer Richtlinien
• Speaker A betont die Bedeutung von Transparenz
• Speaker B schlägt einen schrittweisen Ansatz vor
...

## Chapters

### Einleitung und Begrüßung [0:00 – 2:15]
Die Moderatorin eröffnet die Diskussion...

### Hauptthema: Neue Richtlinien [2:15 – 10:45]
Ausführliche Diskussion der vorgeschlagenen Änderungen...
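The transcript lines in the example output follow a simple pattern: speaker label, time range, quoted text. The sketch below shows one way to render that pattern, assuming timestamps arrive in milliseconds and speakers as single letters; both are assumptions for illustration, and the function names are hypothetical.

```python
def format_timestamp(milliseconds):
    """Format a millisecond offset as M:SS, matching the example output."""
    total_seconds = milliseconds // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def render_utterance(speaker, start_ms, end_ms, text):
    """Render one utterance as a labeled heading line plus a quoted line."""
    time_range = f"{format_timestamp(start_ms)} – {format_timestamp(end_ms)}"
    return f"**Speaker {speaker}** [{time_range}]:\n> {text}"


print(render_utterance("A", 0, 135_000, "Willkommen..."))
```

Note that this sketch keeps minutes unbounded (a 75-minute offset renders as `75:00`); the example output does not show how the tool formats recordings longer than an hour.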

Pricing

AssemblyAI charges per minute of audio. LLM Gateway usage is billed separately by token.

Feature	Cost
Transcription (Universal-3 Pro)	~$0.0065/min
Speaker Diarization	included
Summary via LLM Gateway	token-based
Chapters via LLM Gateway	token-based

A 20-minute video with transcription and diarization costs roughly $0.13 (20 min × $0.0065/min). LLM Gateway adds a few cents for summary and chapters.
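The per-minute arithmetic can be reproduced with a short helper. The rate is the approximate figure from the table above and may change; the function name is hypothetical, and LLM Gateway token costs are deliberately left out because they depend on transcript length.

```python
# Approximate per-minute rate from the pricing table; check current pricing.
RATE_PER_MINUTE_USD = 0.0065


def estimate_transcription_cost(minutes, rate=RATE_PER_MINUTE_USD):
    """Estimate transcription cost in USD, excluding LLM Gateway usage."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return round(minutes * rate, 4)


print(estimate_transcription_cost(20))
```

For the 20-minute example this gives 0.13 USD, and a one-hour recording comes to 0.39 USD at the same rate.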

Troubleshooting

Issue	Fix
"Set ASSEMBLYAI_API_KEY" warning	Export the env variable and restart pi
ffmpeg not found	Install ffmpeg (see the table under Prerequisites) and restart your terminal
File not found	Use absolute or relative-to-CWD paths
Transcription failed	Check your API key and AssemblyAI dashboard quota
Summary/Chapters failed	LLM Gateway error; check the AssemblyAI dashboard for LLM Gateway access
EU data residency	Set `ASSEMBLYAI_EU=true` env var to use EU endpoints
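The `ASSEMBLYAI_EU` switch can be pictured as a base-URL selection. The sketch below is an assumption about how such a switch might work, not the package's code: the two base URLs follow AssemblyAI's public documentation as best understood and should be verified there, the function name is hypothetical, and the case-insensitive comparison is a guess at how the value is parsed.

```python
import os

# Base URLs assumed from AssemblyAI's public documentation; verify before use.
DEFAULT_BASE_URL = "https://api.assemblyai.com"
EU_BASE_URL = "https://api.eu.assemblyai.com"


def resolve_base_url(env=None):
    """Pick the API base URL depending on the ASSEMBLYAI_EU variable."""
    env = os.environ if env is None else env
    use_eu = env.get("ASSEMBLYAI_EU", "").strip().lower() == "true"
    return EU_BASE_URL if use_eu else DEFAULT_BASE_URL
```

Any value other than `true` (including an unset variable) falls back to the default endpoint in this sketch.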

License

MIT