How-to guide

How to extract audio for transcription

Transcription tools such as Whisper, Otter, Rev, Descript, and AssemblyAI all work best from a clean audio source rather than a video file. VideoSplit gives you a 48 kHz WAV in one click: uncompressed PCM that ASR pipelines generally accept directly and resample to whatever rate the model expects.

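Whisper models of every size work on 16 kHz mono audio and resample the input to that rate when loading it. Giving them a 48 kHz WAV does not hurt accuracy and costs only file size. Downsampling from a clean source loses nothing the model uses, whereas detail already discarded by lossy compression cannot be recovered.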
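If you would rather do that downsampling yourself, for example to shrink the file before uploading it, the sketch below builds an ffmpeg command that writes a 16 kHz mono copy. It assumes ffmpeg is installed separately; the function name and the file names are placeholders, and VideoSplit itself does not require any of this.

```python
import shlex


def resample_command(src: str, dst: str, rate: int = 16000) -> list[str]:
    """Build an ffmpeg argument list that writes a mono 16-bit PCM WAV at `rate` Hz."""
    return [
        "ffmpeg",
        "-i", src,             # input file (the 48 kHz WAV)
        "-ac", "1",            # downmix to one channel
        "-ar", str(rate),      # target sample rate in Hz
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,                   # output file
    ]


# Placeholder file names; print the command so it can be reviewed before running.
cmd = resample_command("audio.wav", "audio_16k.wav")
print(shlex.join(cmd))
```

To actually run the conversion, pass the list to subprocess.run(cmd, check=True). Keep the original 48 kHz file as well, since the smaller copy cannot be turned back into it.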

Step-by-step

Open VideoSplit.io. Any browser.
Drop the source video. MP4, MOV, MKV, WEBM — whatever you have. VideoSplit decodes it locally.
Pick WAV. WAV is uncompressed, so it is the safer choice for transcription. MP3 works too, but lossy encoding discards some detail, mostly at high frequencies, and can add artifacts at low bitrates, which may slightly hurt recognition and speaker separation.
Download the WAV. The file is saved as 48 kHz PCM.
Feed it into your ASR tool of choice. Run whisper audio.wav, or upload the file to Otter, Rev, or Descript. No further preprocessing is needed.
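Before handing the download to an ASR tool, you can confirm what it contains. The sketch below uses Python's standard wave module to report sample rate, channel count, bit depth, and duration. It assumes the file is plain PCM WAV; the helper names are illustrative, and the demo runs on a generated one-second silent file rather than a real recording.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read the WAV header and return its basic audio properties."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate,
        }


def write_silence(path: str, seconds: float = 1.0, rate: int = 48000, channels: int = 2) -> None:
    """Write a silent 16-bit PCM WAV, used here only as stand-in test input."""
    byte_count = int(seconds * rate) * channels * 2  # 2 bytes per 16-bit sample
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * byte_count)


# Demo: generate a stand-in file and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    write_silence(demo_path)
    print(describe_wav(demo_path))
```

For a real download, call describe_wav with the path to your file and check that sample_rate_hz reads 48000.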

Tips for better results

For multi-hour recordings, split the WAV into 10-minute segments first. Many hosted ASR services cap the file size or duration of a single request, and uncompressed 48 kHz audio grows quickly.
Whisper's large-v3 model is generally robust to noisy and narrowband audio, so it usually handles phone-quality recordings well even though they carry far less detail than a 48 kHz WAV.
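For podcast transcription, a 48 kHz WAV extracted from the original recording with VideoSplit is generally a better input than a downloaded MP3, which has usually been through at least one more round of lossy compression.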
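The segmenting tip can be sketched with Python's standard wave module. This is one possible approach, not a VideoSplit feature: it assumes a plain PCM WAV, cuts at fixed time boundaries (so a word can be split across two segments), and reads one segment into memory at a time. The function names and output naming scheme are made up for illustration, and the demo uses a short generated file with 10-second segments so it runs quickly.

```python
import tempfile
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=600):
    """Split a PCM WAV into consecutive segments of at most chunk_seconds each."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as wav:
        rate = wav.getframerate()
        frames_per_chunk = int(chunk_seconds * rate)
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:  # end of file reached
                break
            target = out_dir / f"{src.stem}_{len(written):03d}.wav"
            with wave.open(str(target), "wb") as out:
                # Copy the audio format so each segment matches the source.
                out.setnchannels(wav.getnchannels())
                out.setsampwidth(wav.getsampwidth())
                out.setframerate(rate)
                out.writeframes(frames)
            written.append(target)
    return written


def duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demo on a generated file: 25 s of silence at 8 kHz mono, split into 10 s segments.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "recording.wav"
    with wave.open(str(source), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(b"\x00\x00" * 8000 * 25)
    chunks = split_wav(source, Path(tmp) / "chunks", chunk_seconds=10)
    durations = [duration_seconds(c) for c in chunks]
    print([c.name for c in chunks], durations)
```

With the default chunk_seconds=600 this produces 10-minute segments. Because the cuts ignore speech, you may want to check the transcript around each boundary for a split word.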

Free forever. No upload, no account.

Drop a video, get a WAV or MP3. Runs entirely in your browser — nothing uploads, nothing to install.

