How-to guide
How to extract audio for transcription
Transcription tools such as Whisper, Otter, Rev, Descript, and AssemblyAI work best from a clean audio source rather than a full video file. VideoSplit gives you a 48 kHz WAV in one click. Uncompressed PCM WAV is a format that ASR pipelines generally accept without a secondary conversion step.
Whisper was trained on 16 kHz audio, and every Whisper model size (not just tiny/base) has its input resampled to 16 kHz before inference. Giving it a 48 kHz WAV costs only file size: it is better to downsample from a clean source than to upsample from a lossy one.
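To check what sample rate a file actually has before sending it to an ASR tool, the WAV header can be read directly. The sketch below uses Python's standard wave module; the function name describe_wav and the returned keys are illustrative, not part of VideoSplit or Whisper, and it assumes the file is integer PCM (the wave module does not read floating-point WAV).

```python
import wave

WHISPER_RATE_HZ = 16_000  # Whisper models operate on 16 kHz audio


def describe_wav(path):
    """Return basic header facts for a PCM WAV file (path or file object)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
            # Above 1.0 the ASR tool will downsample; below 1.0 it must upsample.
            "ratio_to_16k": rate / WHISPER_RATE_HZ,
        }
```

For a 48 kHz file the ratio is 3.0, meaning the transcription tool keeps roughly one sample in three after filtering.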
Step-by-step
- Open VideoSplit.io. Any browser.
- Drop the source video. MP4, MOV, MKV, WEBM — whatever you have. VideoSplit decodes it locally.
- Pick WAV. Uncompressed WAV is the safer choice for transcription. MP3 works too, but lossy compression discards some detail, especially at low bitrates, which can slightly reduce accuracy on noisy or multi-speaker recordings.
- Download the WAV. Saves at 48 kHz PCM.
- Feed it into your ASR tool of choice. Run whisper audio.wav on the command line, or upload the file to Otter/Rev/Descript. For short recordings, no further preprocessing is needed.
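The last step can also be scripted. The following is a minimal sketch that assumes the open-source openai-whisper command-line tool is installed (pip install openai-whisper); the helper names build_whisper_command and transcribe are made up for this example, and the model and output format are just reasonable defaults.

```python
import shutil
import subprocess


def build_whisper_command(wav_path, model="base", output_format="txt"):
    """Assemble an argv list for the openai-whisper command-line tool."""
    return ["whisper", wav_path, "--model", model, "--output_format", output_format]


def transcribe(wav_path, **options):
    """Run Whisper if it is installed; return None when it is not on PATH."""
    if shutil.which("whisper") is None:
        return None  # CLI not found; nothing is run
    command = build_whisper_command(wav_path, **options)
    # check=True raises CalledProcessError if Whisper exits with an error.
    return subprocess.run(command, check=True)
```

Calling transcribe("audio.wav", model="small") would write the transcript as a text file to the current directory, which is the tool's default output location.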
Tips for better results
- For multi-hour recordings, chunk the WAV into 10-minute segments first. Many hosted ASR services cap the duration or file size of a single request.
- Whisper's 'large-v3' model is robust to source quality. Because every input is resampled to 16 kHz, it works with 48 kHz WAV and usually handles phone-quality recordings as well.
- For podcast transcription, a 48 kHz WAV extracted from the original video avoids the extra round of lossy compression that a re-encoded MP3 download carries.
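The chunking tip above can be sketched with Python's standard wave module. This is an illustration under stated assumptions, not a VideoSplit feature: the function name split_wav and the output naming scheme are hypothetical, the input must be integer PCM, and cuts fall at fixed times, so a word may be split across two segments.

```python
import wave


def split_wav(source_path, chunk_seconds=600, prefix="chunk"):
    """Write consecutive fixed-length segments of a PCM WAV; return their paths."""
    paths = []
    with wave.open(source_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = chunk_seconds * src.getframerate()
        index = 0
        while True:
            # Each read holds one whole chunk in memory.
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            path = f"{prefix}_{index:03d}.wav"
            with wave.open(path, "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed on close
                dst.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

With the default of 600 seconds, a two-hour recording yields twelve files named chunk_000.wav through chunk_011.wav, each keeping the source sample rate and channel count.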
Free forever. No upload, no account.
Drop a video, get a WAV or MP3. Runs entirely in your browser — nothing uploads, nothing to install.