How-to guide
How to extract audio from video for captioning workflows
Accessibility captioning starts with a clean audio source that an ASR model or human captioner can work with. VideoSplit gives you a 48 kHz WAV file in seconds, a format that tools like Whisper, Descript, Rev and Happy Scribe accept.
WebVTT and SRT caption generation depends on the quality of the source speech. VideoSplit gives you exactly what the video contains, with no noise removal and no boosting. Because extraction leaves the audio unaltered, captioning errors come from the original recording or the ASR model, not from the extraction step.
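Before handing a file to a captioning tool, it can help to confirm that the WAV really has the sample rate you expect. The following is a minimal sketch using Python's standard `wave` module. The function name `describe_wav` is hypothetical, and the sketch assumes the extracted file is plain PCM WAV, which the `wave` module can read. Other WAV encodings may raise `wave.Error`.

```python
import wave


def describe_wav(path):
    """Return basic format facts read from a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,                # e.g. 48000 for a 48 kHz file
            "channels": wav.getnchannels(),        # 1 = mono, 2 = stereo
            "bit_depth": wav.getsampwidth() * 8,   # sample width is reported in bytes
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("interview.wav")["sample_rate_hz"]` should report `48000` for a 48 kHz file (the file name here is a placeholder).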
Step-by-step
- Open VideoSplit.io. Any browser.
- Drop your source video. MP4, MOV, MKV, WEBM — all supported.
- Pick WAV. 48 kHz uncompressed is the right input for ASR. MP3 works, but it is lossy and discards some audio detail, most noticeably at low bitrates.
- Download the WAV. Feed it into Whisper, Descript or your captioning tool of choice.
- Generate captions. Your captioning tool produces a WebVTT or SRT file you can pair back with your video.
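The last two steps can be sketched in Python for the case where the captioning tool is the open-source Whisper package. This is an illustration under assumptions, not part of VideoSplit: it assumes `openai-whisper` and `ffmpeg` are installed locally, and the helper names `format_timestamp`, `segments_to_srt` and `transcribe_to_srt` are hypothetical. The formatting helpers are plain Python; only `transcribe_to_srt` needs Whisper.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn a list of {'start', 'end', 'text'} dicts into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


def transcribe_to_srt(wav_path, srt_path, model_name="base"):
    """Transcribe a WAV file with Whisper and write the result as SRT."""
    import whisper  # third-party; assumed installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(wav_path)
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
```

A call such as `transcribe_to_srt("interview.wav", "interview.srt")` would write the captions next to the audio (both file names are placeholders). Larger Whisper models are slower and often more accurate, so the model name is left as a parameter.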
Tips for better results
- For WCAG-compliant captions, always proofread the ASR output — no model is perfect, especially on accented speech.
- If you need per-speaker captions (e.g. interviews), Whisper with diarization or Descript's speaker-aware mode is the usual next step.
- VideoSplit does not generate captions itself — it is the extraction step in a larger accessibility pipeline.
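Proofreading is easier when you know where to look first. As a rough sketch, and assuming the transcript comes from the open-source Whisper package, each segment carries `avg_logprob` and `no_speech_prob` fields that can be used to sort out passages the model was less sure about. The function name `flag_for_review` and the cutoff values are illustrative assumptions, and a full read-through is still needed for the segments that are not flagged.

```python
def flag_for_review(segments, min_avg_logprob=-1.0, max_no_speech_prob=0.6):
    """Return (start, text, reasons) for segments worth checking first.

    The cutoffs are illustrative defaults, not tuned values.
    """
    flagged = []
    for seg in segments:
        reasons = []
        # A low average log-probability suggests the model was unsure of the words.
        if seg.get("avg_logprob", 0.0) < min_avg_logprob:
            reasons.append("low confidence")
        # A high no-speech probability suggests music, noise or silence.
        if seg.get("no_speech_prob", 0.0) > max_no_speech_prob:
            reasons.append("possible non-speech")
        if reasons:
            flagged.append((seg["start"], seg["text"].strip(), reasons))
    return flagged
```

Passing `result["segments"]` from a Whisper transcription yields a short list of timestamps to listen to before reviewing the rest of the captions.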
Free forever. No upload, no account.
Drop a video, get a WAV or MP3. Runs entirely in your browser — nothing uploads, nothing to install.