How-to guide

How to extract audio from video for captioning workflows

Accessibility captioning starts with a clean audio source that an ASR model or human captioner can work with. VideoSplit gives you a 48 kHz WAV file in seconds, an uncompressed format that tools like Whisper, Descript, Rev and Happy Scribe accept.

The accuracy of generated WebVTT and SRT captions depends on the quality of the source speech. VideoSplit gives you exactly what the video contains, with no noise removal and no boosting, so any captioning errors trace back to the original recording or the captioning tool, not the extraction step.
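Before handing the file to a captioning tool, it can be worth confirming what was actually extracted. The following is a minimal sketch using Python's standard `wave` module, assuming the download is a plain PCM WAV. The function name `describe_wav` is illustrative and not part of VideoSplit.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file read from its header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            # getsampwidth() is in bytes per sample, so convert to bits.
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }
```

If `sample_rate_hz` comes back as something other than 48000, or the duration does not match the video, that is a sign to re-check the extraction before spending time on captioning.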

Step-by-step

1. Open VideoSplit.io. Any browser works.
2. Drop your source video. MP4, MOV, MKV and WEBM are all supported.
3. Pick WAV. Uncompressed 48 kHz audio preserves everything in the source, which suits ASR. MP3 also works, but it is lossy and discards some detail, especially at low bitrates.
4. Download the WAV. Feed it into Whisper, Descript or your captioning tool of choice.
5. Generate captions. Your captioning tool produces a WebVTT or SRT file you can pair back with your video.
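If your captioning tool returns timed text segments rather than a finished caption file, the final step can be sketched in a few lines of Python. This is an illustration under assumptions, not a definitive implementation: the segment shape (dicts with `start` and `end` in seconds plus `text`) mirrors what the open-source Whisper package returns, other tools may differ, and the function names are made up for this example.

```python
def format_timestamp(seconds, decimal_mark):
    """Format seconds as HH:MM:SS<mark>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(segments):
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_webvtt(segments):
    """Build WebVTT text: a WEBVTT header line followed by unnumbered cues."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

The two formats differ mainly in the header, the cue numbering and the decimal mark in timestamps, which is why one timestamp helper can serve both.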

Tips for better results

For WCAG-compliant captions, always proofread the ASR output — no model is perfect, especially on accented speech.
If you need per-speaker captions (e.g. interviews), Whisper paired with a separate diarization tool, or Descript's speaker-aware mode, is the usual next step.
VideoSplit does not generate captions itself — it is the extraction step in a larger accessibility pipeline.
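For per-speaker captions, one common approach is to give each transcript segment the speaker whose diarization turn overlaps it most, then write the name as a WebVTT voice tag (`<v Name>`). The sketch below assumes you already have both lists. The dict shapes and the function names are hypothetical and not tied to any specific diarization tool.

```python
def assign_speakers(segments, turns):
    """Label each segment with the speaker whose turn overlaps it the most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            # Overlap of two intervals; zero or negative means they do not overlap.
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labelled.append({**seg, "speaker": best_speaker})
    return labelled


def voice_cue_text(segment):
    """Prefix the cue text with a WebVTT voice tag when a speaker is known."""
    text = segment["text"].strip()
    if segment["speaker"] is None:
        return text
    return f"<v {segment['speaker']}>{text}"
```

Segments that overlap no turn keep `speaker` set to `None` and are written without a tag, which leaves them easy to spot during proofreading.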

Free forever. No upload, no account.

Drop a video, get a WAV or MP3. Runs entirely in your browser — nothing uploads, nothing to install.
