WebVTT caption and subtitle tracks on every HTML video
Publish time-aligned plain-text captions any crawler can fetch alongside your video.
What this signal tests
We check that every <video> element on your page has at least one <track> child with kind set to captions, subtitles, or descriptions. The track's src must return HTTP 200 with a WebVTT file (Content-Type text/vtt) whose body starts with the literal string WEBVTT. For <audio> elements, we accept either a <track> or a linked HTML transcript on the page.
Why it matters for your visibility in AI
WebVTT captions are the cleanest way to turn video into AI-readable text. A .vtt file is plain text (sentences with timestamps) that any HTTP client can fetch in a single request: no video decoding, no speech-to-text, no JavaScript. An AI crawler can fetch the file and get the entire spoken content of your video as searchable text.
Without captions, your video is effectively a black box to crawlers. Multimodal AI models can technically transcribe video, but the cost at crawl scale means they almost never do. In practice, your tutorial, interview, or demo never feeds into AI answers, while a competitor who ships a .vtt file alongside their video is more likely to get cited. Captions are also a legal accessibility requirement in many jurisdictions, which is a second reason to add them.
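To illustrate how little work a fetched .vtt file requires, here is a minimal sketch of turning caption text into plain sentences. The function name `vtt_to_text` is hypothetical, and the parsing is deliberately simplified (it skips the header, NOTE, and STYLE blocks and strips cue markup); it is not a full WebVTT parser and not a description of how any particular crawler works.

```python
import re

# A cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours component is optional in WebVTT timestamps.
TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)
TAG = re.compile(r"<[^>]+>")  # cue markup such as <v Speaker> or <i>


def vtt_to_text(vtt: str) -> str:
    """Return the cue text of a WebVTT document, one cue per line."""
    normalized = vtt.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n{2,}", normalized):
        rows = block.strip().split("\n")
        # The timing line is the first row, or the second row when the
        # cue carries an identifier. Blocks without one are not cues.
        for i, row in enumerate(rows[:2]):
            if TIMING.match(row):
                text = " ".join(r.strip() for r in rows[i + 1:])
                text = TAG.sub("", text).strip()
                if text:
                    cues.append(text)
                break
    return "\n".join(cues)
```

Everything here is standard-library string handling, which is the point: the spoken content is recoverable without touching the video file itself.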
Pass criteria at a glance
| Criterion | Passes when |
|---|---|
| <video> elements | >=90% have a valid WebVTT track |
| <audio> elements | Each has a transcript link or a track |
How we test it
Our scanner finds each <video> element and walks its <track> children. For each track we check that kind is captions, subtitles, or descriptions, then fetch the src URL. We verify the response is 200, has Content-Type text/vtt (or text/plain), and that the body starts with the WEBVTT identifier line. For <audio> elements we look for a transcript link or track instead.
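The response checks described above can be sketched as a single function. This is an illustration under stated assumptions, not the scanner's actual code: the name `is_valid_vtt_response` and the `ACCEPTED_TYPES` set are hypothetical, and the caller is assumed to have already fetched the track URL.

```python
# Media types treated as acceptable, following the description above.
ACCEPTED_TYPES = {"text/vtt", "text/plain"}

UTF8_BOM = b"\xef\xbb\xbf"


def is_valid_vtt_response(status: int, content_type: str, body: bytes) -> bool:
    """Check status, Content-Type, and the WEBVTT signature of a response."""
    if status != 200:
        return False

    # Ignore parameters such as "; charset=utf-8" when comparing types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        return False

    # A WebVTT file may begin with an optional byte order mark.
    if body.startswith(UTF8_BOM):
        body = body[len(UTF8_BOM):]
    if not body.startswith(b"WEBVTT"):
        return False

    # The signature must be followed by whitespace, a line break,
    # or the end of the file (so "WEBVTTX" does not count).
    return body[6:7] in (b"", b" ", b"\t", b"\n", b"\r")
```

For example, `is_valid_vtt_response(200, "text/vtt; charset=utf-8", b"WEBVTT\n")` returns `True`, while the same body served as `application/octet-stream` returns `False`.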
Technical detection method
For each <video>, enumerate <track> children; require >=1 with kind in {captions, subtitles, descriptions} AND src returns 200 with text/vtt AND content begins WEBVTT. For <audio>, require transcript link or track.
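The enumeration step can be approximated with Python's standard-library HTML parser. This is a rough sketch, not the scanner's implementation: `VideoTrackCollector` and `videos_missing_tracks` are hypothetical names, the kind attribute is required to be set explicitly, and fetching each collected src is left out.

```python
from html.parser import HTMLParser

TEXT_KINDS = {"captions", "subtitles", "descriptions"}


class VideoTrackCollector(HTMLParser):
    """Collect, for each <video>, the src of every qualifying <track>."""

    def __init__(self):
        super().__init__()
        self.videos = []        # one list of track URLs per <video>
        self._in_video = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "video":
            self.videos.append([])
            self._in_video = True
        elif tag == "track" and self._in_video:
            kind = (attributes.get("kind") or "").lower()
            src = attributes.get("src")
            if kind in TEXT_KINDS and src:
                self.videos[-1].append(src)

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False


def videos_missing_tracks(html: str) -> int:
    """Count <video> elements that have no qualifying <track> child."""
    collector = VideoTrackCollector()
    collector.feed(html)
    return sum(1 for tracks in collector.videos if not tracks)
```

Each URL in `collector.videos` would then need to be resolved against the page URL and fetched before the status, Content-Type, and WEBVTT checks can run.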
If your site fails: how to fix it
- For every <video> element, add <track kind="captions" srclang="en" src="captions.vtt" default> as a child. Adjust srclang and add additional tracks for other languages.
- Generate VTT files using Whisper (free, command-line: `whisper video.mp4 --output_format vtt`), Descript, Rev, or your video host's caption editor. Aim for human-edited accuracy on proper nouns.
- Serve .vtt files with Content-Type text/vtt. Many web servers and CDNs do not map the .vtt extension by default and fall back to application/octet-stream, which some clients reject, so configure the MIME type explicitly.
- For <audio> elements, publish an HTML transcript on the same page (visible or linked), because browsers do not display <track> cues for <audio> the way they do for <video>. Wrap the transcript in <article> for crawlability.
- Add a kind=descriptions track for visual-only content (gestures, on-screen text not spoken aloud) to give the fullest possible textual representation.
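If you already have timed transcript segments (for example from a transcription tool or a manual edit), writing the .vtt file yourself is straightforward. The sketch below assumes segments arrive as (start seconds, end seconds, text) tuples; `format_timestamp` and `build_vtt` are hypothetical helper names, and the output is a minimal file with no cue identifiers or styling.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp, hh:mm:ss.mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_vtt(segments) -> str:
    """Build a minimal WebVTT document.

    segments: iterable of (start_seconds, end_seconds, text) tuples.
    Cue text is assumed not to contain blank lines or the string "-->",
    both of which would break the cue structure.
    """
    parts = ["WEBVTT", ""]  # signature line, then a blank line
    for start, end, text in segments:
        parts.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        parts.append(text.strip())
        parts.append("")    # a blank line terminates each cue
    return "\n".join(parts)
```

Save the result as UTF-8 with a .vtt extension and reference it from the <track> element's src attribute.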
Quick facts
| Field | Value |
|---|---|
| Maturity | ESTABLISHED |
| Weight | high |
| Category | Multimodal |
Frequently asked questions
What if my video is on YouTube? Do I still need WebVTT tracks?
Not for the YouTube-hosted version, because YouTube provides its own caption system. But if you also self-host the video in a <video> element on your own page, yes: add a <track> pointing to the VTT file there. AI crawlers reading your HTML do not automatically follow YouTube's caption API.
Are auto-generated captions good enough?
Functional but not great. They miss proper nouns, technical terms, and brand names. Since those are exactly the words AI systems use for citation matching, a 30-minute review pass to clean them up has outsized impact on how accurately your content gets cited.
Should I include both captions and subtitles tracks?
Captions cover spoken dialogue plus non-dialogue audio cues (laughter, music descriptions) for viewers who cannot hear. Subtitles translate dialogue for viewers who speak another language. Include both if your audience needs both; the AI crawler benefit is similar.
Does the default attribute matter?
It tells browsers which track to display by default. For AI crawlability it does not change anything (the crawler reads all tracks regardless), but for human accessibility it matters: set default on the primary caption track.