Notes from WebVTT

Players should email the text track community about their WebVTT implementations

CSS extension in WebVTT

  • Smooth roll-up is the biggest hurdle for both WebVTT and TTML. That's why Google uses its own format internally.
  • WebVTT conversion: we need to convert WebVTT files into other formats.
  • CSS files make parsing equivalent WebVTT / TTML files very difficult
  • How does WebVTT handle live?
  • Some platforms _require_ WebVTT. We can't always assume what support a platform has. Sometimes we can't tell whether it's a new or old iOS, a player built into a TV, etc.
  • Lowest common denominator. How fast can we bring platforms up to spec, and what do we do with the old ones?

Starting with CSS

CSS issues

 - Selectors
 - External files
 - Embedded
 - Colors
 - iOS

There are two implementations of WebVTT at Apple, one in WebKit and one in CoreMedia. They're separate. WebKit can leverage the browser infrastructure, but CoreMedia must re-implement everything from scratch (no browser).


Vertical captions

Smooth scrolling


  • Gut feeling that when WebVTT reaches PR level at the W3C, Microsoft will get interested again.
  • Microsoft is like everyone else: they implement what they need to. Captions/subtitles are not a strategic hill to die on, just a requirement.
  • VTT solves a whole load of business problems, but in-browser polyfills and conversion tools remove pressure on browser vendors.
  • Browsers have made JavaScript so powerful that we no longer need platform support for WebVTT. A legitimate question is why browsers need to support a feature if it can be implemented as a polyfill.
  • Firefox view: the CSS extensions especially are difficult, and we need to work with the rest of the team to make them happen. Input is required to prioritise WebVTT CSS features.
  • An extra polyfill to load delays the start of video and puts an extra burden on rendering, so a native implementation is useful.

Is vtt.js abandoned?

  • Firefox has suspended vtt.js for now due to uncertainty about the future of VTT. If more people want to use vtt.js then they will move forward.
  • Folks are definitely using it, but they are splitting WebVTT rendering from parsing so they can leverage the rest of the browser and make polyfills lighter.
  • JWPlayer will definitely help and upstream to the vtt.js project if it's going to be supported.
  • Should vtt.js become the de facto polyfill?
  • Argument against a polyfill: in a list of playlist items, what happens when you may or may not need it?

Polyfill increases player payload. Support for vertical text and further features increases code size.

  • vtt.js being maintained also brings Firefox to the party. Community support reduces the need to get all browsers on board.
  • Chrome could pick up vtt.js and use it the same way.
  • The iOS and WebKit implementation is closest to the spec right now. There was a big effort 2-3 years ago to get to feature parity on WebVTT, including regions.
  • vtt.js and WebKit interoperability will bring us to a really good state.
  • bcov has contributed to vtt.js but is now maintaining a fork. Let's bring the fork back to the table. It's mainly packaging and build tooling changes.
  • Firefox: we like vtt.js, please don't abandon it.

Interop with TTML/IMSC1.

  • A W3C working group has an effort to convert TTML to WebVTT and back.
  • Initial spec exists. If you're writing tools, read the spec and contribute back.
  • Please make suggestions
  • Good resources on mapping:
  • (this spec is currently looking for help!)

Semi-abandoned specs that need love:

  • Spec that describes how to achieve 608 to VTT conversion

  • This spec needs updating against the current VTT spec. It also requires the metadata issue to be resolved. It needs to be updated for version 2.
  • Including all the information that a 608/708 track has.
  • WebVTT doesn't self-describe. You can put the description into a NOTE, but there's no explicit spec for that at this time. It's implementation-defined.
  • There's an obvious tension between self-describing files and files that the environment has to describe.
  • WHATWG wants non-self-describing files because the environment needs to describe them.
  • However, in the larger community self-description makes sense: language impacts rendering.
  • Language is really important (and missing) from WebVTT. {NOT TRUE!}
  • Arabic uses different fonts depending on the dialect. That applies per span, not per file. This is difficult, and language alone isn't sufficient to drive rendering of spans.
  • WebVTT has the capability to put language on spans. No default language is specified because that's done on a track basis in HTML5 and hence doesn't need duplication.
  • Worries about conflicts between the WebVTT file and the environment.
  • Language within the file is used to drive font rendering (but not only that; it's used for other things too).
  • Tracks need language to do track selection.
  • Caption files need language to do font rendering and search.
  • Granularity exists, yes. What there isn't is a top-level language.
  • Lauretta agrees that we need language.
  • This is a v2 thing and we will make it happen there.
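
To make the span-level language point concrete, here is a minimal sketch (not from the session) of how a renderer might resolve the language of each run of cue text. It assumes the WebVTT `<lang xx>` cue span and a track-level fallback supplied by the environment; the function name `language_runs` and its `track_language` parameter are hypothetical, and other tags are left in the text untouched.

```python
import re

# Matches <lang xx> start tags and </lang> end tags in WebVTT cue text.
LANG_TAG = re.compile(r"<(/?)lang(?:\s+([^>]*))?>")


def language_runs(cue_text, track_language):
    """Split cue text into (language, text) runs.

    track_language is the fallback supplied by the environment
    (hypothetical parameter, e.g. the srclang of an HTML track element).
    """
    runs = []
    stack = [track_language]  # innermost language is last
    pos = 0
    for match in LANG_TAG.finditer(cue_text):
        if match.start() > pos:
            runs.append((stack[-1], cue_text[pos:match.start()]))
        if match.group(1):
            # End tag: fall back to the enclosing language.
            if len(stack) > 1:
                stack.pop()
        else:
            # Start tag: push its language, or inherit if none was given.
            stack.append((match.group(2) or "").strip() or stack[-1])
        pos = match.end()
    if pos < len(cue_text):
        runs.append((stack[-1], cue_text[pos:]))
    return runs
```

For example, `language_runs("Hello <lang fr>bonjour</lang> again", "en")` yields an English run, a French run, and another English run, which is the granularity the discussion says exists today; the missing piece is a file-level default.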

For players that need sidecars or subtitles in manifests, how do we load these files?

  • Do we set the source or parse the file?
  • There are CORS issues with just setting the source, due to other domains on the media element. That has its own set of problems.
  • That affects setting the language and how it's rendered.
  • You can set language at the cue level and at the span level.
  • Is everyone just parsing the files, or setting the source on a track element?
  • Are you using the text track API or implementing it manually?
  • bcov: both. vtt.js is parsed manually, with an HLS extension for the timestamp offset. We can't just set the VTT track, only in special cases (like a single VTT for the entire thing). Could be an option.
  • People are currently avoiding the text track API due to compatibility issues. Retrieving and segmenting outside of track elements is the de facto standard. Or create an empty text track and programmatically add cues.
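
The "parse the file yourself" route can be sketched roughly as below. This is a simplified illustration, not vtt.js or any player's actual parser: it only extracts timings, settings and text, skips header, NOTE, STYLE and REGION blocks, and does not follow the spec's full parsing algorithm. `offset_ms` is a hypothetical stand-in for whatever timestamp offset the player has derived for a segment (for HLS that would come from the segment's X-TIMESTAMP-MAP header, which is not parsed here).

```python
import re

# hh:mm:ss.ttt, with the hours component optional as in WebVTT.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def parse_timestamp(text):
    """Convert a WebVTT timestamp to integer milliseconds."""
    match = TIMESTAMP.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_cues(vtt_text, offset_ms=0):
    """Return a list of cue dicts, shifted by offset_ms (assumed input)."""
    text = vtt_text.replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        # The timing line is either the first line or follows a cue identifier.
        timing_index = next(
            (i for i, line in enumerate(lines[:2]) if "-->" in line), None
        )
        if timing_index is None:
            continue  # header, NOTE, STYLE or REGION block
        start_text, rest = lines[timing_index].split("-->", 1)
        fields = rest.split()
        if not fields:
            continue  # malformed timing line: ignore the block
        cues.append({
            "start": parse_timestamp(start_text) + offset_ms,
            "end": parse_timestamp(fields[0]) + offset_ms,
            "settings": " ".join(fields[1:]),
            "text": "\n".join(lines[timing_index + 1:]),
        })
    return cues
```

In a browser player the resulting entries would then be added to an empty text track programmatically, as the last bullet describes; that step is omitted here because it depends on the platform.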

Don't forget to register bugs.

  • FCC interoperability and 608/708 goes hand-in-hand. That spec is where to have the discussion.
  • Details on smooth scrolling that need addressing
  • Over time, smooth scrolling has bubbled up to the top of the complaints, and it isn't implemented. It's really annoying.
  • Talking about smooth scrolling:
  • Specification problem or implementation problem?
  • Implementation. Roll-up captions are being used in live.
  • WebVTT has defined a region element for this. We did this to define an area where a cue can be rendered and can be automatically controlled by the browser: okay, the top line is gone, move the others up. The spec is written so that smooth scrolling is su
  • One cue (not duplicated), with positioning done by the region.
  • Apple has implemented WebVTT regions, which should mean smooth scrolling works. Need to check, but pretty confident that Apple does smooth scrolling. The spec has it; nobody else has implemented it.
  • Asking Ken: for someone who doesn't have issues hearing, what is the specific use case for roll-up and smooth scrolling? Is it about knowing what you've seen?
  • Answer: I've answered this question a few times. Most often you see it during live or near-live, where a stenocaptionist is creating captions as fast as possible and each new line causes the text to scroll up. You can do it two ways. "Snap/bang" is the simple way to move fast, but it's really annoying because every time the captions look like they're being completely replaced, and your brain is trained to read from the beginning. It's just very confusing. With smooth scrolling you can track what's happening very easily: it's the same text, not a new line appearing, and if you want to play the video at a faster speed you can follow much better. Your brain gets confused when things are popping around.

Are there demos of smooth scrolling? pinging colleagues to find an example.

  • The challenge with smooth scrolling now is that there's no command or property to do it specifically, so we find workarounds: append some text, remove a cue. It's jarring.
  • Without regions there's no way to do this: regions exist to group cues so that, as you add cues, you can specify how they should smoothly roll up.
  • So actually, you just gave me an idea. I know that the original spec asks that lines be replaced in that way. If some of those lines were to stay there, why would they jump? Maybe rewrite the spec to specify a smooth transition.
  • The challenge is that you can't update a live cue, but you can replace a cue and add a new cue.
  • Updateable cues might be one solution.
  • Solving for live captioning might solve updates.
  • It's primarily an implementation issue. Regions were specifically made to have scrolling. Without them, cues would need to be duplicated and the renderer would have to recognise the repeated text.
  • Regions are a grouping of cues that removes the need to duplicate and implies that you scroll smoothly.
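
As an illustration of "one cue per line, positioning done by the region", the sketch below generates a WebVTT file with a single scrolling region and assigns every cue to it. The REGION block settings (`id`, `width`, `lines`, `regionanchor`, `viewportanchor`, `scroll:up`) and the `region:` cue setting are taken from the WebVTT spec; the helper names, the default region id and the chosen anchor and width values are assumptions for the example, and whether a given player actually scrolls smoothly is exactly the implementation question raised above.

```python
def format_timestamp(ms):
    """Format integer milliseconds as hh:mm:ss.ttt."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def rollup_vtt(captions, region_id="rollup", lines=3):
    """Build a WebVTT document with one roll-up region.

    captions is a list of (start_ms, end_ms, text) tuples; each caption
    line becomes exactly one cue, with no duplication of earlier lines.
    """
    out = [
        "WEBVTT",
        "",
        "REGION",
        f"id:{region_id}",
        "width:80%",                # illustrative value
        f"lines:{lines}",           # visible rows before the top one leaves
        "regionanchor:0%,100%",     # anchor the region by its bottom-left corner
        "viewportanchor:10%,90%",   # illustrative placement near the bottom
        "scroll:up",                # new cues push earlier ones upwards
        "",
    ]
    for start_ms, end_ms, text in captions:
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        out.append(f"{timing} region:{region_id}")
        out.append(text)
        out.append("")
    return "\n".join(out)
```

Each cue's end time decides when its line leaves the region, so an authoring tool would typically keep a line alive until enough later lines have arrived to push it out; that policy is left to the caller here.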

I believe you're right, Silvia: there are situations where people repeat what they said, and it needs to pop up again so that viewers realize they said it twice.

  • Does the region spec say anything about speed of scrolling?
  • Yes. It does. It has a description to specify algorithmically how to do it.
  • Is there a default?
  • Yes, taken from 608/708
  • It was carefully designed to match 608/708.
  • Regions are the answer to how to map 608/708. If you use them, you can achieve functional equivalence. They're based on 708 but with different names that are easier to understand.
  • WebVTT has generalized a little, since we have the ability to position anywhere and have an anchor point. It's a little more generic. We used the FCC 608/708 features because they're what captioners actually use, but replicated them in a way that makes sense for the web.
  • Ken: I can't help but add a little. The reason for the anchor points is that when 708 was being designed, a use for customizable font size was recognized. Regions address the need for users to customize their font experience.
  • WebVTT regions can handle new line breaks and increases in cue size. It's dynamic, not "this is the line and it has to stay here".
  • Roll-up question: if the cues have to roll up, that would add latency to the next cue showing up. Does the spec define the timing of cues when the next cue is only 0.1 seconds? What does the spec say? It will get displayed but it might immediately disappear. No serious caption author would do that, but it's covered in the spec.
  • What is the status of vtt.js regions?
  • JWPlayer stripped regions out for the lowest common denominator, because regions are only supported in Safari.
  • Don't use regions. Firefox doesn't do regions very well. vtt.js status unknown.
  • vtt.js has a bug with region support.
  • JWPlayer can definitely file bugs and send some fixes.
  • How does VTT work across segments? How does the second one know you're adding to a previous region?
  • There's a lot of player behavior around what's already in the region.
  • Scrolling depends on what's already there.
  • The player needs to look at what's already in the region.
  • Send me region as a question. You only have to put in what you need. It makes sense to be consistent. In the original full vtt
  • Put all region definitions into every segment too.
  • I can add to the discussion. We (Google) are working on chunks (segments); the keyword here is status. It's not region-dependent what's happening to others. I think Pierre was talking about something similar. We can all be on the same page.
  • Hopefully segments are not so small that it causes us to go crazy.
  • One of the principles we used when specifying is that it should be possible to jump into any one point in time and start rendering from there. That's what the fragmentation spec has done (turn them into fragmented files).
  • Probably should
  • VTT in MP4 is different from segmented VTT in HLS because there's more explicit support for repeating: each frame includes all the data.
  • With VTT in WebM, the metadata (like the kind etc.) is also repeated in the WebVTT header so that it can be referenced, and the region specs can be put in there as well. That's been taken care of.
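
The "put all region definitions into every segment" advice, together with the principle that a player should be able to start rendering from any point in time, could look roughly like this for standalone segmented WebVTT. It is a sketch under stated assumptions: the function name `segment_cues`, the fixed-duration windows starting at zero, and the choice to repeat a cue in every segment it overlaps are illustrative, not something the session or a spec prescribes in this form.

```python
def format_timestamp(ms):
    """Format integer milliseconds as hh:mm:ss.ttt."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def segment_cues(cues, region_blocks, segment_ms):
    """Split cues into self-contained WebVTT segment documents.

    cues: list of (start_ms, end_ms, settings, text) tuples.
    region_blocks: REGION definition blocks, repeated in every segment
    so that no segment depends on an earlier one having been loaded.
    """
    if not cues:
        return []
    last_end = max(end for _, end, _, _ in cues)
    segments = []
    window_start = 0
    while window_start < last_end:
        window_end = window_start + segment_ms
        parts = ["WEBVTT", ""]
        for block in region_blocks:
            parts += [block, ""]
        for start, end, settings, text in cues:
            # A cue is written into every segment whose window it overlaps.
            if start < window_end and end > window_start:
                timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
                if settings:
                    timing += " " + settings
                parts += [timing, text, ""]
        segments.append("\n".join(parts))
        window_start = window_end
    return segments
```

Because overlapping cues appear in more than one segment, the player still has to look at what is already in the region and avoid adding the same cue twice, which is the player-behavior concern noted in the list.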

Vertical captions

  • We've got a spec for all of the cue definitions that allow for horizontal or vertical rendering. The only thing we haven't done vertically is regions. Regions are only horizontal. Apparently there's a situation where this happens in Japan.
  • This should be addressed in V2.
  • Ruby support and internationalization are at least as well supported as in the browser. That's not WebVTT's fault, because we rely on other stuff from the browser. However, WebVTT has no support for pre-rendered images, so WebVTT is beholden to the browser for good international captions.
  • Move the web forward rather than add things to WebVTT.
  • IMSC1: this is a shared issue, because until CSS is fixed
  • NOTE: WebVTT regions are NOT TTML regions!! WebVTT regions are about smooth scrolling.
  • Regions in TTML are cues in WebVTT.
  • Vertical: can't really say much because it's just the web.
  • CSS: a new selector was just introduced that applies CSS to regions. There's a region selector, cue region or some such. There's one bug we still have to fix, where we didn't specify which CSS features apply to it (but it's exactly the same as for cues). That's new and nobody has applied it.
  • From HTML you can override the rendering of the cues. We also have a style specification so you can include styles inside WebVTT files.

How hard is it to implement WebVTT without a browser?

  • The spec is underdefined on this. It doesn't specify enough of the baseline.
  • Ignore and try to match italic/bold/underline and throw away everything else.
  • If it's not formatted correctly, ignore it.
  • To add to that: we have the same problem (at Google).
  • We've not paid attention to this. No nice list of things that a non-css supporting
  • Basic styling is independent of CSS. All that stuff is there. In theory you should be able to get away without a CSS engine. You can't get color without CSS, or opacity. Some basic colors that match the
  • Ignore external CSS. What about minified CSS? No minification at the moment. People will do it.
  • Right now there are 5-10 attributes. What happens when you add hundreds more and it renders correctly in Firefox?
  • SRT was simple.
  • People will say "your css isn't enough"
  • No. The browsers that have supported webvtt filter out other things.
  • The list is small, and specific, and nothing else should apply.
  • One of the open issues is what the requirements are for non-CSS-supporting renderers. Comment on the GitHub issue!
  • How to parse CSS is underdefined, because we need to write a CSS parser to do that. Maybe a few regexes will solve my problem.
  • How to parse the style section is currently underdefined, and minification is underdefined.
  • What about !important? CSS can be insane; what subset do we need to support?
  • There's a bug on the spec stating that external CSS, !important and other things are NOT allowed.
  • Colors have a bug; please find it and contribute.
  • You must behave as if the following stylesheet were embedded in your client.
  • "That would be good!"
  • You can achieve that by embedding; non-CSS renderers must behave as if it were there.
  • Encourage folks to get involved in the mailing list. Things like time-aligned metadata and chapters are exactly what we need for version 2.
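
The "match italic/bold/underline and throw away everything else" approach for renderers without a CSS engine might be sketched as follows. This is an assumption-laden illustration, not a spec-defined baseline (the notes say that baseline does not exist yet): `styled_runs` is a hypothetical name, class, voice, language, ruby and timestamp tags are simply dropped, and `html.unescape` is used as a convenience even though it accepts more character references than WebVTT defines.

```python
import html
import re

# Captures an optional "/" and the tag name; classes and annotations are ignored.
TAG = re.compile(r"<(/?)([^\s>.]*)[^>]*>")
BASIC_STYLES = {"i", "b", "u"}


def styled_runs(cue_text):
    """Return (text, frozenset of active basic styles) runs for a cue."""
    runs = []
    active = []  # currently open i/b/u tags, in opening order
    pos = 0

    def emit(raw):
        if raw:
            runs.append((html.unescape(raw), frozenset(active)))

    for match in TAG.finditer(cue_text):
        emit(cue_text[pos:match.start()])
        is_end = match.group(1) == "/"
        name = match.group(2)
        if name in BASIC_STYLES:
            if not is_end:
                active.append(name)
            elif name in active:
                # Close the most recently opened tag with this name.
                del active[len(active) - 1 - active[::-1].index(name)]
        # Any other tag (c, v, lang, ruby, rt, timestamps) is discarded.
        pos = match.end()
    emit(cue_text[pos:])
    return runs
```

A device renderer could map each run's style set onto whatever italic, bold and underline support its text engine has, and ignore STYLE blocks entirely until the requirements for non-CSS renderers are written down.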

iOS issues

  • Is it about what features are supported in fullscreen?
  • Apple implemented the spec around 2 years ago, and not a lot of features have come in. Regions are there.

Final notes:

  • Livestreaming is another discussion.


  • Please help and comment to figure this out. Even if it's just "works for me" or +1, that shows a community who cares.
  • Including VLC.
  • VLC has 608/708 and part of IMSC1 actually working. Implementing WebVTT is not difficult. Because we're awesome we could do it, but we just need to understand the requirements and why they're in there.
  • The rest should be doable.
  • VLCKit works in fullscreen on iOS.