WebVTT beyond captions

Examples and Needs


  • Use the timeline search - search for "chicken" - success!
  • Thumbnails
  • Chapters

Jeroen: three things I am interested in:

  1. preview thumbnails - make it a "kind"
  2. chapters - can we put more than just text into chapters?
  3. how to put in-band tracks into live streams

Preview Thumbnails


  • we're providing urls for images
  • or sprite them with media fragment URIs? (YouTube?, Hulu, Brightcove, JWPlayer?)
  • base64 encoded images as cue content
  • Apple trick play: provides offsets for I-frames
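
As a rough illustration of the sprite option above, a thumbnail track could be generated along these lines. This is only a sketch, not an agreed format: the function names, the `sprite.jpg` file name and the row-by-row grid layout are assumptions made for the example. The `#xywh=` part is the spatial dimension from the Media Fragments URI syntax.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_thumbnail_vtt(sprite_url: str, count: int, interval: float,
                        columns: int, width: int, height: int) -> str:
    """Build a WebVTT file with one cue per thumbnail in a sprite sheet.

    Assumes (hypothetically) that thumbnails are laid out row by row,
    `columns` per row, each `width` x `height` pixels, one every
    `interval` seconds.
    """
    lines = ["WEBVTT", ""]
    for i in range(count):
        row, col = divmod(i, columns)
        start = format_timestamp(i * interval)
        end = format_timestamp((i + 1) * interval)
        lines.append(f"{start} --> {end}")
        # Cue payload: sprite URL plus a spatial media fragment.
        lines.append(f"{sprite_url}#xywh={col * width},{row * height},{width},{height}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_thumbnail_vtt("sprite.jpg", count=3, interval=5.0,
                              columns=2, width=160, height=90))
```

With separate image files instead of a sprite, the payload line would simply be the image URL without the fragment.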

JW: Native implementation in players in browsers?

JW: lots of players are doing this nowadays

PJ: having the browser extract I-frames is not very useful

  • might get black frames at beginning
  • need to buffer all the video
  • prefer linking to images

SP: is the proposal to introduce a new @kind="thumbnails" track?

JW: possibly

PS: anything that goes into a img @src attribute

SP: do we need to do responsive images?

PJ: possibly - eventually

JN: we can easily extract the images from inband because we have offsets

  • I would like to introduce an inband version of a @kind="thumbnails" track

JN: let's make sure we introduce a kind="thumbnails" track asap because too many people need/use it

SP: are chapters with thumbnails the same as a thumbnail track?

JW: a thumbnail track will have a higher frequency of images

MD: Should the server just define an API for how to load the images?

JN: you want the images served ahead of time

MD: do ppl use <track> and <video>?

JW: smaller companies are starting to really get on the bandwagon


  • we need a new @kind="thumbnails" track
  • should contain a URL that would be interpreted by an <img> element (or a <div> with the background set?)
  • if you want a binary blob, just use a data URI
  • if you want to use a sprite, use media fragment URIs? (good first use of spatial media fragment URIs? actually - not implemented yet)
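
To make the three payload variants in this summary concrete (plain URL, data URI, sprite with a spatial fragment), here is a minimal sketch of how a player might interpret a thumbnail cue payload. Nothing here is specified anywhere: `ThumbnailRef` and `parse_thumbnail_payload` are hypothetical names, and only the pixel form of `#xywh=` is handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urldefrag


@dataclass
class ThumbnailRef:
    url: str                                        # value to hand to an <img> src
    is_data_uri: bool                               # True for inline (base64) images
    region: Optional[Tuple[int, int, int, int]]     # (x, y, w, h) crop for sprites


def parse_thumbnail_payload(payload: str) -> ThumbnailRef:
    """Interpret a thumbnail cue payload (hypothetical format)."""
    text = payload.strip()
    if text[:5].lower() == "data:":
        # Binary blob carried inline; no fragment handling needed.
        return ThumbnailRef(text, True, None)

    url, fragment = urldefrag(text)
    region = None
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name == "xywh":
            if value.startswith("pixel:"):
                value = value[len("pixel:"):]
            numbers = value.split(",")
            # Only plain pixel values are accepted in this sketch;
            # the "percent:" form is left unhandled (region stays None).
            if len(numbers) == 4 and all(n.isdigit() for n in numbers):
                region = tuple(int(n) for n in numbers)
    return ThumbnailRef(url, False, region)


if __name__ == "__main__":
    print(parse_thumbnail_payload("https://example.com/sprite.jpg#xywh=160,0,160,90"))
```

Since spatial fragments were noted as not implemented in browsers, the crop itself would have to be done by the player, for example with a `<div>` background offset.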

Chapter Markers

JW: there is interest for more rich chapter content; in particular images

JW: we support this functionality where we have small text markers in the timeline

JW: Playlist with a title, text and image

PJ: could we just use a thumbnail track with a chapter track?

PJ: chapters have images that represent them - thumbnail tracks have images that are at that point in time

JN: if you click on the seek bar at a representative image, ideally you want to display the video from the point in time where the image is taken
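
The behaviour JN describes could be sketched as a lookup from a seek-bar position to the thumbnail cue covering it, with the seek going to that cue's start time. This is an illustration only: `thumbnail_at` and the `(start, end, url)` tuple layout are assumptions, and cues are assumed to be sorted by start time and non-overlapping.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, image URL)


def thumbnail_at(cues: List[Cue], t: float) -> Optional[Cue]:
    """Return the cue whose [start, end) interval contains time t, if any."""
    starts = [cue[0] for cue in cues]
    i = bisect_right(starts, t) - 1   # last cue starting at or before t
    if i >= 0 and t < cues[i][1]:
        return cues[i]
    return None


if __name__ == "__main__":
    cues = [(0.0, 5.0, "a.jpg"), (5.0, 10.0, "b.jpg"), (10.0, 15.0, "c.jpg")]
    hit = thumbnail_at(cues, 7.2)
    if hit is not None:
        # Seek to where the image was taken, not to the hover position.
        print("show", hit[2], "and seek to", hit[0])
```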

PJ: I don't know how we deal with this in the browser in a native way

SP: I think we need a single "thumbnails" track that is always active

  • but the example that JW showed includes both a thumbnail and some descriptive text on top of the chapter title text

JW: I'd like to see a thumbnail be part of chapters; descriptive text not so much

Silvia: I think the display that JW has with images, title & descriptive text is a good example for a metadata track - it's displayed outside the video frame anyway

  • but if all browsers say that chapters always contain images, then we've obviously underspecified chapter tracks

JN: we always display thumbnails with chapter titles

PJ: if it's a real use case, maybe we should extend the spec

SP: should we include thumbnails into chapters?

PJ: let's experiment with thumbnail tracks first

  • general agreement that that type of track is more important

Inband tracks for live video

JW: how can we transport text tracks in live video?

  • in particular we have cue start times only with no end time - end times are determined by the start of the next cue; mostly for captions

SP: at one stage we had a proposal to introduce a special "NEXT" keyword for end times; then we can stream e.g. in-band in WebM?

PS: and use blank cues to get breaks
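
A sketch of how a receiver might resolve such cues, assuming the proposed "NEXT" end time is represented as `None` after parsing. The keyword is only a proposal discussed here, not part of WebVTT, and `resolve_next` is a hypothetical helper. Blank cues close the previous cue and are then dropped, which produces the breaks PS mentions.

```python
from typing import List, Optional, Tuple

# (start seconds, end seconds or None for the proposed NEXT keyword, text)
RawCue = Tuple[float, Optional[float], str]


def resolve_next(cues: List[RawCue]) -> List[RawCue]:
    """Fill in open end times from the start of the following cue.

    The last cue keeps end=None if it is still open, since in a live
    stream the next cue has not arrived yet.
    """
    resolved = []
    for i, (start, end, text) in enumerate(cues):
        if end is None and i + 1 < len(cues):
            end = cues[i + 1][0]
        if text.strip():              # blank cues only serve as breaks
            resolved.append((start, end, text))
    return resolved


if __name__ == "__main__":
    live = [(0.0, None, "Hello"), (2.5, None, ""), (4.0, None, "World")]
    for cue in resolve_next(live):
        print(cue)
```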

JW: it's a good proposal

PJ: how do we deal with chapters in live streaming?

SP: do they even make sense?

PS: for DVR, yes

JW: workflow - ppl want their content available as fast as possible

PJ: what to do for thumbnails for DVR? What if there are already containers that have in-band thumbnails? We should support that.

JN: I don't know if there is a standardised thumbnail track in MP4?

PJ: it seems weird to put a condensed version of the movie into the movie

MD: is there an action plan for live VTT?


  • NEXT keyword instead of end time would enable support of live captioning
  • need to support live VTT - needs a spec change to stop blocking loading until all cues have been loaded
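
To illustrate the second point, here is a minimal sketch of reading a VTT stream incrementally, handing out each cue as soon as its block is complete instead of waiting for the whole file. It is not the parsing algorithm from the spec: `IncrementalCueReader` is a hypothetical class, timestamps are kept as strings, and cue settings, NOTE and STYLE blocks are ignored.

```python
from typing import List, Tuple

Cue = Tuple[str, str, str]  # (start timestamp, end timestamp, payload text)


class IncrementalCueReader:
    """Collect text chunks and emit cues once a blank line ends their block."""

    def __init__(self) -> None:
        self._buffer = ""
        self._header_seen = False

    def feed(self, chunk: str) -> List[Cue]:
        # Normalise line endings on the whole buffer so that a "\r\n"
        # split across two chunks is still handled.
        self._buffer = (self._buffer + chunk).replace("\r\n", "\n")
        blocks = self._buffer.split("\n\n")
        self._buffer = blocks.pop()   # last piece may still be incomplete

        cues: List[Cue] = []
        for block in blocks:
            lines = [line for line in block.split("\n") if line]
            if not lines:
                continue
            if not self._header_seen:
                self._header_seen = True
                if lines[0].lstrip("\ufeff").startswith("WEBVTT"):
                    continue          # skip the file header block
            timing = next((i for i, l in enumerate(lines) if "-->" in l), None)
            if timing is None:
                continue              # not a cue block
            start, _, rest = lines[timing].partition("-->")
            parts = rest.split()
            if not parts:
                continue              # malformed timing line
            cues.append((start.strip(), parts[0], "\n".join(lines[timing + 1:])))
        return cues


if __name__ == "__main__":
    reader = IncrementalCueReader()
    for chunk in ["WEBVTT\n\n00:00:00.000 --> 00:00:02.000\nHel", "lo\n\n"]:
        print(reader.feed(chunk))
```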

Cross-domain VTT

  • VC: it seems like cross-domain loading of VTT is blocked in the spec, but no browser implements it
  • PJ: same-origin, cross-origin sandboxed, cross-origin DOM-exposed are the three available options
  • VC: yes, but nobody has implemented it


  • need to discuss at W3C?


  • JN: in-band metadata cues are not yet exposed to the browser on iOS
  • Sam: 2nd screen use cases; metadata in cues for DOM interaction
  • MD: live MPEG-DASH support? We need data events to JS from text tracks from HLS.
  • Steve: Re-using Track / Cue infrastructure for advertising (VMAP).