Captions

Interests of the group:

  • Encoding metadata - ML models capture metadata (e.g. shot detection); want to encode it as part of WebVTT to provide it to the player
  • Firefox: WebVTT is big, working on which parts of spec to support first
  • Live captioning formats - How does WebVTT work for this? What other options are available? How can we reduce latency between captions and audio?
  • Caption interoperability between formats?
  • Multi-language support
  • TextTrack API improvements in the browser. How can we improve rendering in the browser?
  • Use case investigation for TextTrack API
  • Rendering support for foreign languages, vertical text
  • TTML
  • Native vs. Custom Player preferences
  • How to automate caption workflows - Speech to Text, OCR?
  • IMSC1 feedback (spec. editor present)
  • CSS support for subtitles and captions
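
For the metadata-encoding interest above, one possible shape is a WebVTT file whose cue payloads are JSON, loaded by the player as a track of kind "metadata". The sketch below is illustrative only: the ShotBoundary fields, the "shot-N" cue identifiers and the JSON layout are assumptions, not something the group agreed on.

```typescript
// Hypothetical shape for one shot-detection result (assumption for this sketch).
interface ShotBoundary {
  startSeconds: number;
  endSeconds: number;
  label: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function toVttTimestamp(totalSeconds: number): string {
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "." + pad(ms, 3);
}

// Serialize shots as WebVTT text with one JSON payload per cue.
// JSON.stringify emits a single line, so a payload never contains the blank
// line that would end a cue. A label containing "-->" is not handled here.
function shotsToMetadataVtt(shots: ShotBoundary[]): string {
  const cues = shots.map((shot, i) =>
    [
      "shot-" + (i + 1),
      toVttTimestamp(shot.startSeconds) + " --> " + toVttTimestamp(shot.endSeconds),
      JSON.stringify({ type: "shot", label: shot.label }),
    ].join("\n")
  );
  return ["WEBVTT"].concat(cues).join("\n\n") + "\n";
}
```

A player would then read each active cue's text and JSON.parse it; that side is omitted here.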

Are Polyfills OK?

  • Concern: Multiple polyfills, not everyone uses the same one
  • Native playback (fullscreen) prevents polyfills in some browsers (iOS)
  • Ideal: the browser should do the work. Each player behaving differently doesn't work across the board, though it's unclear how to keep the UX consistent across browsers.
  • VTT.js polyfill is a dead project
  • There are many libraries doing WebVTT support in a rudimentary fashion

iOS setting the standard?

  • IMSC1 is supported in iOS 11. iOS also supports WebVTT in HLS playlists, and embedded 608/708 SEI metadata
  • Native support is the only option for iOS because a polyfill can't cover fullscreen playback
  • Safari is the furthest ahead in WebVTT support (only missing a few features)

Strategy?

  • iOS - use native. Everyone else, use what you want (polyfills)
  • Will always need a polyfill for supporting IE
  • Edge does not support VTTCue (only text tracks)

FCC and European requirements?

  • Basically the idea is that if any content is broadcast and then brought online, it must have a functionally equivalent set of captions. The FCC says SMPTE-TT (SMPTE 2052) is a safe harbor, but it is not supported by anyone.
  • Most of the '708' data they see is actually 608 (708 compatibility bytes generally, since it's cheaper to create them this way)
  • Note that online captions must be functionally equivalent to broadcast, i.e. roll-up animations are needed. This is something a lot of people are running into. Some implementations of this are poor, with only one or two rolls; a smooth scroll is needed (not simply swapping text between fixed line positions)
  • In several cases it's cheaper to create roll-up captions than to create paint-on captions
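
As a rough illustration of the roll-up point above, the sketch below keeps a fixed-height window of rows and computes an interpolated vertical offset, so rows slide up rather than jump between fixed positions. The function names, the pixel-based offset and the linear easing are assumptions for this sketch; a real renderer would apply the offset through its own layout code.

```typescript
// Append a line to a roll-up window and drop the oldest row once the
// window exceeds maxRows. Returns a new array; the input is not mutated.
function pushRollUpLine(rows: string[], line: string, maxRows: number): string[] {
  const next = rows.concat([line]);
  while (next.length > maxRows) next.shift();
  return next;
}

// Vertical offset in pixels during one roll: rows start one line height
// below their final position and move linearly up to offset 0.
function rollOffsetPx(elapsedMs: number, durationMs: number, lineHeightPx: number): number {
  if (durationMs <= 0) return 0; // no animation requested
  const progress = Math.min(Math.max(elapsedMs / durationMs, 0), 1);
  return lineHeightPx * (1 - progress);
}
```

Calling rollOffsetPx on every animation frame gives the smooth scroll; swapping text between fixed line positions corresponds to a duration of zero.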

Text Track API improvements

  • Problem #1: IE doesn't support it
  • Spec requests:
    - Ability to remove a text track. Now that players are taking ownership of creating text tracks this is needed when switching between video elements using MSE. Current ability to clear all tracks is not sufficient, need to remove individual ones. (Or at least, remove() should signal that it's not something to parse)
    - Extend tests for TextTrack API (note: WebVTT tests have come along a lot but not the TT API)
  • Browser requests
    - AddTrack, RemoveTrack or TextTrack Change don't work in Edge; if using native HLS playback they aren't reliable
   - Inability to position or style captions in browsers (Apple does most of the styling and positioning with roll-ups, ruby/glyphs/vertical captions)
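
Until individual tracks can be removed, a player can only approximate removal, for example by emptying a track and disabling it. The sketch below shows that workaround against a minimal structural interface; TrackLike and retireTextTrack are names invented for this sketch, shaped so that a browser TextTrack should satisfy them. It does not remove the track from the video element's track list, which is exactly the gap noted above.

```typescript
// Minimal structural stand-ins for the browser types (assumptions for this sketch).
interface CueLike {
  startTime: number;
  endTime: number;
}
interface CueListLike {
  readonly length: number;
  [index: number]: CueLike;
}
interface TrackLike {
  mode: "disabled" | "hidden" | "showing";
  readonly cues: CueListLike | null;
  removeCue(cue: CueLike): void;
}

// Empty a track and disable it. Returns the number of cues removed.
function retireTextTrack(track: TrackLike): number {
  // In browsers, cues is null while the track is disabled, so make the
  // cue list reachable without rendering anything.
  if (track.mode === "disabled") track.mode = "hidden";
  let removed = 0;
  const cues = track.cues;
  if (cues) {
    // Walk backwards because the cue list is live and shrinks as cues go.
    for (let i = cues.length - 1; i >= 0; i--) {
      track.removeCue(cues[i]);
      removed++;
    }
  }
  track.mode = "disabled";
  return removed;
}
```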

VTT Live Captions?

  • WebVTT has lost its editor; looking for a new editor with energy to move this forward
  • New spec text is required for live captioning. Currently broken-up segments are supported for live, but there is no ability to update TextTrack items (e.g. backspace)
  • Providing a line of text when it's complete may be OK for streaming where you can introduce a delay

IMSC1 support?

  • Already supports live captioning; BBC does live TTML broadcast, so does IRT
  • Also does not support updating items; prescription right now is to update the same text track item over and over as it evolves

Discussion about live captioning approach

  • Send individual characters (including backspace) - compatible with 608/708 but harder to implement in browser, harder to construct 'final' line of captions
  • Alternatively have authoring tool send updated lines during creation and mark a 'final' line at the end
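
The two approaches can be compared with a small sketch. The event and update shapes below (CaptionEvent, LineUpdate and their fields) are hypothetical, made up for illustration; neither is a wire format from 608/708, WebVTT or IMSC1.

```typescript
// Approach 1: a stream of characters, backspaces and line ends (hypothetical shape).
type CaptionEvent =
  | { kind: "char"; value: string }
  | { kind: "backspace" }
  | { kind: "lineEnd" };

// The receiver has to replay every event to reconstruct each final line.
function assembleLines(events: CaptionEvent[]): string[] {
  const lines: string[] = [];
  let current = "";
  for (const ev of events) {
    if (ev.kind === "char") {
      current += ev.value;
    } else if (ev.kind === "backspace") {
      current = current.slice(0, -1); // no-op on an empty line
    } else {
      lines.push(current);
      current = "";
    }
  }
  return lines;
}

// Approach 2: the authoring tool sends whole-line updates and marks the last one final.
interface LineUpdate {
  lineId: string;
  text: string;
  final: boolean;
}

// The receiver replaces the displayed text on each update and only has to
// keep the updates marked final.
function collectFinalLines(updates: LineUpdate[]): string[] {
  const finals: string[] = [];
  for (const u of updates) {
    if (u.final) finals.push(u.text);
  }
  return finals;
}
```

In the first approach the corrected line exists only after replaying the stream; in the second, every update is already a displayable line, at the cost of resending text.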