QoE (Quality of Experience) Implementation


--- General Summary

Understanding the challenges around tracking metrics today

  • Unexplained major gaps in timings
  • Trusting events coming out of the video element
  • Rebuffering - tracking the right way

Additional Useful Metrics

  • Interaction with the manual bitrate selector
  • Time in fullscreen
  • Downscaling
  • A/V Sync would be nice

Followup Topics

  • Standardizing Video Viewing session -- the states and events (blog post?)
  • Explore Media playback API -- submit feedback/ideas
  • Identify opportunities for the browser exposing media performance details


Topics of interest:

  • Standards
  • How metrics are being utilized for video
  • How to use metrics to infer quality of experience
  • Browser standards: video element events aren't consistent on all browsers
  • how do we make sense of these numbers?
  • Rebuffering: how do we define what is a good or bad experience
  • How do you make sense of quality across different video formats
  • Time to first frame: how do we take into account all the factors that affect this
  • how smooth is video playback? We don't have an automated way to measure this
  • Upscale percentage
  • Performance Timing
  • JavaScript long tasks would be nice on browsers other than Chrome
  • Users abandoning sessions
  • What are the most important metrics to us as an industry?
  • "What is a play session?"
  • Should we be looking at pages with multiple players?
  • Browser app suspension

Implementation Issues:

1. Browser app suspension:

  • Client clock times don't match UTC/server times
  • This mostly happens on mobile
  • Can we capture some duration instead
  • e.g. Apple TV
  • Attempts to play may not actually be the real start of video playback
  • Tracking plays as both attempts and actual starts, and the difference in time between the two
  • this is part of start up time
  • what about ads?
  • try to "race" your ad implementation by loading the content first and then loading the ad
  • you could measure the difference between requesting the ad and playing it as the start up time
  • but these are different questions
  • ads should have a different set of events
  • you could create new events for certain loading points, e.g. when the player physically loads
  • you could split this into two groups: those without ads and those with ads
  • the measurements should be set up to answer as many questions as possible
  • issues around startup, autoplay, etc.
  • we should take user expectations into account and not track metrics such as startup when the user has not requested playback
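
The "capture some duration instead" idea above can be sketched as follows. This is a minimal illustration, not an agreed design: `StartupTimer`, `onPlayAttempt` and `onPlaybackStarted` are hypothetical names, and the clock is injected on the assumption that a monotonic source (in a browser, `() => performance.now()`) is used so that wall-clock drift or jumps on the client do not distort the value.

```typescript
// Hypothetical helper: measures startup as a duration on a monotonic clock
// instead of comparing client wall-clock timestamps with server time.
type Clock = () => number; // milliseconds; assumed monotonic

class StartupTimer {
  private readonly now: Clock;
  private attemptAt: number | null = null;
  private startupMs: number | null = null;

  constructor(now: Clock) {
    this.now = now;
  }

  // Call when playback is requested (e.g. on the "play" event).
  onPlayAttempt(): void {
    if (this.attemptAt === null) this.attemptAt = this.now();
  }

  // Call when playback actually begins (e.g. on the first "playing" event).
  onPlaybackStarted(): void {
    if (this.attemptAt !== null && this.startupMs === null) {
      this.startupMs = this.now() - this.attemptAt;
    }
  }

  // Duration in ms, or null if the attempt never turned into playback.
  get startupTimeMs(): number | null {
    return this.startupMs;
  }
}
```

An attempt that never reaches `onPlaybackStarted` keeps a `null` startup time, which keeps "attempts" and "actual starts" distinguishable in reporting. Ads would likely need their own timer instance, in line with the point that ads should have a different set of events.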

2. How trustworthy are these events?

  • Safari triggers the playing event, but 250ms later the first frame is seen
  • if you poll currentTime, you can see this
    - we could create a standard that requires media events to be fired at certain times, but browsers have restrictions around the JavaScript thread being separate from the video thread that cause delays
    - Performance API (Chrome): the time of the first paint is being worked on. It would be nice to know what the first "meaningful" paint is
    - i.e. whether a significant portion of the page changed
  • performance marks are defined
  • could there be a type for media events
  • could you have an observer for a specific type?
  • this is global scope, it is not tied to a specific media element
  • it would be nice to have a duration value in the playing event. The browser could measure the time between play and playing and report that back. The same could be done for seeking as well.
  • how would you know what that duration value is related to?
  • time is an important factor to a lot of these metrics
  • e.g. the client clock jumps back to the year 2000 and then back to normal
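
The currentTime polling check mentioned above could look roughly like this. It is a sketch under assumptions: `Sample` and `firstAdvanceDelayMs` are hypothetical names, the samples are assumed to have been collected by polling `video.currentTime` alongside a monotonic clock, and the result can only be as precise as the polling interval.

```typescript
// One polled observation: a monotonic timestamp and the media position then.
interface Sample {
  atMs: number;        // monotonic clock reading, in ms
  currentTime: number; // media position in seconds
}

// Returns the delay between the "playing" event and the first sample whose
// currentTime moved past the position seen at the event, or null if the
// position never advanced in the samples provided.
function firstAdvanceDelayMs(playingAtMs: number, samples: Sample[]): number | null {
  const afterEvent = samples.filter((s) => s.atMs >= playingAtMs);
  const first = afterEvent[0];
  if (first === undefined) return null;
  const advanced = afterEvent.find((s) => s.currentTime > first.currentTime);
  return advanced ? advanced.atMs - playingAtMs : null;
}
```

A consistently non-zero delay on one browser would be one way to quantify how far the playing event runs ahead of visible playback there.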

3. A/V sync

  • is there a way to track A/V sync?
  • this is mostly manual now
  • there are APIs to sync to a different video element, but none that make sure they are aligned
  • the MediaController API doesn't necessarily sync streams. It could fit into the Media Playback API.
  • There's a playback quality spec: https://wicg.github.io/media-playback-quality/
  • It's only focusing on dropped frames, but it's a good starting point to build off for other metrics
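
As a starting point for building on the dropped-frame counters, a dropped-frame ratio over an interval might be derived like this. `FrameCounts` is a local type that mirrors the `totalVideoFrames` and `droppedVideoFrames` fields returned by `video.getVideoPlaybackQuality()`; `droppedFrameRatio` is a hypothetical helper, and taking deltas between two snapshots is an assumption about how one would want to report it.

```typescript
// Mirrors the two counters of interest on VideoPlaybackQuality.
interface FrameCounts {
  totalVideoFrames: number;
  droppedVideoFrames: number;
}

// Ratio of dropped frames between two snapshots of the cumulative counters.
// Returns 0 when no frames were presented or dropped in the interval.
function droppedFrameRatio(previous: FrameCounts, current: FrameCounts): number {
  const total = current.totalVideoFrames - previous.totalVideoFrames;
  const dropped = current.droppedVideoFrames - previous.droppedVideoFrames;
  return total > 0 ? dropped / total : 0;
}
```

Because the counters are cumulative, sampling them periodically and diffing gives a per-window smoothness signal rather than a single session-wide number.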

4. Play Session Standardization

  • Proposal: some model of how you look at what users are doing with media on the page
  • what are the events starting from navigation -> abandonment
  • what are they watching?
  • what did they click on?
  • having standards would help browsers create models that make sense for player implementors
  • standardizing the schema of how events should happen in each state
  • a model that can be used to take a pipeline of events and combine them into a smaller set of aggregate values that are useful
  • what can you throw away?
  • storing data is costly
  • metrics for video are drawn around video states and state transitions
  • there's value in understanding how the user got to the video, and what they try to do next (especially for recommendations)
  • maybe a blog post would work for this
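
One way to picture "a pipeline of events combined into a smaller set of aggregate values" is a small state machine that only keeps time-in-state. Everything here is illustrative: the state names, the event names and the transition rules are assumptions made for the sketch, not a proposed standard.

```typescript
type PlayerState = "idle" | "starting" | "playing" | "paused" | "rebuffering" | "ended";
type EventType = "play_request" | "playing" | "pause" | "waiting" | "ended";

interface SessionEvent {
  type: EventType;
  atMs: number; // monotonic timestamp in ms
}

interface SessionSummary {
  msInState: Record<PlayerState, number>;
  finalState: PlayerState;
}

// Assumed transition rules; events that make no sense in a state are ignored.
function nextState(state: PlayerState, event: EventType): PlayerState {
  switch (event) {
    case "play_request":
      return state === "idle" ? "starting" : state;
    case "playing":
      return "playing";
    case "pause":
      return state === "playing" || state === "rebuffering" ? "paused" : state;
    case "waiting":
      // "waiting" before playback has started counts as startup, not rebuffering.
      return state === "playing" ? "rebuffering" : state;
    case "ended":
      return "ended";
    default:
      return state;
  }
}

// Folds an ordered event stream into time spent in each state.
function summarize(events: SessionEvent[]): SessionSummary {
  const msInState: Record<PlayerState, number> = {
    idle: 0, starting: 0, playing: 0, paused: 0, rebuffering: 0, ended: 0,
  };
  let state: PlayerState = "idle";
  let lastAtMs: number | null = null;
  for (const event of events) {
    if (lastAtMs !== null) msInState[state] += event.atMs - lastAtMs;
    state = nextState(state, event.type);
    lastAtMs = event.atMs;
  }
  return { msInState, finalState: state };
}
```

Once a session is reduced to a summary like this, the raw events could in principle be discarded, which speaks to the storage cost point above.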

5. Upscaling

  • how critical is this?
  • distribution vs consumption?
  • what's the first question you would ask if you do see a change in these values?
  • Mux: introduced it because bitrate is a poor metric for quality, since it varies by content type.
  • this is meant to be a counteractive metric to measure against another metric
  • this was an attempt to take into account resolution and bitrate
  • how do you compare the scenarios?
  • when does upscaling start to actually affect quality?
  • "how fast does your player upscale?" is a question customers ask

6. Downscaling

  • to see if you might be wasting bandwidth
  • this might be an indication that something is wrong with ABR algorithms
  • most ABR algorithms will try to upscale
  • this works well as a cost metric
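
For sections 5 and 6, a rough sketch of how upscale and downscale percentages could be computed from source and player dimensions. The formula is an assumption made for illustration (it is not presented as Mux's definition): the video is assumed to be fitted inside the player with its aspect ratio preserved, and player dimensions are assumed to be in device pixels. `Size` and `scalePercentages` are hypothetical names.

```typescript
interface Size {
  width: number;
  height: number;
}

interface ScalePercentages {
  upscalePercent: number;   // how much the source is stretched beyond native size
  downscalePercent: number; // how much of the source resolution is thrown away
}

// Assumes aspect-ratio-preserving fit, so the applied scale factor is the
// smaller of the two per-axis ratios.
function scalePercentages(source: Size, player: Size): ScalePercentages {
  const scale = Math.min(player.width / source.width, player.height / source.height);
  return {
    upscalePercent: Math.max(0, scale - 1) * 100,
    downscalePercent: Math.max(0, 1 - scale) * 100,
  };
}
```

For example, a 1280x720 rendition shown in a 1920x1080 player gives a scale factor of 1.5, i.e. 50% upscaling, while a 1920x1080 rendition in a 960x540 player gives 50% downscaling.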

7. When users are moving away from using ABR

  • if you default to ABR and you see the user manually select a different bitrate
  • this could be a more important metric for success for an ABR algorithm
  • how long it took is a different question
  • e.g. YouTube: did an experiment where the qualities were measured as "auto" and anything else was a fail
  • easily graphed and monitored
  • manual bitrate selection is fading away
  • ABR should do a better job than a human could manually, since it can consider environmental factors, e.g. bandwidth
  • some customers want better quality but also a fast start time - but what are the users seeing that make them think it is slow?

e.g. by the time the fullscreen button is pressed. You could start monitoring this in your ABR algorithm.
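
The "auto versus anything else" measurement could be reduced to a single rate, roughly as below. This is a sketch loosely modelled on the experiment described above, not a description of how it was actually implemented; `SessionSelections` and `autoRetentionRate` are hypothetical names, and the `"auto"` label is assumed.

```typescript
interface SessionSelections {
  sessionId: string;
  // Quality labels the user picked in the selector, in order.
  // An empty list means the selector was never touched.
  selections: string[];
}

// Fraction of sessions in which the user never moved away from "auto".
// Returns null when there are no sessions to measure.
function autoRetentionRate(sessions: SessionSelections[]): number | null {
  if (sessions.length === 0) return null;
  const stayedOnAuto = sessions.filter((s) =>
    s.selections.every((label) => label === "auto"),
  ).length;
  return stayedOnAuto / sessions.length;
}
```

A single number like this is easy to graph and alert on; how long it took the user to override ABR would need a separate, time-based metric.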

8. Fullscreen

  • Consumption time
  • It's not an actionable metric
  • it's a property on other metrics
  • frequency of going in and out of fullscreen may help

9. Rebuffering

  • instead of looking at it as total time spent rebuffering, tried bucketing it by a window, e.g. per second
  • (total rebuffering / total watch time) over all sessions helped deal with outliers
  • this is a metric that is susceptible to outliers
  • background tabs were an issue: you may not get all the events in the session
  • requestAnimationFrame was useful with Firefox where there were delays in the JavaScript running with setTimeout/setInterval
  • sometimes users switch tabs and the session resumes and events fire
  • Firefox: shuts down the video decoder when the video loses visibility. To the player, it looks like it's still playing. When the tab gains focus, the video is resumed as best as possible.
  • "stalled" event is the network event
  • "waiting" event is the signal that the video stopped. This should be a reliable event to tell if the ready state has gone down, but you can see it before playing.
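
To illustrate why (total rebuffering / total watch time) over all sessions is less sensitive to outliers than averaging per-session ratios, here is a small comparison. `SessionTotals`, `pooledRebufferRatio` and `meanPerSessionRatio` are hypothetical names, and whether watch time includes the rebuffering time itself is left as an assumption for the implementor.

```typescript
interface SessionTotals {
  rebufferMs: number; // time spent rebuffering in the session
  watchMs: number;    // total watch time for the session
}

// Pooled ratio: sum of rebuffering over sum of watch time across sessions.
function pooledRebufferRatio(sessions: SessionTotals[]): number {
  const rebuffer = sessions.reduce((sum, s) => sum + s.rebufferMs, 0);
  const watch = sessions.reduce((sum, s) => sum + s.watchMs, 0);
  return watch > 0 ? rebuffer / watch : 0;
}

// Naive alternative: average of each session's own ratio.
function meanPerSessionRatio(sessions: SessionTotals[]): number {
  const ratios = sessions
    .filter((s) => s.watchMs > 0)
    .map((s) => s.rebufferMs / s.watchMs);
  return ratios.length > 0 ? ratios.reduce((sum, r) => sum + r, 0) / ratios.length : 0;
}
```

With two ten-minute sessions that rebuffer for 0 and 6 seconds, plus one ten-second session that rebuffers for 9 seconds, the pooled ratio stays a little above 1% while the per-session mean jumps to roughly 30%, because the short outlier session carries the same weight as the long ones.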