Adobe Flash, once the de-facto standard for media playback on the web, has lost favor in the industry due to increasing concerns over security and performance. At the same time, requiring a plugin for video playback in browsers is losing favor among users as well. As a result, the industry is moving toward HTML5 for video playback.
HTML5 video playback development is still nascent, and it was initially supported by browsers in its simplest form. Only recently has support been expanded to include capabilities for adaptive streaming. Adaptive streaming offers two key benefits:
- Adaptive Bitrate (ABR): An algorithm that detects a user’s bandwidth, CPU capacity, player size, etc. in real time and adjusts the bitrate of the video it downloads accordingly.
- Variable buffer sizing: A capability that allows us to control the time it takes for the playback to start.
Without ABR and variable buffer sizing, users have a poor viewing experience, as video playback cannot adapt to changing conditions on a user’s device.
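To make the ABR idea concrete, here is a minimal sketch of a bitrate selection step. It is not the algorithm used in our player: the `Rendition` shape, the `selectRendition` name, and the 0.8 safety factor are all assumptions made for illustration, and CPU capacity is left out entirely.

```typescript
// Hypothetical description of one rendition (quality level) of a stream.
interface Rendition {
  bitrateBps: number; // average encoded bitrate in bits per second
  height: number;     // frame height in pixels
}

// Pick the highest-bitrate rendition that fits within the bandwidth budget
// and is no taller than the player. Falls back to the lowest bitrate.
function selectRendition(
  renditions: Rendition[],
  estimatedBandwidthBps: number,
  playerHeightPx: number,
  safetyFactor: number = 0.8 // assumed headroom, not a value from this post
): Rendition {
  if (renditions.length === 0) {
    throw new Error("at least one rendition is required");
  }
  // Sort a copy from lowest to highest bitrate so the last match wins.
  const sorted = renditions.slice().sort((a, b) => a.bitrateBps - b.bitrateBps);
  const budgetBps = estimatedBandwidthBps * safetyFactor;
  let choice = sorted[0];
  for (const candidate of sorted) {
    if (candidate.bitrateBps <= budgetBps && candidate.height <= playerHeightPx) {
      choice = candidate;
    }
  }
  return choice;
}
```

A player would rerun a step like this whenever its bandwidth estimate or size changes, which is what lets playback adapt to changing conditions.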
At Yahoo, our video player uses HTML5 across all modern browsers for video playback. In this post we will describe our journey to providing an industry-leading playback experience using HTML5, lay out some of the challenges we faced, and discuss opportunities we see going forward.
The First Steps Toward HTML5
We took the first step in our HTML5 journey in October 2015 when we globally live-streamed a regular season NFL game for the first time. For the event, we deployed a “pure” HTML5 player on Safari; this was based on the native HTML5 support in the browser for HTTP Live Streaming (HLS). As part of that effort, we built the capabilities in our video player to allow different video rendering techniques based on the client environment (viz. browser, Flash support, device configuration, OS, etc.).
In order to broadly support HTML5 video on all browsers, we needed to re-architect the way our player streamed video. This presented a number of choices, all of which could impact Yahoo’s business and the user experience.
The first and probably most critical choice was which streaming protocol to support. The options were HLS and DASH, both of which support adaptive streaming over HTTP. To maintain a simple serving stack and get to market quickly, we decided to support HLS. To support iOS, we would have needed to support HLS anyway, and as the standards evolved, Media Source Extensions (MSE) could be made to work with HLS. MSE is a recent advancement to HTML5 standards that allows the dynamic generation of media streams for playback via the video tag.
Our next decision was whether to build, buy, or leverage open source for an HTML5 player. Yahoo was not alone in its desire for an HTML5-based player, and a number of open source options existed. Leveraging one of these players would have jump-started our effort. However, our analysis, including field tests of the in-market players, showed that the available players would not provide the quality, performance, and scale that we expected out of the Yahoo Video Player, so we decided to build our own player in-house.
As you will see below, each of these design decisions provided us with great benefits.
Zooming into the Future
With these decisions made, we set out to write a player that would remove our dependency on Flash to play video. The project was codenamed “Zoom,” after the arch-nemesis of DC Comics’ superhero, the Flash.
The media pipeline of the player for HLS streaming would look as depicted in Figure 1 below. The player demuxes the incoming transport stream (MPEG-TS) into audio and video parts that are then packaged into the fragmented MP4 format that is understood by the MSE layer in the browser.
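As an illustration of the first stage of this pipeline, the sketch below groups MPEG-TS packets by their packet identifier (PID). It relies only on two well-known properties of the format: packets are 188 bytes long and start with the sync byte 0x47. The `splitByPid` name is hypothetical, and a real demuxer does considerably more (reading the program tables to learn which PIDs carry audio and video, reassembling elementary streams, and packaging them into fragmented MP4), none of which is shown here.

```typescript
const TS_PACKET_SIZE = 188; // fixed MPEG-TS packet length in bytes
const TS_SYNC_BYTE = 0x47;  // first byte of every MPEG-TS packet

// Group the packets of a transport stream chunk by PID.
// Returns a map from PID to the packets (as views, not copies) carrying it.
function splitByPid(data: Uint8Array): Record<number, Uint8Array[]> {
  const byPid: Record<number, Uint8Array[]> = {};
  for (let offset = 0; offset + TS_PACKET_SIZE <= data.length; offset += TS_PACKET_SIZE) {
    if (data[offset] !== TS_SYNC_BYTE) {
      throw new Error("lost sync at byte " + offset);
    }
    // The PID is 13 bits: the low 5 bits of byte 1 followed by all of byte 2.
    const pid = ((data[offset + 1] & 0x1f) << 8) | data[offset + 2];
    const packet = data.subarray(offset, offset + TS_PACKET_SIZE);
    if (byPid[pid] === undefined) {
      byPid[pid] = [];
    }
    byPid[pid].push(packet);
  }
  return byPid;
}
```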
Figure 1. Media pipeline for HTML5 content
We designed the new HTML5 player with a few goals in mind. It was to be the following:
- Modular: Each component could evolve separately and could be tested independently.
- Extensible: The new player would have the ability to support new features (e.g., DASH) without a redesign.
- Stateless: We would use components (like ABR) across multiple player instances on a page or app.
Figure 2 below shows the high-level architecture of the new HTML5 MSE-based video player.
Figure 2. Architecture of Yahoo HTML5 Player
Framework Services provide common capabilities like HTTP Loader (for loading video assets), Web Workers (for multithreading), and Bandwidth Estimator.
Stream Media Services include services that deal with the various stages of the media pipeline shown above. This includes loading the transport stream, demuxing it, and packaging the result into MP4 that can be played using MSE.
Streaming Controller is the component that manages the video content streaming. It is also the component that consults with the ABR Engine to decide the right bitrate to download.
Playback Controller is the component that orchestrates the video playback using the various modules. It maintains a state machine of the various states the playback can be in. It also provides the APIs to play, pause, seek, etc.
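A playback state machine of this kind can be sketched as a transition table. The state names, event names, and transitions below are assumptions chosen for illustration; they are not the actual states of the Yahoo player.

```typescript
// Hypothetical playback states and the events that move between them.
type PlaybackState = "idle" | "loading" | "paused" | "playing" | "buffering" | "ended";
type PlaybackEvent = "load" | "ready" | "play" | "pause" | "stall" | "resume" | "end";

// For each state, the events it accepts and the state each one leads to.
const TRANSITIONS: Record<PlaybackState, Partial<Record<PlaybackEvent, PlaybackState>>> = {
  idle: { load: "loading" },
  loading: { ready: "paused" },
  paused: { play: "playing", load: "loading" },
  playing: { pause: "paused", stall: "buffering", end: "ended", load: "loading" },
  buffering: { resume: "playing", pause: "paused", load: "loading" },
  ended: { play: "playing", load: "loading" },
};

class PlaybackStateMachine {
  private current: PlaybackState = "idle";

  getState(): PlaybackState {
    return this.current;
  }

  // Apply an event. Returns false (and leaves the state unchanged)
  // when the event is not valid in the current state.
  dispatch(event: PlaybackEvent): boolean {
    const next = TRANSITIONS[this.current][event];
    if (next === undefined) {
      return false;
    }
    this.current = next;
    return true;
  }
}
```

Keeping the transitions in one table makes invalid requests, such as calling play before anything has loaded, easy to reject in a single place.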
The first challenge was the move from a single framework (Adobe Flash), which provided a consistent environment across browsers, to multiple frameworks (MSE, XHR, Web Workers, HTML5 Media Elements) on diverse platforms and browsers (Chrome, Firefox, IE, Edge, Safari, etc.), each of which added its own quirks to the system.
The second challenge involved advertising. While content video playback has shifted to HTML5, most video advertisers continue to rely on Flash. As such, we had to find a seamless way to serve ad content on Flash while leveraging HTML5 for content playback. We built our player to use different renderers—the component that uses a given rendering technique (Flash, HTML5, etc.) for video playback—while maintaining a seamless experience. This allowed us to give users the optimal experience while protecting ad revenue.
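The renderer choice can be pictured as a small decision function. This is a sketch under stated assumptions: the type names, the `requiresFlash` flag, and the preference order (MSE first, then native HLS, then Flash) are illustrative and may not match the rules the player actually applies.

```typescript
type Renderer = "html5-mse" | "html5-native-hls" | "flash";

// Hypothetical summary of what the client can do.
interface ClientEnvironment {
  supportsMse: boolean;       // Media Source Extensions available
  supportsNativeHls: boolean; // the browser plays HLS natively
  hasFlash: boolean;          // Flash plugin installed and enabled
}

// Hypothetical description of something to play.
interface MediaItem {
  kind: "ad" | "content";
  requiresFlash: boolean; // e.g. an ad creative that only works in Flash
}

// Returns the renderer to use, or null when nothing can play the item.
function chooseRenderer(item: MediaItem, env: ClientEnvironment): Renderer | null {
  if (item.requiresFlash) {
    return env.hasFlash ? "flash" : null;
  }
  if (env.supportsMse) {
    return "html5-mse";
  }
  if (env.supportsNativeHls) {
    return "html5-native-hls";
  }
  // No HTML5 path is available, so fall back to Flash if it exists.
  return env.hasFlash ? "flash" : null;
}
```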
The third challenge was finding ways to increase user engagement, a key metric for the success of video consumption. We wanted users to engage with video continuously without any action that required their initiation, like a click or additional page load. At the same time, we did not want the experiences (e.g., the page) to implement continuous playback on a per-experience basis. Hence, we decided to make “playlists” a first-class API on the video player itself, where we could program a curated list of videos that are highly contextual and personalized.
Figure 3 below shows the high-level architecture of the Yahoo Video Player.
Figure 3. Architecture of Yahoo Video Player
Controller is the component that manages the switching of renderer and exposes the various playback functions.
Ads Controller is used to manage and play video advertisements.
Playlist Manager manages the playing of a video playlist and exposes the playlist functions.
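To show what a first-class playlist API could look like, here is a minimal sketch. The class shape and method names are hypothetical; the real Playlist Manager exposes its own functions, which are not listed in this post.

```typescript
// A minimal playlist that the player itself owns and advances.
class PlaylistManager<T> {
  private items: T[];
  private index = -1; // -1 means playback has not started yet

  constructor(items: T[]) {
    this.items = items.slice(); // copy so callers cannot mutate our list
  }

  // The item currently playing, or undefined before playback starts.
  current(): T | undefined {
    return this.index >= 0 ? this.items[this.index] : undefined;
  }

  // The item that would play next, without advancing (useful for preloading).
  peekNext(): T | undefined {
    return this.items[this.index + 1];
  }

  // Move to the next item and return it, or undefined at the end of the list.
  advance(): T | undefined {
    if (this.index + 1 >= this.items.length) {
      return undefined;
    }
    this.index += 1;
    return this.items[this.index];
  }
}
```

With something like this inside the player, the controller can call `advance()` when a video ends, so the hosting page never has to implement continuous playback itself.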
Performance (viz. rebuffering and startup time) is a key driver for user engagement. Making performance-related changes raised a number of obstacles.
Audio/video demultiplexing and MP4 packaging are CPU-intensive operations. Performing them on the browser’s main UI thread degrades the UI responsiveness of the page and the player. Fortunately, browsers provide Web Workers to facilitate multi-threading, but using them requires message passing between threads.
Our experiments revealed that using a worker for demuxing and MP4 packaging was 20% more efficient in Firefox (vs. not using a Web Worker). On the other hand, we discovered that the overhead of inter-thread message passing is high in IE and Edge, resulting in a higher re-buffering ratio. To overcome these challenges, we added two new design elements:
- Execute processing units inside a worker
- Minimize inter-thread messages
The effective use of Web Workers for media transformation gives the Yahoo HTML5 player a distinctive performance edge over other players. These enhancements resulted in a 10% to 20% improvement in CPU usage and a 30% improvement in re-buffering ratio.
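One way to minimize inter-thread messages is to batch the output produced inside the worker and send it in fewer, larger messages. The sketch below shows that idea only; the `MessageBatcher` class, the `SampleChunk` shape, and the byte threshold are assumptions, not the player's actual design. The `post` callback is injected so the logic can run anywhere; inside a real worker it would typically wrap `postMessage`, ideally passing the underlying buffers as transferables to avoid copies.

```typescript
// Hypothetical unit of demuxed output produced inside the worker.
interface SampleChunk {
  track: "audio" | "video";
  data: Uint8Array;
}

// Collects chunks and hands them to `post` in batches instead of one by one.
class MessageBatcher {
  private pending: SampleChunk[] = [];
  private pendingBytes = 0;

  constructor(
    private readonly post: (batch: SampleChunk[]) => void,
    private readonly maxBytes: number // flush once this many bytes are queued
  ) {}

  add(chunk: SampleChunk): void {
    this.pending.push(chunk);
    this.pendingBytes += chunk.data.length;
    if (this.pendingBytes >= this.maxBytes) {
      this.flush();
    }
  }

  // Send whatever is queued as a single message. Does nothing when empty.
  flush(): void {
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    this.pendingBytes = 0;
    this.post(batch);
  }
}
```

A caller would `add()` each chunk as it is produced and `flush()` at the end of a segment, so the number of messages depends on the amount of data rather than on the number of chunks.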
While we faced a number of challenges, building a redesigned player in-house gave us the opportunity to introduce features that were not supported in the previous player.
Since we built the capability of switching renderers to support ads using Flash and to support content using HTML5, we were able to build pre-loading capabilities. That is, we can preload the next item in the playlist before it is actually played. For example, once an ad is loaded and starts to play, we can preload the content in the background, thus making for a TV-like transition between ads and content.
We also improved our bandwidth estimation algorithm. Previously, it was purely based on timing the rate of content download. We augmented it with the information we get from the resource timing APIs like TTFB (Time To First Byte).
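The sketch below illustrates one plausible way timing data could refine a bandwidth estimate. This post does not say how TTFB is used, so everything here is an assumption: the sketch excludes the wait for the first byte from the download time and smooths the samples with an exponentially weighted moving average. The field names mirror those of the browser's resource timing entries (`requestStart`, `responseStart`, `responseEnd`), but the `SegmentTiming` interface, the function names, and the 0.3 smoothing weight are hypothetical.

```typescript
// Timing for one downloaded segment, in milliseconds, plus its size.
interface SegmentTiming {
  requestStart: number;  // request sent
  responseStart: number; // first byte received (TTFB = responseStart - requestStart)
  responseEnd: number;   // last byte received
  bytes: number;         // bytes transferred
}

// Throughput measured over the whole request, latency included.
function naiveThroughputBps(t: SegmentTiming): number {
  return (t.bytes * 8 * 1000) / (t.responseEnd - t.requestStart);
}

// Throughput measured only while bytes were actually arriving.
function transferThroughputBps(t: SegmentTiming): number {
  return (t.bytes * 8 * 1000) / (t.responseEnd - t.responseStart);
}

class BandwidthEstimator {
  private estimateBps: number | null = null;

  constructor(private readonly alpha: number = 0.3) {} // weight of the newest sample

  // Fold one segment into the estimate and return the updated value.
  addSample(t: SegmentTiming): number | null {
    if (t.responseEnd <= t.responseStart || t.bytes <= 0) {
      return this.estimateBps; // ignore degenerate samples
    }
    const sample = transferThroughputBps(t);
    this.estimateBps =
      this.estimateBps === null
        ? sample
        : this.alpha * sample + (1 - this.alpha) * this.estimateBps;
    return this.estimateBps;
  }
}
```

For a 500,000-byte segment with 200 ms to first byte and 1,000 ms of transfer, the naive figure is about 3.3 Mbps while the transfer-only figure is 4 Mbps, which shows how request latency can make a connection look slower than it is.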
We also introduced a feature that enabled us to switch bitrates at key frame boundaries. This improves our ability to react to sudden changes in network conditions.
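Switching at a key frame boundary implies finding the next key frame at which the new bitrate can take over. The helper below is a sketch of that lookup, assuming the player knows the key frame timestamps of the upcoming content as a sorted list; the `nextSwitchPoint` name and that assumption are ours, not a description of the actual implementation.

```typescript
// Given key frame timestamps sorted in ascending order (seconds), return the
// earliest one at or after `earliestSwitchTime`, or null if there is none.
function nextSwitchPoint(keyFrameTimes: number[], earliestSwitchTime: number): number | null {
  // Binary search for the first timestamp >= earliestSwitchTime.
  let low = 0;
  let high = keyFrameTimes.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (keyFrameTimes[mid] < earliestSwitchTime) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  return low < keyFrameTimes.length ? keyFrameTimes[low] : null;
}
```

Because a decoder can start cleanly at a key frame, switching there avoids waiting for the next segment boundary, which is what allows a faster reaction to a sudden bandwidth drop.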
We started the rollout of the new HTML5 player on Google Chrome and have been adding support for more browsers over time. We now have the new HTML5 player running on all modern browsers. Figure 4 below shows video views based on which renderer is used for playback. We use HTML5-based rendering for approximately 70% of our desktop traffic now. This number is set to increase as we complete the rollout of our player across the entire Yahoo network. The most prominent platform that does not support MSE is IE on Windows 7, which will continue to be served via Flash.
Figure 4. Video views based on renderer used
On the important quality metric of rebuffering ratio, the HTML5 player is trending on par with or better than our Flash player (Figure 5).
Figure 5. Rebuffering ratio – Flash vs. HTML5
The HTML5 player excels over the Flash player when it comes to the video start time once the user input to play is received. Figure 6 shows the latency between the user click and the rendering of the first frame of the video (Click To Play Latency) of the Flash and HTML5 based players.
Figure 6. Click To Play Latency – Flash vs. HTML5
The advantages of the HTML5 player, such as faster load time and better efficiency, are well reflected in these results.
The Road Ahead
Adaptive streaming on the internet is rapidly evolving. While the industry is improving the playback experience in the context of a single player, at Yahoo, we are also optimizing for video streaming performance of multiple videos on the same page. We are also working to bring our MSE-based HTML5 player to mobile web.
Apple recently announced support for fragmented MP4 as a transport stream in HLS, a decision that aligns well with our strategy to stick with HLS. This gives us three wins:
- Simplifying the player, since fragmented MP4 is natively compatible with MSE.
- Improving the player performance by avoiding the CPU-intensive demuxing and MP4 packaging steps.
- Reducing the bandwidth usage for similar content and consequently improving playback quality and startup times.
We remain focused on advancing the state of the art of video streaming on the internet, and we’re hiring! Email me at [email protected] and we can talk about opportunities on our team.