An adaptive bitrate switching algorithm for speech applications in the context of WebRTC
Web Real-Time Communication (WebRTC) provides a set of novel technologies and standards that enable high-quality audio, video, conference calls, and arbitrary data exchange over web browsers. It enables peer-to-peer multimedia sessions over IP networks without the need for additional plugins. The Opus codec, which is deployed as the default audio codec for speech and music streaming in WebRTC, supports a wide range of bitrates, covering narrowband (NB), wideband (WB), super-wideband (SWB) and fullband bandwidths. Users of multimedia applications demand high-quality audio. Their quality of experience (QoE) is determined by their expectations, emotional state, the content type and many other psychological factors, as well as by network quality of service (QoS) metrics such as latency, jitter and packet loss, and by distortions introduced at the end terminals. To measure the quality experienced by the end user of a voice transmission service, the E-model can be used. It is standardized by the International Telecommunication Union (ITU-T) in Rec. G.107 (the narrowband version), Rec. G.107.1 (the wideband version) and the most recent Rec. G.107.2 (the super-wideband extension). In this thesis we first derive two codec-specific factors, the equipment impairment factor Ie and the packet loss robustness factor Bpl, for all the bitrates and operating modes supported by the Opus codec in speech applications. Deriving these factors makes it possible to use the E-model to assess the quality experienced by users of RTC applications, including WebRTC. Any other speech application deploying the Opus codec could also benefit from the derived factors. We then present a real-time adaptive codec parameter switching algorithm that selects the optimal Opus codec configuration under the current network conditions, based on the Mean Opinion Scores (MOS) predicted by the SWB, WB, and NB E-models.
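As a rough illustration of how Ie and Bpl enter an E-model quality prediction, the following Python sketch uses the narrowband G.107 expressions for the effective equipment impairment factor and for the R-to-MOS mapping, then picks the configuration with the highest predicted MOS. It is only a sketch under stated assumptions, not the algorithm developed in the thesis: the `OpusConfig` type, the function names, and the Ie/Bpl numbers in the usage note are hypothetical placeholders, delay impairment is ignored, and the base rating of 93.2 is the G.107 default with all other parameters left at their default values. The WB and SWB E-models use different rating scales that are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpusConfig:
    """Hypothetical container for one codec configuration and its factors."""
    label: str
    ie: float   # equipment impairment factor Ie
    bpl: float  # packet loss robustness factor Bpl


def effective_impairment(ie: float, bpl: float, ppl: float,
                         burst_r: float = 1.0) -> float:
    """G.107: Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl).

    ppl is the packet loss probability in percent; burst_r = 1 means
    random (non-bursty) loss.
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def r_to_mos(r: float) -> float:
    """G.107 mapping from the narrowband rating factor R to MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def predict_mos(cfg: OpusConfig, ppl: float, base_r: float = 93.2) -> float:
    """Assumption: only Ie,eff is subtracted from the default rating."""
    return r_to_mos(base_r - effective_impairment(cfg.ie, cfg.bpl, ppl))


def select_config(configs: list[OpusConfig], ppl: float) -> OpusConfig:
    """Return the configuration with the highest predicted MOS."""
    return max(configs, key=lambda c: predict_mos(c, ppl))
```

For example, with two placeholder configurations `OpusConfig("a", ie=5, bpl=5)` and `OpusConfig("b", ie=10, bpl=30)`, `select_config` prefers "a" at 0% loss and "b" at 10% loss, because the larger Bpl of "b" outweighs its higher Ie once packets are being dropped.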
We carried out a number of experiments to check the feasibility of using the E-model in WebRTC speech applications. The first quantifies the impact of coding with the Opus codec in speech applications, following the methods standardized by the ITU-T for this purpose. The second validates the QoS metrics collected by the WebRTC getStats API. The last examines the impact of packet loss and the quality improvement achieved with the adaptive codec parameter switching algorithm presented in this thesis.
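To make the link between getStats and the E-model input concrete, the sketch below derives a packet loss percentage for one reporting interval from two snapshots of the cumulative `packetsLost` and `packetsReceived` counters that the WebRTC statistics API reports for inbound RTP streams. It is a minimal sketch under assumptions: the snapshots are represented as plain Python dictionaries, the function name is hypothetical, and the abstract does not say how the thesis itself processes or validates these counters.

```python
def interval_loss_percent(prev: dict, curr: dict) -> float:
    """Packet loss in percent between two cumulative counter snapshots.

    Each snapshot is assumed to carry the inbound RTP counters
    'packetsLost' and 'packetsReceived' as reported by getStats.
    """
    lost = curr["packetsLost"] - prev["packetsLost"]
    received = curr["packetsReceived"] - prev["packetsReceived"]
    expected = lost + received
    if expected <= 0:
        # No packets expected in this interval, so report no loss.
        return 0.0
    # The lost counter can decrease when late or duplicate packets
    # arrive, so the result is clamped at zero.
    return max(0.0, 100.0 * lost / expected)
```

The returned value can serve as the Ppl input of an E-model computation, provided the interval is long enough for the counters to change meaningfully.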