Dolby Audio Overview
Dolby Audio provides the means to deliver high-quality audio experiences to consumers. Dolby has created audio codecs and systems for delivering mono, stereo, 5.1, 7.1, and immersive audio for playback on mobile devices, game consoles, laptops, and home theater systems.
Dolby Atmos is a content creation, encoding, and delivery technology that provides immersive audio to consumers. Dolby Atmos uses audio objects to add elevation to conventional multichannel audio. The Atmos system renders an Atmos mix appropriate to the listener's device and/or speaker configuration. Atmos is not an audio codec, but it can be delivered by multiple codecs. Read more in our Dolby Atmos Tutorial.
Dolby audio codecs provide substantial bitrate reductions, which allow audio to be delivered by broadcast, streaming, and fixed media at a fraction of the bandwidth and file size of uncompressed audio. Dolby codecs as a group are lossy (perceptual) codecs: they use patented psychoacoustic algorithms and processing to reduce bitrate by removing redundant audio content that is imperceptible to the human ear. Dolby audio codecs provide near-uncompressed audio quality at low data rates.
In addition to audio codecs for delivering to consumers, Dolby has a professional audio codec for use in professional contribution and post production – Dolby E.
Another benefit of Dolby codecs is that they carry metadata along with the compressed audio. This metadata is authored during the encode and used by consumer decoders to achieve optimal playback results for the device as well as for listener preferences.
An older technology called matrix encoding is sometimes used in conjunction with Dolby audio codecs. Matrix encoding sums channels together and uses audio phase differences to encode discrete audio channels and recreate them on decode. Examples include Pro Logic II, which is used to matrix 5.1 to stereo, and Surround EX, which is used to matrix rear surround channels into side surround channels. Stereo that has been matrix-encoded from 5.1 is referred to as Lt/Rt (left total/right total).
Dolby Consumer Codec Overview
Dolby Digital (AC-3)
Dolby Digital was originally used for cinema applications, with the audio encoded on the film itself. Dolby Digital later became the standard for DVD and Blu-ray, ATSC and DVB terrestrial broadcast, cable, satellite, and streaming services. It is still in wide use globally.
Dolby Digital bitrates for 5.1 audio start at a minimum of 384 kbps, which represents an 18:1 compression ratio at near-uncompressed quality.
Dolby Digital bitrates for stereo start at a minimum of 128 kbps. Stereo can be a matrix-encoded downmix of 5.1.
Dolby audio metadata is carried along with the audio payload.
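The 18:1 figure can be reproduced with simple arithmetic. The sketch below assumes the uncompressed reference is 48 kHz, 24-bit PCM and that the LFE channel is counted as a full channel; the source does not state these parameters, so treat them as assumptions. The function names are illustrative only.

```python
def pcm_bitrate_kbps(channels, sample_rate_hz=48_000, bit_depth=24):
    """Bitrate of uncompressed PCM in kbps (assumed reference: 48 kHz, 24-bit)."""
    return channels * sample_rate_hz * bit_depth / 1000


def compression_ratio(channels, coded_kbps, sample_rate_hz=48_000, bit_depth=24):
    """Ratio of the uncompressed PCM bitrate to the coded bitrate."""
    return pcm_bitrate_kbps(channels, sample_rate_hz, bit_depth) / coded_kbps


if __name__ == "__main__":
    # 5.1 is treated as 6 channels (LFE counted as a full channel: an assumption).
    print(compression_ratio(6, 384))  # 18.0
    # Stereo at the 128 kbps minimum gives the same ratio under these assumptions.
    print(compression_ratio(2, 128))  # 18.0
```

With a different assumed reference (for example 16-bit PCM) the ratio would be correspondingly lower.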
Dolby Digital Plus
Dolby Digital Plus was introduced in 2005 and has been steadily improved. Dolby Digital Plus added 7.1 channel encoding, improved coding efficiency and features such as descriptive dialog tracks for the visually impaired in the same bitstream.
Dolby Digital Plus bitrates for 5.1 audio can be as low as 192 kbps, i.e. half that required for Dolby Digital.
Dolby Digital Plus has seen widespread adoption: it is part of many broadcast standards and is also used by streaming services. Decoders are present in laptops, tablets, phones, TVs, digital media adapters (DMAs), set-top boxes (STBs), etc.
Dolby Digital Plus JOC
Dolby Digital Plus JOC is a 5.1 Dolby Digital Plus bitstream with the addition of specific metadata to carry Dolby Atmos. The 5.1 “core” contains all the audio present in the Atmos mix and is backwards compatible with non-Atmos use cases.
There are two types of Atmos specific metadata included in the bitstream:
- JOC (joint object coding)
- OAMD (object audio metadata)
When Dolby Digital Plus JOC is received by a Dolby Atmos enabled device, this metadata is used by the decoder to extract object audio. It’s then used by the Atmos renderer to recreate the Atmos mix appropriate to the consumer’s device and/or speaker configuration. Read more about this process in the Dolby Atmos section.
Because Dolby Digital Plus JOC carries additional metadata, the bitrate is increased. The minimum is 384 kbps, but 448 kbps and above are commonly used.
Dolby Professional Codecs
Dolby E is a professional audio codec used for contribution from live events and for post production. It was originally designed to get around the track limits on video tape formats. It is also a lossy codec but at a far lower compression ratio.
With Dolby E, up to 8 channels of audio can be compressed to occupy an AES pair, which can be embedded in an SDI stream or stored in mezzanine containers such as MXF and TS. The 8 channels can be in a variety of program configurations: 8 x mono, 4 x stereo, and commonly 5.1 + stereo. Each program contains its own metadata, which is used when encoding to a consumer codec.
Dolby E is also designed to be editable in a professional context, so the data burst is frame-aligned with industry-standard frame rates. In linear workflows, Dolby E introduces an encode and decode latency of one video frame, the duration of which depends on the frame rate used.
Dolby E is used in many operator delivery specifications as it provides a way to manage multiple multi-channel presentations via existing infrastructure.
Note: Dolby E is not supported in Hybrik at this time.
Dolby Audio Metadata
Dolby Codecs carry metadata along with the compressed audio. This metadata is authored during the encode and carried alongside the audio. It is used by consumer decoders to achieve consistent and optimal results between different content, on a wide variety of devices and to accommodate listener preferences.
Dolby metadata does not alter the source audio but provides instructions to the downstream decoder.
There are several types of metadata parameters that are carried along with the coded audio. We’ll cover some of the commonly used parameters, which fall into two categories:
For a full guide on Dolby metadata, read Dolby’s A Guide to Dolby Metadata whitepaper; a summary follows.
The channel mode parameter, along with the LFE on/off flag, tells the decoder how many channels are encoded. The channel mode is expressed as the number of front/rear channels, e.g.:
- 3/2 LFE On indicates 5.1
- 2/0 LFE Off indicates stereo
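As a rough illustration of the front/rear notation above, the following sketch converts a channel mode string and an LFE flag into a channel count and a familiar layout label. The function names are hypothetical and only the plain "front/rear" form shown in this section is handled.

```python
def channel_count(channel_mode: str, lfe_on: bool) -> int:
    """Total coded channels for a 'front/rear' channel mode plus optional LFE."""
    front, rear = (int(n) for n in channel_mode.split("/"))
    return front + rear + (1 if lfe_on else 0)


def layout_label(channel_mode: str, lfe_on: bool) -> str:
    """Layout label such as '5.1' or '2.0' (main channels, then LFE count)."""
    front, rear = (int(n) for n in channel_mode.split("/"))
    return f"{front + rear}.{1 if lfe_on else 0}"


if __name__ == "__main__":
    print(layout_label("3/2", lfe_on=True), channel_count("3/2", lfe_on=True))
    print(layout_label("2/0", lfe_on=False), channel_count("2/0", lfe_on=False))
```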
Dialog normalization (dialnorm) is used to maintain consistent perceived loudness between different content: program segments, ads, changing channels, switching between streaming services, disc playback, etc.
- Dialnorm is defined as the level of normal speech relative to digital full scale, i.e. 0 dBFS. The dialnorm metadata value is set during the encoding process so that it corresponds to the measured average level of speech. For dialnorm to be set properly, the content must be properly measured, and the value must be set automatically from that measurement and/or the content must be mixed to a loudness target level.
Common target levels:
- ATSC A/85: loudness target of -24 LKFS ± 2 dB with true peaks at -2 dBFS
- EBU R 128: loudness target of -23 LUFS ± 1 dB with true peaks at -1 dBFS
- Netflix: loudness target of -23 LKFS
The dialnorm metadata value ranges from -1 to -31 dBFS and determines the amount of attenuation applied on decode so that the average level of speech is reproduced at -31 dBFS. A value of -31 results in no attenuation; values closer to -1 result in more.
Different programs are mixed with different peaks and dialog levels. This image shows some typical variation in loudness from different kinds of programs alongside the normalized audio at the output of the decoder using dialnorm set to -31 dBFS.
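The relationship between the dialnorm value and the attenuation applied on decode can be sketched as follows. This is a minimal illustration of the arithmetic, not decoder code; the function and constant names are hypothetical.

```python
DECODER_DIALOG_LEVEL_DB = -31  # level at which average speech should decode


def dialnorm_attenuation_db(dialnorm_db: int) -> int:
    """Attenuation (in dB) applied on decode for a given dialnorm value."""
    if not -31 <= dialnorm_db <= -1:
        raise ValueError("dialnorm must be between -31 and -1")
    # Speech measured at dialnorm_db is pulled down to -31 dBFS.
    return dialnorm_db - DECODER_DIALOG_LEVEL_DB


def attenuation_to_linear_gain(attenuation_db: float) -> float:
    """Convert an attenuation in dB to a linear amplitude multiplier."""
    return 10 ** (-attenuation_db / 20)


if __name__ == "__main__":
    for dialnorm in (-31, -27, -24, -23):
        att = dialnorm_attenuation_db(dialnorm)
        print(f"dialnorm {dialnorm}: attenuate by {att} dB "
              f"(gain x{attenuation_to_linear_gain(att):.3f})")
```

For example, content encoded with a dialnorm of -24 is attenuated by 7 dB, while content with a dialnorm of -31 passes through at its original level.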
Dynamic Range Control (DRC)
Dynamic Range Control is used to adjust the dynamic loudness range of the content on playback to fit the capabilities of playback systems and listener preference. Essentially, low level signals are boosted, mid level signals are left untouched, and high level signals are compressed - thus reducing the overall dynamic range.
Dynamic Range Control is implemented as gain words written into the bitstream. On decode, these can be used to restrict the dynamic range by applying gain or attenuation appropriate to the listening environment. The gain words do not alter the source audio, and Dynamic Range Control can be completely defeated on higher-end decoders that have no need for it, or applied selectively by the consumer, e.g. “late night mode”.
Dialnorm and DRC interactions:
- Dialnorm ensures all audio content is output at the same reference dialog level
- DRC reduces the dynamic range of the content, with the dialog reference level as the “null” point
- Content below the dialog reference level will be boosted
- Content above the dialog reference level will be cut
- Dialnorm is the threshold for DRC
This image shows how DRC profiles apply to the input signal.
There are two modes of DRC - RF (more restrictive) and Line.
There are a few different profiles that determine how the gain words are generated:
Dynamic Range Profiles have a null band where no gain or attenuation is applied. The default profile is Film Standard, which applies the following adjustments:
- Max Boost: 6 dB (below –43 dB)
- Boost Range: –43 dB to –31 dB (2:1 ratio)
- Null Band Width: 5 dB (–31 dB to –26 dB)
- Early Cut Range: –26 dB to –16 dB (2:1 ratio)
- Cut Range: –16 dB to +4 dB (20:1 ratio)
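The Film Standard breakpoints listed above can be turned into a static gain curve. The sketch below is an illustration only: it assumes an N:1 ratio means the output changes 1 dB for every N dB of input change, it assumes the gain is held constant above +4 dB (the list does not say what happens there), and it ignores the time smoothing a real encoder applies when generating gain words. The function name is hypothetical.

```python
def film_standard_gain_db(level_db: float) -> float:
    """Static DRC gain (dB) for an input level (dB), per the Film Standard list."""
    if level_db < -43:
        return 6.0                               # max boost
    if level_db < -31:
        return (-31 - level_db) / 2              # boost range, 2:1
    if level_db <= -26:
        return 0.0                               # null band, no change
    if level_db <= -16:
        return -(level_db + 26) / 2              # early cut range, 2:1
    if level_db <= 4:
        # cut range, 20:1, continuing from the -5 dB reached at -16 dB
        return -5.0 - (level_db + 16) * (1 - 1 / 20)
    return -24.0  # ASSUMPTION: gain held at the value reached at +4 dB


if __name__ == "__main__":
    for level in (-50, -43, -37, -31, -28, -26, -21, -16, 4):
        print(f"{level:>4} dB in -> gain {film_standard_gain_db(level):+.2f} dB")
```

Under these assumptions the curve is continuous: the 2:1 boost range reaches exactly the 6 dB maximum boost at -43 dB, and the null band sits between the boost and cut regions.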
If no profile is applied, no adjustments are made, but dialnorm is still applied. The dialnorm metadata value and/or target loudness level are used to make sure the average level of speech, or the anchor element of the mix, falls within the null band where DRC is inactive.
For more information on DRC profiles and settings, refer to A Guide to Dolby Metadata, page 9.
Downmixing is used to make sure the listener hears the appropriate balance between channels when listening to 5.1 content on a stereo or mono reproduction system. This allows the same 5.1 bitstream to be used by listeners with stereo or mono playback systems.
Many people who listen to multichannel audio content do so over stereo or mono loudspeakers. Automated downmixing built into every decoder constructs a stereo downmix from a 5.1 source and a further mono downmix from the stereo. One multichannel program can feed everyone.
There are three different types of downmixes:
- Stereo downmix
  - Lo / Ro - Left only / Right only, for headphones or stereo televisions
- Mono downmix
  - From Lo / Ro, for mono televisions
- Surround compatible downmix
  - Lt / Rt - Left total / Right total, for Pro Logic or Pro Logic II decoding
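The three downmix types can be sketched per sample as below. This is a simplified illustration, not decoder code: it assumes center and surround mix levels of -3 dB (in practice these are set by metadata), it assumes the LFE channel is left out of the downmix, and it omits the scaling and overload protection a real decoder applies. All function names are hypothetical.

```python
import math

MINUS_3_DB = 1 / math.sqrt(2)  # about 0.707, the assumed mix level


def lo_ro(l, r, c, ls, rs, clev=MINUS_3_DB, slev=MINUS_3_DB):
    """Stereo (Lo/Ro) downmix: each surround folds into its own side."""
    lo = l + clev * c + slev * ls
    ro = r + clev * c + slev * rs
    return lo, ro


def mono_from_lo_ro(lo, ro):
    """Mono downmix derived from Lo/Ro (plain sum, no normalization)."""
    return lo + ro


def lt_rt(l, r, c, ls, rs):
    """Surround-compatible (Lt/Rt) downmix: the summed surrounds are added
    out of phase to left and in phase to right, so a matrix decoder can
    recover them from the phase difference."""
    s = MINUS_3_DB * (ls + rs)
    lt = l + MINUS_3_DB * c - s
    rt = r + MINUS_3_DB * c + s
    return lt, rt


if __name__ == "__main__":
    # A signal present only in the center channel lands equally in both outputs.
    print(lo_ro(0.0, 0.0, 1.0, 0.0, 0.0))
    # A signal present only in the left surround appears with opposite polarity.
    print(lt_rt(0.0, 0.0, 0.0, 1.0, 0.0))
```

The opposite polarity of the surround content in Lt and Rt is what distinguishes the surround-compatible downmix from Lo/Ro, where no phase encoding is applied.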
Dialog normalization, dynamic range control, and downmixing all work together. An incorrect dialnorm metadata setting, or a mix that does not meet production loudness target levels, can affect the quality of the audio. The combination of DRC and dialnorm prevents overload when downmixing.
Setting dialnorm properly requires accurate loudness measurement. Loudness measurement uses algorithms that closely replicate the sensitivity of human hearing to frequency and directionality as well as level.