Over the past years, camcorders have become excellent at capturing video, but their audio quality still lags behind. Adding an external microphone is a must, but it will only get you so far. Quiet voices barely rise above the noise floor, while the videographer sounds as if he is yelling in your ear.
This article describes a three-step approach to improving the sound from camcorders. The first step reduces the noise. The second step sets the dialog loudness to a standard level, and the third step reduces the dynamic range. The last two steps are an integral part of AC-3 (A/52, Dolby Digital), but can also be applied to other streams such as Advanced Audio Coding (AAC), as shown in the last section.
First things first. Start with a good external microphone and a wind muff. Personally, I use a Canon 5.1 microphone because it is small and mounts on the hot shoe. The specs are not that impressive, but even an ideal microphone will pick up a significant amount of ambient noise.
The noise level is defined as the loudness perceived by the human ear. Loudness is commonly measured by applying a frequency filter that mimics human hearing and then measuring the energy as a root-mean-square (RMS) level (LAeq). Note that for a sine wave the RMS level is the peak level divided by √2, which is about 3 dB below the peak. Another popular measure is the K-scale, for which there are several plugins such as mzuther’s, DPMP, and meterplugs.
Personally, I take a shortcut by skipping the filter and simply measuring the RMS level of noise samples. For this, I use the stats effect from SoX (Sound eXchange). In practice this appears to be good enough. My typical indoor measurements are shown in the table below. The values in the table are expressed in decibels relative to digital full scale (dBFS).
|noise||-43 dBFS|
|quiet dialog||-41 dBFS|
|normal dialog||-31 dBFS|
|loud dialog||-21 dBFS|
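The unweighted RMS measurement described above can be sketched in a few lines of Python. This is only an illustration, not the code SoX uses: the function name `rms_dbfs` is made up for this example, and it assumes samples scaled to the range -1.0 to 1.0 with 0 dBFS defined as the RMS level of a full-scale square wave, so a full-scale sine reads about 3 dB lower.

```python
import math


def rms_dbfs(samples):
    """Unweighted RMS level in dBFS for samples scaled to [-1.0, 1.0].

    Assumption: 0 dBFS is the RMS of a full-scale square wave.
    """
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    # 10*log10 of the mean square equals 20*log10 of the RMS value
    return 10.0 * math.log10(mean_square)


# One second of a full-scale 1 kHz sine sampled at 48 kHz (peak = 1.0)
rate = 48000
sine = [math.sin(2.0 * math.pi * 1000.0 * n / rate) for n in range(rate)]
print(round(rms_dbfs(sine), 2))  # about -3.01, since RMS = peak / sqrt(2)
```

Feeding it a stretch of room tone instead of the sine gives a noise figure comparable to the ones in the table.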
With the quiet dialog level this close to the noise floor, we can’t amplify the dialog without also significantly amplifying the noise. As a first step we therefore lower the noise floor in post-production using an audio editor such as Adobe Soundbooth. This usually reduces the noise by about 8 dB without causing significant distortion.
|noise||-51 dBFS|
|quiet dialog||-41 dBFS|
|normal dialog||-31 dBFS|
|loud dialog||-21 dBFS|
The volume dial on AC-3 decoders controls the loudness of normal dialog. The listener can thus reliably set the volume of the dialog no matter what program is playing. This does, however, require the audio stream to indicate the normal dialog level at which it was recorded. In AC-3 this is accomplished using the metadata parameter “dialnorm” as described in §5.4.2.8 and §7.6 of the AC-3 specification. The decoder uses this parameter to automatically adjust the volume so that the dialog is always played at the same loudness.
Dialog loudness expresses how an average person perceives the volume of a dialog [DD1; DD2].
The film industry has long standardized the normal dialog level at -31 dBFS. The corresponding Sound Pressure Level (SPL) for most movie theaters is 85 dB.
At home we have a volume dial that selects the SPL for normal dialog. Assume the volume dial is set to a sound pressure level of 67 dB for normal dialog. When the decoder plays a stream with a normal dialog level of -31 dBFS, it adjusts the amplifier gain so that 0 dBFS ≡ 98 dB, which causes the normal dialog to be reproduced at -31 dBFS ≡ 67 dB. If the listener switches to another program with a normal dialog level of -25 dBFS, the amplifier gain is reduced by 6 dB so that 0 dBFS ≡ 92 dB, and the normal dialog stays at 67 dB.
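The arithmetic in this example is simple enough to check with a short Python sketch. The function names are hypothetical and only mirror the reasoning above; they are not part of any decoder API.

```python
def full_scale_spl(dialog_spl_db, dialnorm_dbfs):
    """SPL reached by 0 dBFS when normal dialog is reproduced at dialog_spl_db."""
    return dialog_spl_db - dialnorm_dbfs


def gain_change_db(old_dialnorm_dbfs, new_dialnorm_dbfs):
    """Gain adjustment (dB) needed to keep dialog at the same SPL."""
    return old_dialnorm_dbfs - new_dialnorm_dbfs


print(full_scale_spl(67, -31))   # 98: 0 dBFS maps to 98 dB SPL
print(full_scale_spl(67, -25))   # 92: the louder program gets less gain
print(gain_change_db(-31, -25))  # -6: the decoder turns down by 6 dB
```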
Dynamic Range Compression
The sound from camcorders often has an undesirably large dynamic range. At one end, there may be people whispering in the distance, while at the other end we may have the videographer holding the camcorder. The difference between these loudness levels is called the dynamic range. We can reduce the dynamic range by both amplifying the quiet sounds and attenuating the loud sounds. Note that to successfully boost the quiet sounds, it is essential that they are well above the noise floor.
With AC-3 we can adjust the gain using the metadata tag “dynrng” as described in §7.7 of the AC-3 specification. The AC-3 encoder generates these tags based on the loudness combined with a compression profile. All standard profiles have a null-band that is centered around the normal dialog level. Loudness levels within this null-band are left intact. For the film compression profile at a normal dialog level of -31 dBFS, the transfer function can be visualized as shown below.
As shown in the graph above, signals from -33.5 to -28.5 dBFS stay the same. Signals in the 12 dB range below this null-band are boosted, while signals in the 10 dB range above it are attenuated, as also shown in the table below.
The film profile at a dialnorm of -31 dBFS is a good fit for my camcorder recordings. The quiet dialog is more or less in the middle of the boost range, the normal dialog is at the normal dialog level, while the loud dialog is in the early cut range.
| ||source material||noise reduced||range compressed|
|noise||-43 dBFS||-51 dBFS||-45 dBFS|
|quiet dialog||-41 dBFS||-41 dBFS||-37 dBFS|
|normal dialog||-31 dBFS||-31 dBFS||-31 dBFS|
|loud dialog||-21 dBFS||-21 dBFS||-25 dBFS|
The table above shows that the quiet dialog is amplified by 4 dB, but at the cost of raising the noise floor by 6 dB, which leaves the noise 2 dB closer to the quiet dialog. If we had a constant background noise level, we could consider altering the transfer function so that it attenuates all signals below -45.5 dBFS.
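The "range compressed" column can be reproduced with a small Python sketch that interpolates linearly, in dB, between the breakpoints of the film curve at a dialog level of -31 dBFS. It models only the static transfer function; the attack and decay behaviour of a real compressor is ignored, and the names `FILM_CURVE_31` and `transfer` are made up for this illustration.

```python
from bisect import bisect_right

# (input dBFS, output dBFS) breakpoints of the film profile at dialnorm -31
FILM_CURVE_31 = [
    (-90.0, -84.0), (-45.5, -39.5), (-33.5, -33.5),
    (-28.5, -28.5), (-18.5, -23.5), (0.0, -22.6),
]


def transfer(level_db, curve):
    """Static output level for an input level, by linear interpolation in dB."""
    xs = [x for x, _ in curve]
    if level_db <= xs[0]:
        # Assumption: below the first breakpoint the gain stays constant
        return level_db + (curve[0][1] - curve[0][0])
    if level_db >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, level_db)  # index of the first breakpoint above level_db
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (level_db - x0) / (x1 - x0)


levels = [("noise", -51.0), ("quiet dialog", -41.0),
          ("normal dialog", -31.0), ("loud dialog", -21.0)]
for name, level in levels:
    print(name, level, "->", round(transfer(level, FILM_CURVE_31), 2))
# noise -45.0, quiet -37.25, normal -31.0, loud -24.75
```

Rounded to whole decibels, these values match the table.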
Let’s do it
As shown above, applying dialog normalization and dynamic range compression is straightforward using the metadata in AC-3. For other audio streams, we will need to alter the samples to get the same effect.
ffmpeg -y -vn -i in.avi out.wav
aften -v 0 -dnorm 31 -dynrng 1 out.wav out.ac3
Advanced Audio Coding (AAC) doesn’t support the metadata for dialog normalization or dynamic range compression. Instead, we modify the samples directly using SoX as shown below.
Note that this particular example also applies a 13 dB overall gain. The values assigned to CURVE represent the compression curve. The resulting stream is then compressed using Nero’s AAC encoder.
ffmpeg -y -vn -i in.avi tmp.wav
set CURVE=-90.0,-84.0,-45.5,-39.5,-33.5,-33.5,-28.5,-28.5,-18.5,-23.5,0.0,-22.6
sox tmp.wav out.wav compand 0.1,3.0 %CURVE% 13.0 -90 1.6 stats
neroaacenc -lc -br 226000 -if out.wav -of out.aac
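The `set` and `%CURVE%` syntax above is specific to the Windows command prompt. For other platforms, the SoX argument list could be assembled from the breakpoints with a Python sketch like the one below. The helper `compand_args` is hypothetical, and it uses only the compand parameters that already appear in the command above.

```python
# Breakpoints of the film curve at dialnorm -31 (input dBFS, output dBFS)
FILM_CURVE_31 = [
    (-90.0, -84.0), (-45.5, -39.5), (-33.5, -33.5),
    (-28.5, -28.5), (-18.5, -23.5), (0.0, -22.6),
]


def compand_args(curve, gain_db=13.0, attack=0.1, decay=3.0,
                 initial_db=-90, delay=1.6):
    """Build the argument list for the SoX compand effect."""
    points = ",".join(f"{x:.1f},{y:.1f}" for x, y in curve)
    return ["compand", f"{attack},{decay}", points,
            str(gain_db), str(initial_db), str(delay)]


command = ["sox", "tmp.wav", "out.wav", *compand_args(FILM_CURVE_31), "stats"]
print(" ".join(command))
# The list can be handed to subprocess.run(command, check=True) if SoX is installed.
```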
Those who like to experiment can find the transfer curves for other profiles at various dialog levels below.
- film light
-31: -90.0,-84.0, -53.0,-47.0, -41.0,-41.0, -21.0,-21.0, -11.0,-16.0, 0.0,-15.5
-28.5: -90.0,-84.0, -50.5,-44.5, -38.5,-38.5, -18.5,-18.5, -8.5,-13.5, 0.0,-13.1
-27: -90.0,-84.0, -49.0,-43.0, -37.0,-37.0, -17.0,-17.0, -7.0,-12.0, 0.0,-11.7
- film standard
-31: -90.0,-84.0, -45.5,-39.5, -33.5,-33.5, -28.5,-28.5, -18.5,-23.5, 0.0,-22.6
-28.5: -90.0,-84.0, -43.0,-37.0, -31.0,-31.0, -26.0,-26.0, -16.0,-21.0, 0.0,-20.2
-27: -90.0,-84.0, -41.5,-35.5, -29.5,-29.5, -24.5,-24.5, -14.5,-19.5, 0.0,-18.8
- speech
-31: -90.0,-74.8, -52.5,-37.3, -33.5,-33.5, -28.5,-28.5, -18.5,-23.5, 0.0,-22.6
-28.5: -90.0,-74.8, -50.0,-34.8, -31.0,-31.0, -26.0,-26.0, -16.0,-21.0, 0.0,-20.2
-27: -90.0,-74.8, -48.5,-33.3, -29.5,-29.5, -24.5,-24.5, -14.5,-19.5, 0.0,-18.8
- music light
-31: -90.0,-78.0, -65.0,-53.0, -41.0,-41.0, -21.0,-21.0, -21.0,-21.0, 0.0,-20.0
-28.5: -90.0,-78.0, -62.5,-50.5, -38.5,-38.5, -18.5,-18.5, -18.5,-18.5, 0.0,-17.6
-27: -90.0,-78.0, -61.0,-49.0, -37.0,-37.0, -17.0,-17.0, -17.0,-17.0, 0.0,-16.2
- music standard
-31: -90.0,-78.0, -57.5,-45.5, -33.5,-33.5, -28.5,-28.5, -18.5,-23.5, 0.0,-22.6
-28.5: -90.0,-78.0, -55.0,-43.0, -31.0,-31.0, -26.0,-26.0, -16.0,-21.0, 0.0,-20.2
-27: -90.0,-78.0, -53.5,-41.5, -29.5,-29.5, -24.5,-24.5, -14.5,-19.5, 0.0,-18.8
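The listed curves follow a regular pattern, which suggests a way to derive a curve for a dialog level that is not listed. The Python sketch below is based on that observation only, not on the AC-3 specification: the inner breakpoints move along with the dialog level, the -90 dB point keeps the maximum boost, and the 0 dB point follows a 20:1 slope from the last inner breakpoint. The function name `shift_curve` and the 20:1 figure are assumptions inferred from the numbers above.

```python
# Film curve at dialnorm -31, taken from the list above
FILM_CURVE_31 = [
    (-90.0, -84.0), (-45.5, -39.5), (-33.5, -33.5),
    (-28.5, -28.5), (-18.5, -23.5), (0.0, -22.6),
]


def shift_curve(curve, delta_db, final_ratio=20.0):
    """Shift a curve by delta_db (e.g. 2.5 moves dialnorm -31 to -28.5)."""
    first, *inner, last = curve
    moved = [(x + delta_db, y + delta_db) for x, y in inner]
    # The lowest point keeps its input level and the maximum boost
    max_boost = moved[0][1] - moved[0][0]
    low = (first[0], first[0] + max_boost)
    # Assumed 20:1 slope between the last inner breakpoint and 0 dBFS
    x_k, y_k = moved[-1]
    high = (last[0], y_k + (last[0] - x_k) / final_ratio)
    return [low, *moved, high]


for x, y in shift_curve(FILM_CURVE_31, 2.5):
    print(f"{x:.1f},{y:.1f}")
```

Shifting by 2.5 dB and by 4 dB reproduces the listed -28.5 and -27 curves to within rounding.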