Exchanging video between DaVinci Resolve and FFMPEG

In an ideal world, there would be a video coder/decoder (CODEC) that supports an alpha channel (transparency) and is fully supported by both Blackmagic Design's DaVinci Resolve and FFMPEG. The DNxHR CODEC would be a great candidate, if only DaVinci Resolve and FFMPEG supported the same bit depth with an alpha channel. This leaves us with:

  • For lossless source videos, Apple ProRes is a good choice. DaVinci Resolve 17.1 can decode the format, including its alpha channel, but cannot encode it. FFMPEG can decode and encode it with alpha. On the odd occasion that an encode from DaVinci Resolve is needed, one can always resort to the DNxHR CODEC.

  • For lossy source videos, H.264 remains a great choice. Both programs fully support the CODEC. In the odd case where lossy video with transparency is needed, one can use Google's VP9.

The details of this quest and FFMPEG encoding parameters can be found below.

Lossless

Lossless video compression is great for source and intermediate material in video editing. It makes the video files smaller to work with without losing any quality.

Video sources are often delivered using many different CODECs. These need to be preprocessed using a CODEC that DaVinci Resolve can decode. For the preprocessing, I most often use AviSynth+ scripts, which are rendered using e.g. ffmpeg.

Common alpha channel support

Computer graphics sources commonly have an alpha (transparency) channel. I found that the only CODEC that FFMPEG can export to and DaVinci Resolve can import from, with alpha, is:

  • Apple ProRes

    • DaVinci Resolve decode: YUV 4:4:4 (10-bits) with alpha, YUV 4:4:4 (10-bits), YUV 4:2:2 (10-bits)
    • DaVinci Resolve encode: none
    • ffmpeg git@2021-01-09 decode: yes, with alpha
    • ffmpeg git@2021-01-09 encode: yes, with alpha
    • ffmpeg options:
      • YUVA 4:4:4 10-bits: -pix_fmt yuva444p10 -c:v prores_ks -profile:v 4444xq prores-yuva444p10.mov
    • FYI the ffmpeg options without alpha:
      • YUV 4:2:2 10-bits: -pix_fmt yuv422p10 -c:v prores_ks -profile:v hq prores-yuv422p10.mov
      • YUV 4:4:4 10-bits: -pix_fmt yuv444p10 -c:v prores_ks -profile:v 4444xq prores-yuv444p10.mov
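
The option strings above are only the encoder part of a command line. As a minimal sketch, the Python snippet below assembles them into complete ffmpeg invocations. The input name in.avs is a placeholder (for example an AviSynth+ script), and the function and dictionary names are made up for illustration. The snippet only builds and prints the command; it does not run ffmpeg.

```python
import shlex

# Encoder options copied from the list above.
# The dictionary keys are hypothetical labels based on the pixel format.
PRORES_OPTIONS = {
    "yuva444p10": ["-pix_fmt", "yuva444p10", "-c:v", "prores_ks", "-profile:v", "4444xq"],
    "yuv422p10": ["-pix_fmt", "yuv422p10", "-c:v", "prores_ks", "-profile:v", "hq"],
    "yuv444p10": ["-pix_fmt", "yuv444p10", "-c:v", "prores_ks", "-profile:v", "4444xq"],
}


def prores_command(source, variant):
    """Return the ffmpeg argument list for one of the ProRes variants."""
    output = f"prores-{variant}.mov"
    return ["ffmpeg", "-i", source, *PRORES_OPTIONS[variant], output]


# "in.avs" is an assumed input file name.
print(shlex.join(prores_command("in.avs", "yuva444p10")))
```

Keeping the arguments as a list avoids quoting problems if the command is later passed to subprocess.run.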

No common alpha channel support

  • GoPro CineForm

    • DaVinci Resolve decode: Native, YUV 10-bit, RGB 16-bit. No alpha.
    • DaVinci Resolve encode: YUV 10-bit, RGB 16-bit. No alpha.
    • ffmpeg git@2021-01-09 decode: yes, with alpha
    • ffmpeg git@2021-01-09 encode: yes, with alpha
    • ffmpeg options:
      • RGB 12-bits: -pix_fmt gbrp12 -c:v cfhd -quality film3+ cineform-rgbp12.avi
  • DNxHR

    • DaVinci Resolve decode: YUV 4:4:4 (10-bits), YUV 4:2:2 (8/10-bits). No alpha support, as far as I could detect.
    • DaVinci Resolve encode: YUV 4:4:4 10/12-bit, YUV 4:2:2 10/12-bit, YUV 4:2:2 8-bit. Alpha in 12-bit except for LB.
    • ffmpeg git@2021-01-09 decode: yes
    • ffmpeg git@2021-01-09 encode: yes, but no 12-bit
    • ffmpeg options:
      • YUV 4:2:2 8-bits: -pix_fmt yuv422p -c:v dnxhd -profile:v dnxhr_hq dnxhr-yuv422p.mov
      • YUV 4:2:2 10-bits: -pix_fmt yuv422p10 -c:v dnxhd -profile:v dnxhr_hqx dnxhr-yuv422p10.mov
      • YUV 4:4:4 10-bits: -pix_fmt yuv444p10 -c:v dnxhd -profile:v dnxhr_444 dnxhr-yuv444p10.mov

Lossy

Lossy compression is great for distributing a completed movie. Lossy compression makes the video file significantly smaller by reducing the quality. Popular lossy CODECs are the Moving Picture Experts Group's H.264 and H.265, and Google's VP9.

No common alpha channel support

  • H.264/AVC, no alpha support in CODEC

    • DaVinci Resolve decode: yes (GPU accelerated in Studio)
    • DaVinci Resolve encode: yes (GPU accelerated in Studio)
    • ffmpeg git@2021-01-09 decode: yes
    • ffmpeg git@2021-01-09 encode: yes
    • ffmpeg options:
      • YUV 4:2:0 8-bits: -pix_fmt yuv420p -c:v libx264 -preset superfast -tune fastdecode -g 1 -crf 17 h264-yuv420p.mp4
  • H.265

    • DaVinci Resolve decode: YUV 4:2:0 (8/10-bits). No alpha support. (GPU accelerated in Studio)
    • DaVinci Resolve encode: Studio only (GPU accelerated on Intel)
    • ffmpeg git@2021-01-09 decode: yes, no alpha (yet)
    • ffmpeg git@2021-01-09 encode: yes, no alpha (yet)
    • ffmpeg options:
      • YUV 4:2:0 8-bits: -pix_fmt yuv420p -c:v libx265 -preset superfast -tune fastdecode -g 1 -crf 21 h265-yuv420p.mp4
      • YUV 4:2:0 10-bits: -pix_fmt yuv420p10 -c:v libx265 -preset superfast -tune fastdecode -g 1 -crf 21 h265-yuv420p10.mp4
  • VP9

    • DaVinci Resolve decode: YUV 4:2:0 8-bits. No alpha support.
    • DaVinci Resolve encode: none
    • ffmpeg git@2021-01-09 decode: yes, with alpha
    • ffmpeg git@2021-01-09 encode: yes, with alpha
    • ffmpeg options:
      • YUVA 4:2:0 8-bits: -pix_fmt yuva420p -c:v vp9 -g 1 -crf 32 vp9-yuva420p.mp4


Reduce camcorder noise

Over the past years, camcorders have excelled in capturing video, but the quality of the audio is still lagging. Adding an external microphone is a must, but will only get you so far. Quiet voices barely rise above the noise floor, while it feels like the videographer is yelling in your ear.

This article describes a three-step approach to improving the sound from camcorders. The first step is to reduce the noise. In the following steps, the dialog loudness is set to a standard level and the dynamic range is reduced. The last two steps are an integral part of AC-3 (A/52, Dolby Digital), but can also be applied to other streams such as Advanced Audio Coding (AAC), as shown in the last section.

Noise reduction

First things first. Start with a good external microphone and a wind muff. Personally, I use a Canon 5.1 microphone because it is small and mounts on the hot shoe. The specs are not that impressive, but even an ideal microphone will pick up a significant amount of ambient noise.

The noise level is defined as the loudness perceived by the human ear. Loudness is commonly measured by applying a frequency filter that mimics human hearing and then measuring the energy as root-mean-square (LAeq). Note that for a sine wave the RMS level is the peak level divided by √2, or about 3 dB below the peak. Another popular measure is the K-scale, for which there are several plugins such as mzuther's, DPMP, and meterplugs.
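
To make the relation between peak and RMS level concrete, here is a minimal Python sketch that computes the unweighted RMS level of a block of samples in dBFS. It assumes floating-point samples normalized to the range -1.0 to 1.0. The 1 kHz tone and 48 kHz sample rate are arbitrary illustration values, and the function names are my own.

```python
import math


def rms_dbfs(samples):
    """Unweighted RMS level in dB relative to full scale (samples in -1.0..1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def peak_dbfs(samples):
    """Peak level in dB relative to full scale."""
    return 20.0 * math.log10(max(abs(s) for s in samples))


# One second of a full-scale 1 kHz sine at 48 kHz (illustration values).
rate = 48000
sine = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]

# The RMS level lands about 3 dB below the peak level, i.e. peak / sqrt(2).
print(f"peak {peak_dbfs(sine):.2f} dBFS, rms {rms_dbfs(sine):.2f} dBFS")
```

Note that a block of pure digital silence has a mean square of zero, which this sketch does not handle.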

Personally, I take a shortcut by not applying the filter and simply measuring the RMS level of noise samples. For this, I use the stats function from Sound Exchange (SoX). In practice this appears to be good enough. My typical indoor measurements are shown in the table below. The values in the table are expressed in decibels relative to digital full scale (dBFS).

Typical indoor levels
Level            dBFS
noise            -43 dBFS
quiet dialog     -41 dBFS
normal dialog    -31 dBFS
loud dialog      -21 dBFS

With the quiet dialog level this close to the noise floor, we can't amplify this dialog without also significantly amplifying the noise. As a first step, we lower the noise floor in post using an audio editor such as Adobe Soundbooth. This usually reduces the noise by about 8 dB without causing significant distortion.

Reduced noise
Level            dBFS
noise            -51 dBFS
quiet dialog     -41 dBFS
normal dialog    -31 dBFS
loud dialog      -21 dBFS

Dialog Normalization

The volume dial on AC-3 decoders controls the loudness of a normal dialog. The listener is thus able to reliably set the volume level of the dialogue no matter what program is playing. This does however require the audio to indicate the normal dialog level at which it is recorded. In AC-3 this is accomplished using the metadata tag “dialnorm” as described in the AC-3 specification §5.4.2.8 and §7.6. The decoder uses this parameter to automatically adjust the volume so that the dialog is always played at the same loudness.

Dialog loudness expresses how an average person perceives the volume of a dialog [DD1; DD2]. The film industry has long standardized the normal dialog level at -31 dBFS. The corresponding Sound Pressure Level (SPL) for most movie theaters is 85 dB.

At home, we have a volume dial that selects the SPL for normal dialog. Assume the volume dial is set to a sound pressure level of 67 dB for normal dialog. When the decoder plays a stream with a normal dialog level of -31 dBFS, it adjusts the amplifier gain so that 0 dBFS ≡ 98 dB, which causes the normal dialog to be reproduced at -31 dBFS ≡ 67 dB. If the listener switches to another program with a normal dialog level of -25 dBFS, the amplifier gain is reduced so that 0 dBFS ≡ 92 dB, and the normal dialog stays at 67 dB.
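
The arithmetic in this example can be written out as a short Python sketch. The function name is hypothetical. It only restates the relation used above: the SPL that corresponds to 0 dBFS is the dialog SPL selected with the volume dial minus the (negative) normal dialog level of the stream.

```python
def full_scale_spl(dialog_spl_db, dialnorm_dbfs):
    """SPL (dB) that 0 dBFS maps to, so that normal dialog plays at dialog_spl_db."""
    return dialog_spl_db - dialnorm_dbfs


# Volume dial set to 67 dB SPL for normal dialog, as in the example above.
for dialnorm in (-31, -25):
    top = full_scale_spl(67, dialnorm)
    print(f"dialnorm {dialnorm} dBFS: 0 dBFS = {top} dB, dialog = {top + dialnorm} dB")
```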

Dynamic Range Compression

The sound from camcorders often has an undesirably large dynamic range. On one end, there may be people whispering in the distance, while on the other end we may have the videographer holding the camcorder. The difference between these loudness levels is called the dynamic range. We can reduce the dynamic range by both amplifying the quiet sounds and attenuating the loud sounds. Note that to successfully boost the quiet sounds, it is essential that they are well above the noise floor.

With AC-3 we can adjust the gain using the metadata tag “dynrng” as described in §7.7 of the AC-3 specification. The AC-3 encoder generates these tags based on the loudness combined with a compression profile. All standard profiles have a null-band that is centered around the normal dialog level. Loudness levels within this null-band are left intact. For the film compression profile at a normal dialog level of -31 dBFS, the transfer function can be visualized as shown below.

As shown in the graph above, signals from -33.5 to -28.5 dBFS stay the same. Signals in the 12 dB range below this null-band are boosted, while signals in the 10 dB range above it are attenuated, as also shown in the table below.

Dynamic range compression

The film profile at a dialnorm of -31 dBFS is a good fit for my camcorder recordings. The quiet dialog is more or less in the middle of the boost range, the normal dialog is at the normal dialog level, and the loud dialog is in the early cut range.

Noise reduction and range compression
                 source material    noise reduced    range compressed
noise            -43 dBFS           -51 dBFS         -45 dBFS
quiet dialog     -41 dBFS           -41 dBFS         -37 dBFS
normal dialog    -31 dBFS           -31 dBFS         -31 dBFS
loud dialog      -21 dBFS           -21 dBFS         -25 dBFS

The table above shows that the quiet dialog is amplified by 4 dB, but at the cost of raising the reduced noise floor by 6 dB, which leaves it 2 dB below the noise level of the source material. If we had a constant background noise level, we could consider altering the transfer function so that it attenuates all signals below -45.5 dBFS.
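
The "range compressed" column can be reproduced with a minimal Python sketch that treats the film profile as a piecewise-linear transfer function. The curve points are the ones for film at -31 dBFS listed in the last section. Straight-line interpolation between points is a simplifying assumption, and the function names are my own. The results match the table after rounding to whole decibels.

```python
# Film profile at a normal dialog level of -31 dBFS, as in-dB,out-dB pairs.
FILM_31 = "-90.0,-84.0,-45.5,-39.5,-33.5,-33.5,-28.5,-28.5,-18.5,-23.5,0.0,-22.6"


def parse_curve(text):
    """Turn 'in1,out1,in2,out2,...' into a list of (in_db, out_db) points."""
    values = [float(v) for v in text.split(",")]
    return list(zip(values[0::2], values[1::2]))


def transfer(points, level_db):
    """Output level for an input level, interpolating linearly between points."""
    if not points[0][0] <= level_db <= points[-1][0]:
        raise ValueError("level outside the range covered by the curve")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= level_db <= x1:
            return y0 + (level_db - x0) * (y1 - y0) / (x1 - x0)


curve = parse_curve(FILM_31)
# Input levels are the "noise reduced" column of the table above.
for name, level in [("noise", -51), ("quiet dialog", -41),
                    ("normal dialog", -31), ("loud dialog", -21)]:
    print(f"{name}: {level} dBFS -> {transfer(curve, level):.2f} dBFS")
```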

Let’s do it

As shown above, applying dialog normalization and dynamic range compression is straightforward using the metadata in AC-3. For other audio streams, we need to alter the samples to get the same effect.

AC-3

For AC-3, we only need to extract the audio stream using FFmpeg and compress it using Aften.

ffmpeg -y -vn -i in.avi out.wav
aften -v 0 -dnorm 31 -dynrng 1 out.wav out.ac3

AAC

Advanced Audio Coding (AAC) doesn't support the metadata for dialog normalization or dynamic range compression. Instead, we modify the samples directly using SoX as shown below.

Note that this particular example also applies a 13 dB overall gain. The values in CURVE represent the compression curve. The resulting stream is then compressed using Nero's AAC encoder.

ffmpeg -y -vn -i in.avi tmp.wav
set CURVE="-90.0,-84.0,-45.5,-39.5,-33.5,-33.5,-28.5,-28.5,-18.5,-23.5,0.0,-22.6"
sox tmp.wav out.wav compand 0.1,3.0 %CURVE% 13.0 -90 1.6 stats
neroaacenc -lc -br 226000 -if out.wav -of out.aac

Those who like to experiment can find the transfer curves for other profiles at various dialog levels below.

  • film light
    -31:   -90.0,-84.0, -53.0,-47.0, -41.0,-41.0, -21.0,-21.0, -11.0,-16.0, 0.0,-15.5
    -28.5: -90.0,-84.0, -50.5,-44.5, -38.5,-38.5, -18.5,-18.5,  -8.5,-13.5, 0.0,-13.1
    -27:   -90.0,-84.0, -49.0,-43.0, -37.0,-37.0, -17.0,-17.0,  -7.0,-12.0, 0.0,-11.7
  • film
    -31:   -90.0,-84.0, -45.5,-39.5, -33.5,-33.5, -28.5,-28.5, -18.5,-23.5, 0.0,-22.6
    -28.5: -90.0,-84.0, -43.0,-37.0, -31.0,-31.0, -26.0,-26.0, -16.0,-21.0, 0.0,-20.2
    -27:   -90.0,-84.0, -41.5,-35.5, -29.5,-29.5, -24.5,-24.5, -14.5,-19.5, 0.0,-18.8
  • speech
    -31:   -90.0,-74.8, -52.5,-37.3, -33.5,-33.5, -28.5,-28.5, -18.5,-23.5, 0.0,-22.6
    -28.5: -90.0,-74.8, -50.0,-34.8, -31.0,-31.0, -26.0,-26.0, -16.0,-21.0, 0.0,-20.2
    -27:   -90.0,-74.8, -48.5,-33.3, -29.5,-29.5, -24.5,-24.5, -14.5,-19.5, 0.0,-18.8
  • music light
    -31:   -90.0,-78.0, -65.0,-53.0, -41.0,-41.0, -21.0,-21.0, -21.0,-21.0, 0.0,-20.0
    -28.5: -90.0,-78.0, -62.5,-50.5, -38.5,-38.5, -18.5,-18.5, -18.5,-18.5, 0.0,-17.6
    -27:   -90.0,-78.0, -61.0,-49.0, -37.0,-37.0, -17.0,-17.0, -17.0,-17.0, 0.0,-16.2
  • music
    -31:   -90.0,-78.0, -57.5,-45.5, -33.5,-33.5, -28.5,-28.5, -18.5,-23.5, 0.0,-22.6
    -28.5: -90.0,-78.0, -55.0,-43.0, -31.0,-31.0, -26.0,-26.0, -16.0,-21.0, 0.0,-20.2
    -27:   -90.0,-78.0, -53.5,-41.5, -29.5,-29.5, -24.5,-24.5, -14.5,-19.5, 0.0,-18.8
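
Judging from the numbers alone, the curves for the different dialog levels of one profile look like shifted copies of each other: the inner points move by the change in dialog level, the first point stays put, and the last point follows the slope of the final segment. The Python sketch below rests on that observation, which is my reading of the listed values rather than anything taken from the AC-3 specification. The function names are hypothetical.

```python
FILM_31 = "-90.0,-84.0,-45.5,-39.5,-33.5,-33.5,-28.5,-28.5,-18.5,-23.5,0.0,-22.6"


def parse_curve(text):
    """Turn 'in1,out1,in2,out2,...' into a list of (in_db, out_db) points."""
    values = [float(v) for v in text.split(",")]
    return list(zip(values[0::2], values[1::2]))


def shift_curve(points, delta_db):
    """Shift a curve to a dialog level that is delta_db higher (assumed rule)."""
    first, *inner, (last_in, last_out) = points
    prev_in, prev_out = inner[-1]
    # Slope of the final segment of the original curve.
    slope = (last_out - prev_out) / (last_in - prev_in)
    moved = [(x + delta_db, y + delta_db) for x, y in inner]
    # Re-anchor the last point so the final segment keeps its slope.
    end_out = moved[-1][1] + slope * (last_in - moved[-1][0])
    return [first, *moved, (last_in, round(end_out, 1))]


def format_curve(points):
    """Format points back into the comma-separated form used by compand."""
    return ",".join(f"{x:.1f},{y:.1f}" for x, y in points)


# From -31 dBFS to -28.5 dBFS is a shift of 2.5 dB.
print(format_curve(shift_curve(parse_curve(FILM_31), 2.5)))
```

For the film profile this reproduces the listed -28.5 and -27 curves to within rounding. Other profiles and dialog levels should be checked against the list before relying on it.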

Copyright © 1996-2022 Coert Vonk, All Rights Reserved