LDMVFI: Video Frame Interpolation with Latent Diffusion Models

Authors: Duolikun Danier, Fan Zhang, David Bull

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our quantitative experiments and user study indicate that LDMVFI is able to interpolate video content with favorable perceptual quality compared to the state of the art, even in the high-resolution regime.
Researcher Affiliation | Academia | Duolikun Danier, Fan Zhang, David Bull, University of Bristol. {duolikun.danier, fan.zhang, dave.bull}@bristol.ac.uk
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | Our code is available at https://github.com/danier97/LDMVFI.
Open Datasets | Yes | We utilize the most commonly used training set in VFI, Vimeo90k (Xue et al. 2019). To better test the learning capability and performance of VFI methods on a wider range of scenarios, we follow (Danier, Zhang, and Bull 2022c) to additionally incorporate samples from the BVI-DVC dataset (Ma, Zhang, and Bull 2021).
Dataset Splits | No | The paper describes the 'Training Dataset' and 'Test Datasets' and mentions that 'All models were trained until convergence', but it does not specify an explicit validation split (e.g., percentages, sample counts, or a separate validation dataset).
Hardware Specification | Yes | NVIDIA RTX 3090 GPUs were used for all training and evaluation.
Software Dependencies | No | The paper mentions optimizers such as ADAM and Adam-W but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | We set the downsampling factor of VQ-FIGAN to f = 32... The size of the kernels output by the decoder is K = 5. Regarding the diffusion processes, following (Rombach et al. 2022), we adopt a linear noise schedule and a codebook size of 8192 for vector quantization in VQ-FIGAN. We sample from all diffusion models with the DDIM (Song, Meng, and Ermon 2021) sampler for 200 steps (details provided in the Supplementary). We also follow (Rombach et al. 2022) to train the VQ-FIGAN using the ADAM (Kingma and Ba 2015) optimizer and the denoising U-Net using the Adam-W optimizer (Loshchilov and Hutter 2019), with the initial learning rates set to 10^-5 and 10^-6 respectively. All models were trained until convergence, which corresponds to around 70 epochs for VQ-FIGAN and around 60 epochs for the U-Net.
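
For readers attempting to reproduce the setup, the quoted hyperparameters map onto a few lines of PyTorch. The sketch below is illustrative only: the placeholder modules stand in for the actual VQ-FIGAN and denoising U-Net (see the authors' repository above), and the noise-schedule endpoints and total step count T are assumed latent-diffusion defaults rather than values stated in the paper.

```python
# Minimal sketch of the training/sampling hyperparameters quoted above.
# The two Conv2d modules are stand-ins for VQ-FIGAN and the denoising
# U-Net; the real architectures are at https://github.com/danier97/LDMVFI.
import torch
import torch.nn as nn

vq_figan = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # placeholder module
unet = nn.Conv2d(4, 4, kernel_size=3, padding=1)      # placeholder module

# Optimizers as reported: ADAM (lr = 1e-5) for VQ-FIGAN and
# AdamW (lr = 1e-6) for the denoising U-Net.
opt_vq = torch.optim.Adam(vq_figan.parameters(), lr=1e-5)
opt_unet = torch.optim.AdamW(unet.parameters(), lr=1e-6)

# Linear noise schedule following Rombach et al. (2022). The beta endpoints
# and total step count T are NOT quoted in this section; these are common
# latent-diffusion defaults and should be treated as assumptions.
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# DDIM sampling with 200 steps: a uniform stride over the T-step schedule.
ddim_steps = 200
ddim_timesteps = torch.arange(0, T, T // ddim_steps)  # 200 timesteps
```

Under the assumed T = 1000, a 200-step DDIM sampler visits every fifth timestep of the training schedule.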