VIDM: Video Implicit Diffusion Models
Authors: Kangfu Mei, Vishal Patel
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Various experiments are conducted on datasets consisting of videos with different resolutions and different numbers of frames. Results show that the proposed method outperforms the state-of-the-art generative adversarial network-based methods by a significant margin in terms of FVD scores as well as perceptible visual quality. The effectiveness of the proposed model is demonstrated on various datasets by comparing its performance with several state-of-the-art works. We present the main quantitative results comparison in Table 1 and Table 2, and the main qualitative results comparison in Figure 3. |
| Researcher Affiliation | Academia | Johns Hopkins University |
| Pseudocode | No | The paper provides mathematical formulations for its methods, such as equations for learning objectives and transformations. However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper lists a project page URL (https://kfmei.page/vidm/) on the first page. However, it does not include an explicit statement confirming the release of source code at this URL, nor does the URL directly link to a source-code repository. |
| Open Datasets | Yes | The experiments are conducted on UCF-101 (Soomro, Zamir, and Shah 2012), Tai Chi-HD (Siarohin et al. 2019), Sky Time-lapse (Xiong et al. 2018), and CLEVRER (Yi et al. 2020). |
| Dataset Splits | No | The paper states, 'All evaluation is conducted on 2048 randomly selected real and generated videos for reducing variance.' This describes the data used for evaluation, but it does not specify the train/validation/test splits of the primary datasets (UCF-101, Tai Chi-HD, etc.) used during model training. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions several software components and architectures such as 'U-Net', 'Multi-Head Attention', 'Group Norm', 'PixelCNN++', and 'SpyNet'. However, it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | The diffusion network architecture of our method is an autoencoder network that follows the design of PixelCNN++ (Salimans et al. 2017). We apply multiple multi-head attention modules (Vaswani et al. 2017) at features in a resolution of 16×16 for capturing long-range dependence that benefits the perceptual quality. For the robustness penalty, 'η is a constant that is experimentally set as 1e-8'. (A hedged architectural sketch follows the table.) |
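The setup row above only names the building blocks (Group Norm, convolutions, multi-head attention applied at 16×16 features), not their exact arrangement. The sketch below is a minimal PyTorch illustration of that kind of block, not the authors' released code: the channel width, head count, activation, and residual structure are assumptions for the example.

```python
# Minimal sketch (PyTorch) of a denoising-network block in the style described
# in the paper's setup: Group Norm + convolutions, with multi-head self-attention
# applied only when the feature map is at the 16x16 resolution.
# Channel sizes, head counts, and the SiLU activation are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResAttnBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4, attn_resolution: int = 16):
        super().__init__()
        self.attn_resolution = attn_resolution
        self.norm1 = nn.GroupNorm(32, channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm2 = nn.GroupNorm(32, channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn_norm = nn.GroupNorm(32, channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual convolutional path.
        h = self.conv1(F.silu(self.norm1(x)))
        h = self.conv2(F.silu(self.norm2(h)))
        x = x + h
        # Self-attention is applied only at the 16x16 feature resolution,
        # matching where the paper says attention modules are inserted.
        b, c, hh, ww = x.shape
        if hh == self.attn_resolution and ww == self.attn_resolution:
            tokens = self.attn_norm(x).flatten(2).transpose(1, 2)  # (B, H*W, C)
            attn_out, _ = self.attn(tokens, tokens, tokens)
            x = x + attn_out.transpose(1, 2).reshape(b, c, hh, ww)
        return x


if __name__ == "__main__":
    block = ResAttnBlock(channels=256)
    feats = torch.randn(2, 256, 16, 16)   # attention is applied at this resolution
    print(block(feats).shape)             # torch.Size([2, 256, 16, 16])
```

This only mirrors the attention placement reported in the setup row; how many such blocks the authors stack, and how they condition on the diffusion timestep and implicit video coordinates, is not specified in the excerpt and is not assumed here.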