CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments are conducted to demonstrate the effectiveness of the proposed video VAE." (Section 4, Experiments)
Researcher Affiliation | Industry | Tencent AI Lab
Pseudocode | No | Figure 7 shows the "Architecture of CV-VAE", which is a diagram, not pseudocode or an algorithm block.
Open Source Code | No | https://github.com/AILab-CVC/CV-VAE — "Code and checkpoints will be released upon the acceptance of this paper."
Open Datasets | Yes | "We train our CV-VAE model using image datasets including LAION-COCO [9] and Unsplash [23], as well as the video dataset Webvid-10M [3]."
Dataset Splits | Yes | "We evaluate our CV-VAE on the COCO2017 [21] validation dataset and the Webvid [3] validation dataset, which includes 1024 videos."
Hardware Specification | Yes | "To avoid numerical overflow, we trained CV-VAE using float32 precision, and the training was carried out on 16 A100 GPUs for 200K steps. ... The training was carried out on 16 A100 GPUs for 5K steps."
Software Dependencies | No | The paper mentions the AdamW optimizer, DeepSpeed stage 2, gradient checkpointing, and float32/bfloat16 training precision, but does not provide version numbers for software libraries such as PyTorch.
Experiment Setup | Yes | "For image datasets, we employ two resolutions, i.e., 256×256 and 512×512. In the case of video datasets, we use two settings of frames and resolutions: 9×256×256 and 17×192×192. The batch sizes for these four settings are 8, 2, 1, and 1, with sampling ratios of 40%, 10%, 25%, and 25%, respectively. We employed the AdamW optimizer [22] with a learning rate of 1e-4 and cosine learning rate decay." (See the configuration sketch after the table.)
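The Experiment Setup row maps directly onto a small training configuration. Below is a minimal sketch in PyTorch, assuming only the four data settings and optimizer hyperparameters quoted from the paper; the `TrainSetting` dataclass, the `pick_setting` helper, and the placeholder model are hypothetical scaffolding, not part of the paper.

```python
import random
from dataclasses import dataclass

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

@dataclass
class TrainSetting:
    """One of the four data settings quoted above (hypothetical container)."""
    frames: int       # 1 denotes a still image
    resolution: int   # square spatial resolution
    batch_size: int
    sample_ratio: float

# The four settings quoted in the Experiment Setup row.
SETTINGS = [
    TrainSetting(frames=1,  resolution=256, batch_size=8, sample_ratio=0.40),
    TrainSetting(frames=1,  resolution=512, batch_size=2, sample_ratio=0.10),
    TrainSetting(frames=9,  resolution=256, batch_size=1, sample_ratio=0.25),
    TrainSetting(frames=17, resolution=192, batch_size=1, sample_ratio=0.25),
]

def pick_setting() -> TrainSetting:
    """Draw one setting per step according to the quoted sampling ratios."""
    return random.choices(SETTINGS, weights=[s.sample_ratio for s in SETTINGS])[0]

# Placeholder module; the real model is the CV-VAE, which is not specified here.
model = torch.nn.Linear(8, 8)

optimizer = AdamW(model.parameters(), lr=1e-4)
# Cosine decay over the 200K steps reported in the Hardware Specification row.
scheduler = CosineAnnealingLR(optimizer, T_max=200_000)
```

This reproduces only the quoted hyperparameters; the loss terms, data pipelines, and model architecture are the paper's own and are not reconstructed here.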
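The Software Dependencies row names DeepSpeed stage 2, gradient checkpointing, and float32/bfloat16 precision without version numbers. The sketch below shows one common way to express those pieces in plain PyTorch, plus a DeepSpeed-style config dict; all concrete values are assumptions, since the paper gives none.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A minimal DeepSpeed-style config dict for ZeRO stage 2. The values are
# assumptions; the paper only says "deepspeed stage 2".
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 2},
}

block = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.GELU())
x = torch.randn(2, 16, requires_grad=True)

# Gradient checkpointing: recompute activations in the backward pass to save memory.
y = checkpoint(block, x, use_reentrant=False)

# Per the quote above, the VAE itself is trained in float32 to avoid overflow;
# bfloat16 autocast is one common way to run the lower-precision fine-tuning.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y_bf16 = checkpoint(block, x, use_reentrant=False)
```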