Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample
Authors: Shir Gur, Sagie Benaim, Lior Wolf
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed method produces diverse samples in both the image domain and the more challenging video domain. Our experiments show that the novel method outperforms previous work, where we compare to (i) current video generation models, which can only be trained on multiple video samples; (ii) recent image generation methods trained on a single image sample; and (iii) the extension of these methods to video, which does not replicate their success in image generation. Datasets: To compare to video generation methods, we use the UCF-101 dataset [35]. |
| Researcher Affiliation | Academia | Shir Gur , Sagie Benaim , Lior Wolf The School of Computer Science, Tel Aviv University |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and supplementary material (SM) with additional samples are available at https://shirgur.github.io/hp-vae-gan. |
| Open Datasets | Yes | Datasets: To compare to video generation methods, we use the UCF-101 dataset [35], which contains over 13K videos of 101 different sport categories. For single sample video experiments, we choose 25 high quality video samples from the YouTube-8M dataset [36]. For a single sample image experiment, 25 images were randomly selected from SinGAN's training samples. |
| Dataset Splits | Yes | We randomly sample 50 generated videos and for each sample s, find their 1st and 2nd nearest neighbors (NN) in the UCF-101 training set, denoted nn1 and nn2. Using the 50 generated samples, we also computed an FID score against the UCF-101 test set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | Unless otherwise stated, we set M = 3 and N = 9 and use the architecture described in the SM. We set Ω to {1, 2, 3, 4} in our experiments. The different scales of the decoder are trained sequentially, such that the encoder and the top part of the decoder are trained at each step with a reconstruction term. This scale minimizes the loss Lvae(x0) as defined in Eq. 5 and is trained for a fixed number of epochs. |
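The FID evaluation quoted in the Dataset Splits row (50 generated samples scored against the UCF-101 test set) can be sketched as follows. This is a minimal sketch that assumes feature vectors have already been extracted by some video feature network; the quoted text does not say which extractor backs the paper's FID numbers. `fid` is the standard Fréchet distance between Gaussians fit to the two feature sets.

```python
import numpy as np
from scipy import linalg


def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet Inception Distance between two sets of feature vectors.

    FID = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})

    feats_a, feats_b: arrays of shape (num_samples, feature_dim),
    e.g. features of 50 generated videos vs. the test set.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Matrix square root of the covariance product; tiny imaginary
    # components can appear from numerical error, so keep the real part.
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Identical feature sets give an FID of (numerically) zero, and the score grows with the distance between the two feature distributions.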
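The coarsest-scale objective Lvae(x0) referenced in the Experiment Setup row is, per the paper's description, a VAE loss. A minimal sketch of its standard form is below: an L2 reconstruction term plus the KL divergence of a diagonal Gaussian posterior from a standard normal prior. The exact weighting of the two terms in the paper's Eq. 5 is not quoted here, so equal weighting is an assumption.

```python
import numpy as np


def kl_diag_gaussian(mu: np.ndarray, logvar: np.ndarray) -> float:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian posterior."""
    return float(0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))


def vae_loss(x0: np.ndarray, x0_rec: np.ndarray,
             mu: np.ndarray, logvar: np.ndarray) -> float:
    """Sketch of a coarsest-scale VAE objective: reconstruction + KL.

    x0:      the real sample at the coarsest scale
    x0_rec:  the decoder's reconstruction of x0
    mu, logvar: parameters of the encoder's diagonal Gaussian posterior
    """
    rec = float(np.sum((x0 - x0_rec) ** 2))  # L2 reconstruction term (assumed)
    return rec + kl_diag_gaussian(mu, logvar)
```

With a perfect reconstruction and a posterior equal to the prior (mu = 0, logvar = 0), both terms vanish and the loss is exactly zero.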