Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample
Authors: Shir Gur, Sagie Benaim, Lior Wolf
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed method produces diverse samples in both the image domain and the more challenging video domain. Our experiments show that the novel method outperforms previous work, where we compare to (i) current video generation models, which can only be trained on multiple video samples; (ii) recent image generation methods trained on a single image sample; and (iii) the extension of these methods to video, which does not replicate their success in image generation. Datasets: To compare to video generation methods, we use the UCF-101 dataset [35]. |
| Researcher Affiliation | Academia | Shir Gur , Sagie Benaim , Lior Wolf The School of Computer Science, Tel Aviv University |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and supplementary material (SM) with additional samples are available at https://shirgur.github.io/hp-vae-gan. |
| Open Datasets | Yes | Datasets: To compare to video generation methods, we use the UCF-101 dataset [35], which contains over 13K videos of 101 different sport categories. For single sample video experiments, we choose 25 high quality video samples from the YouTube-8M dataset [36]. For a single sample image experiment, 25 images were randomly selected from SinGAN's training samples. |
| Dataset Splits | Yes | We randomly sample 50 generated videos and for each sample s, find their 1st and 2nd nearest neighbors (NN) in the UCF-101 training set, denoted nn1 and nn2. Using the 50 generated samples, we also computed an FID score against the UCF-101 test set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | Unless otherwise stated, we set M = 3 and N = 9 and use the architecture described in the SM. We set Ω to {1, 2, 3, 4} in our experiments. The different scales of the decoder are trained sequentially, such that the encoder and the top part of the decoder are trained at each step with a reconstruction term. This scale minimizes the loss Lvae(x0) as defined in Eq. 5 and is trained for a fixed number of epochs. |
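The FID evaluation quoted in the Dataset Splits row (50 generated samples scored against the UCF-101 test set) can be sketched as follows. This is a minimal sketch that assumes feature vectors have already been extracted by some video feature network; the quoted text does not say which extractor backs the paper's FID numbers. `fid` is the standard Fréchet distance between Gaussians fit to the two feature sets.

```python
import numpy as np
from scipy import linalg


def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet Inception Distance between two sets of feature vectors.

    FID = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})

    feats_a, feats_b: arrays of shape (num_samples, feature_dim),
    e.g. features of 50 generated videos vs. the test set.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Matrix square root of the covariance product; tiny imaginary
    # components can appear from numerical error, so keep the real part.
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Identical feature sets give an FID of (numerically) zero, and the score grows with the distance between the two feature distributions.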
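The coarsest-scale objective Lvae(x0) referenced in the Experiment Setup row is, per the paper's description, a VAE loss. A minimal sketch of its standard form is below: an L2 reconstruction term plus the KL divergence of a diagonal Gaussian posterior from a standard normal prior. The exact weighting of the two terms in the paper's Eq. 5 is not quoted here, so equal weighting is an assumption.

```python
import numpy as np


def kl_diag_gaussian(mu: np.ndarray, logvar: np.ndarray) -> float:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian posterior."""
    return float(0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))


def vae_loss(x0: np.ndarray, x0_rec: np.ndarray,
             mu: np.ndarray, logvar: np.ndarray) -> float:
    """Sketch of a coarsest-scale VAE objective: reconstruction + KL.

    x0:      the real sample at the coarsest scale
    x0_rec:  the decoder's reconstruction of x0
    mu, logvar: parameters of the encoder's diagonal Gaussian posterior
    """
    rec = float(np.sum((x0 - x0_rec) ** 2))  # L2 reconstruction term (assumed)
    return rec + kl_diag_gaussian(mu, logvar)
```

With a perfect reconstruction and a posterior equal to the prior (mu = 0, logvar = 0), both terms vanish and the loss is exactly zero.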