Diverse Video Generation using a Gaussian Process Trigger

Authors: Gaurav Shrivastava, Abhinav Shrivastava

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We achieve state-of-the-art results on diverse future frame generation in terms of reconstruction quality and diversity of the generated sequences. Webpage: http://www.cs.umd.edu/~gauravsh/dvg.html"
Researcher Affiliation | Academia | "Gaurav Shrivastava and Abhinav Shrivastava, University of Maryland, College Park, {gauravsh,abhinav}@cs.umd.edu"
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a webpage URL (http://www.cs.umd.edu/~gauravsh/dvg.html) in the abstract, but it does not explicitly state that this page hosts the source code for the methodology, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "KTH Action Recognition Dataset. The KTH action dataset (Schuldt et al., 2004) [...]"; "BAIR pushing Dataset. The BAIR robot pushing dataset (Ebert et al., 2017) [...]"; "Human3.6M Dataset. Human3.6M (Ionescu et al., 2014) [...]"; "UCF Dataset. This dataset (Soomro et al., 2012) [...]"
Dataset Splits | No | The paper states, "All models use 5 frames as context (past) during training and learn to predict the next 10 frames," and describes evaluation procedures using "500 starting sequences" and "50 future sequences," but it does not specify explicit training, validation, or test splits (e.g., percentages or exact counts per split) for the datasets.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models.
Software Dependencies | No | The paper mentions using "GPytorch", the "Adam optimizer", and an "I3D action recognition classifier", but it does not specify version numbers, which are required for reproducible software dependencies.
Experiment Setup | Yes | "All our models are trained using Adam optimizer. All models use 5 frames as context (past) during training and learn to predict the next 10 frames. [...] For the deterministic switch, we do not use the variance of the GP as a trigger, and switch every 15 frames. [...] For the GP trigger switch, we compare the current GP variance with the mean of the variance of the last 10 states. If the current variance is larger than two standard deviations, we trigger a switch. [...] We trained all models on 64 × 64-size frames from the KTH, Human3.6M, and BAIR datasets. [...] For variational GP implementation, 40 inducing points were randomly initialized and learned during the training of GP."
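
The quoted switching rule is concrete enough to sketch in code. Below is a minimal Python illustration of a variance-based trigger, assuming a rolling window over the posterior variances of the last 10 states; the class name, window handling, and threshold bookkeeping are our own assumptions, not the authors' implementation.

```python
from collections import deque

import numpy as np


class VarianceTrigger:
    """Minimal sketch of the GP-variance switch described in the setup above.

    Tracks the GP posterior variance at the last `window` states and fires
    when the current variance exceeds the window mean by more than
    `num_std` standard deviations. All names here are illustrative.
    """

    def __init__(self, window: int = 10, num_std: float = 2.0):
        self.past_variances = deque(maxlen=window)
        self.num_std = num_std

    def should_switch(self, current_variance: float) -> bool:
        # Threshold is computed from the previous states only, then the
        # current variance joins the rolling window.
        past = np.asarray(self.past_variances)
        self.past_variances.append(current_variance)
        if past.size < 2:
            return False  # not enough history to estimate spread yet
        return current_variance > past.mean() + self.num_std * past.std()
```

In the paper's pipeline, a fired trigger corresponds to switching the predicted trajectory (re-sampling the future), whereas the deterministic baseline quoted above simply switches every 15 frames.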
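The line about "40 inducing points [...] randomly initialized and learned during the training of GP" maps naturally onto GPyTorch's standard variational GP machinery (GPyTorch is named in the Software Dependencies row). The sketch below is an assumption-laden reconstruction: the kernel, mean function, and latent dimensionality are not specified in the quoted excerpt.

```python
import torch
import gpytorch


class LatentDynamicsGP(gpytorch.models.ApproximateGP):
    """Sketch of a variational GP with learned inducing points.

    Only the number of inducing points (40) and the fact that they are
    randomly initialized and learned come from the paper; the RBF kernel,
    constant mean, and latent dimension are assumptions for illustration.
    """

    def __init__(self, inducing_points: torch.Tensor):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self,
            inducing_points,
            variational_distribution,
            learn_inducing_locations=True,  # inducing points are learned during training
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# 40 randomly initialized inducing points in a hypothetical 128-d latent space.
inducing = torch.randn(40, 128)
model = LatentDynamicsGP(inducing)
posterior = model(torch.randn(8, 128))  # posterior over 8 query states
```

The posterior variance from such a model (`posterior.variance`) is exactly the quantity the trigger sketch above would consume at each predicted state.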