Generating Videos with Scene Dynamics

Authors: Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Experiments suggest this model can generate tiny videos up to a second at full frame rate better than simple baselines, and we show its utility at predicting plausible futures of static images. Moreover, experiments and visualizations show the model internally learns useful features for recognizing actions with minimal supervision...'
Researcher Affiliation | Academia | Carl Vondrick (MIT, vondrick@mit.edu); Hamed Pirsiavash (UMBC, hpirsiav@umbc.edu); Antonio Torralba (MIT, torralba@mit.edu)
Pseudocode | No | The paper describes network architectures and learning procedures but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'Our implementation is based off a modified version of [31] in Torch7.' but does not provide a link to, or a statement about releasing, the source code for its own method.
Open Datasets | Yes | 'We downloaded over two million videos from Flickr [39] by querying for popular Flickr tags as well as querying for common English words.' ... 'Action Classification: We evaluated performance on classifying actions on UCF101 [35].'
Dataset Splits | No | The paper mentions 'train/test splits' for UCF101 in the caption of Figure 4a but does not specify percentages or sample counts for training, validation, or test sets. It also mentions 'batch normalization', but that is a training technique, not a data split.
Hardware Specification | No | The paper notes 'Training typically took several days on a GPU' and acknowledges that 'NVidia donated GPUs used for this research', but it does not specify the GPU model or any other hardware details.
Software Dependencies | No | The paper states 'Our implementation is based off a modified version of [31] in Torch7.' but does not give a version number for Torch7 or list any other software dependencies.
Experiment Setup | Yes | 'We use the Adam [16] optimizer and a fixed learning rate of 0.0002 and momentum term of 0.5. The latent code has 100 dimensions, which we sample from a normal distribution. We use a batch size of 64. We initialize all weights with zero mean Gaussian noise with standard deviation 0.01. We normalize all videos to be in the range [-1, 1].'
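
To make the quoted training configuration concrete, the sketch below expresses it in PyTorch. This is an illustration, not the authors' code: the paper used Torch7, the `generator` and `discriminator` modules here are stand-ins for the paper's two-stream video generator and spatio-temporal discriminator, and mapping the quoted 'momentum term of 0.5' to Adam's beta1 is an assumption consistent with common GAN practice.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the "Experiment Setup" row above.
LATENT_DIM = 100   # latent code dimensionality
BATCH_SIZE = 64
LR = 0.0002        # fixed learning rate
BETA1 = 0.5        # quoted "momentum term of 0.5", read here as Adam's beta1

def init_weights(module: nn.Module) -> None:
    """Zero-mean Gaussian init with standard deviation 0.01, per the quote."""
    if isinstance(module, (nn.Conv3d, nn.ConvTranspose3d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Placeholder networks; the paper's actual architectures are not reproduced.
generator = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU())
discriminator = nn.Sequential(nn.Linear(256, 1))
generator.apply(init_weights)
discriminator.apply(init_weights)

opt_g = torch.optim.Adam(generator.parameters(), lr=LR, betas=(BETA1, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=LR, betas=(BETA1, 0.999))

# Latent codes are sampled from a standard normal distribution.
z = torch.randn(BATCH_SIZE, LATENT_DIM)

def normalize_videos(videos: torch.Tensor) -> torch.Tensor:
    """Map uint8 pixel values in [0, 255] into [-1, 1], per the quoted setup."""
    return videos.float() / 127.5 - 1.0
```

These values match the quoted text exactly (learning rate 0.0002, beta1 0.5, 100-dimensional normal latent code, batch size 64, Gaussian init with std 0.01, inputs normalized to [-1, 1]); everything else in the sketch is scaffolding.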