Stochastic Variational Video Prediction

Authors: Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine

ICLR 2018

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate SV2P on multiple real-world video datasets, as well as a carefully designed toy dataset that highlights the importance of stochasticity in video prediction (see Figure 1). In both our qualitative and quantitative comparisons, SV2P produces substantially improved video predictions when compared to the same model without stochasticity, with respect to standard metrics such as PSNR and SSIM.
Researcher Affiliation Collaboration Mohammad Babaeizadeh1, Chelsea Finn2, Dumitru Erhan3, Roy Campbell1, and Sergey Levine2,3 1University of Illinois at Urbana-Champaign 2University of California, Berkeley 3Google Brain
Pseudocode No No pseudocode or clearly labeled algorithm block found.
Open Source Code No The TensorFlow (Abadi et al., 2016) implementation of this project will be open sourced upon publication.
Open Datasets Yes BAIR robot pushing dataset (Ebert et al., 2017), Human3.6M (Ionescu et al., 2014), Robotic pushing prediction (Finn et al., 2016)
Dataset Splits No The paper discusses training and test datasets but does not explicitly provide details on a separate validation dataset split for reproducibility.
Hardware Specification No Reed et al. (2017) proposed a parallelized multi-scale algorithm that significantly improves the training and prediction time but still requires more than a minute to generate one second of 64×64 video on a GPU.
Software Dependencies No The TensorFlow (Abadi et al., 2016) implementation of this project will be open sourced upon publication.
Experiment Setup Yes Table 1: Hyper-parameters used for experiments. Generative Network: model type CDNA; batch size 16; learning rate 0.001; scheduled sampling (k) 900.0; # of masks 10; # of iterations 200000. Inference Network: latent minimum σ −5.0; starting β 0.0; final β 0.001; # of latent channels 1; step 1 iterations 50000; step 2 iterations 50000; step 3 iterations 100000. Optimization: method ADAM; β1 0.9; β2 0.999; ϵ 1e-8.
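The hyper-parameters reported in Table 1 can be collected into a single configuration sketch for reference. All names below are illustrative and may not match the authors' released code; the linear β-annealing helper is an assumption, since the paper only reports the starting and final β values.

```python
# Illustrative configuration mirroring Table 1 of the paper.
# Key names are hypothetical; the official SV2P code may organize these differently.
CONFIG = {
    "generative_network": {
        "model_type": "CDNA",
        "batch_size": 16,
        "learning_rate": 1e-3,
        "scheduled_sampling_k": 900.0,
        "num_masks": 10,
        "num_iterations": 200_000,
    },
    "inference_network": {
        "latent_min_sigma": -5.0,
        "beta_start": 0.0,
        "beta_final": 1e-3,
        "num_latent_channels": 1,
        # Three-step training schedule: 50k + 50k + 100k iterations.
        "step_iterations": [50_000, 50_000, 100_000],
    },
    "optimizer": {  # ADAM with the paper's reported settings
        "beta1": 0.9,
        "beta2": 0.999,
        "epsilon": 1e-8,
    },
}


def linear_beta_schedule(step, total_steps, beta_start=0.0, beta_final=1e-3):
    """Linearly anneal the KL weight beta (an assumed schedule shape)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_start + frac * (beta_final - beta_start)
```

A training loop would then read, e.g., `linear_beta_schedule(step, CONFIG["inference_network"]["step_iterations"][2])` to weight the KL term during the final training step.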
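The quantitative comparisons above use PSNR as one of their metrics. As a sanity check, PSNR can be computed from the mean squared error between a predicted and a ground-truth frame; this is a generic sketch over flat pixel lists, not the paper's evaluation code.

```python
import math


def psnr(prediction, target, max_val=1.0):
    """Peak signal-to-noise ratio between two equally sized pixel sequences.

    PSNR = 10 * log10(max_val^2 / MSE), in decibels; higher is better.
    """
    mse = sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For example, frames differing by 0.5 at every pixel have MSE 0.25 and a PSNR of about 6.02 dB.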