Stochastic Variational Video Prediction
Authors: Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SV2P on multiple real-world video datasets, as well as a carefully designed toy dataset that highlights the importance of stochasticity in video prediction (see Figure 1). In both our qualitative and quantitative comparisons, SV2P produces substantially improved video predictions when compared to the same model without stochasticity, with respect to standard metrics such as PSNR and SSIM. (A PSNR sketch follows the table.) |
| Researcher Affiliation | Collaboration | Mohammad Babaeizadeh1, Chelsea Finn2, Dumitru Erhan3, Roy Campbell1, and Sergey Levine2,3 1University of Illinois at Urbana-Champaign 2University of California, Berkeley 3Google Brain |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block found. |
| Open Source Code | No | The TensorFlow (Abadi et al., 2016) implementation of this project will be open sourced upon publication. |
| Open Datasets | Yes | BAIR robot pushing dataset (Ebert et al., 2017), Human3.6M (Ionescu et al., 2014), Robotic pushing prediction (Finn et al., 2016) |
| Dataset Splits | No | The paper discusses training and test datasets but does not explicitly provide details on a separate validation dataset split for reproducibility. |
| Hardware Specification | No | Reed et al. (2017) proposed a parallelized multi-scale algorithm that significantly improves the training and prediction time but still requires more than a minute to generate one second of 64×64 video on a GPU. |
| Software Dependencies | No | The TensorFlow (Abadi et al., 2016) implementation of this project will be open sourced upon publication. |
| Experiment Setup | Yes | Table 1: Hyper-parameters used for experiments. Generative network: model type CDNA; batch size 16; learning rate 0.001; scheduled sampling (k) 900.0; # of masks 10; # of iterations 200,000. Inference network: latent minimum σ -5.0; starting β 0.0; final β 0.001; # of latent channels 1; step 1 iterations 50,000; step 2 iterations 50,000; step 3 iterations 100,000. Optimization: method ADAM; β1 0.9; β2 0.999; ε 1e-8. A hedged configuration sketch based on these values follows the table. |
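
For readers replicating the setup, the Table 1 hyper-parameters map directly onto a training configuration. The sketch below is a minimal, hypothetical rendering in Python: the names `SV2PConfig` and `kl_beta` are ours, not from the (then-unreleased) implementation, and the linear β ramp during step 3 is an assumption, since the table reports only the starting and final β values.

```python
from dataclasses import dataclass

# Hypothetical config mirroring Table 1; field names are illustrative.
@dataclass
class SV2PConfig:
    # Generative network (CDNA-based)
    model_type: str = "CDNA"
    batch_size: int = 16
    learning_rate: float = 0.001
    scheduled_sampling_k: float = 900.0
    num_masks: int = 10
    num_iterations: int = 200_000
    # Inference network
    latent_min_sigma: float = -5.0   # reported lower bound on the latent sigma
    beta_start: float = 0.0
    beta_final: float = 0.001
    num_latent_channels: int = 1
    step1_iters: int = 50_000        # step 1: generative network only
    step2_iters: int = 50_000        # step 2: inference network, beta = 0
    step3_iters: int = 100_000       # step 3: raise beta toward its final value
    # ADAM optimizer
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_epsilon: float = 1e-8

def kl_beta(cfg: SV2PConfig, iteration: int) -> float:
    """KL weight at a given training iteration.

    Beta is held at beta_start through steps 1 and 2, then increased to
    beta_final during step 3. A linear ramp is assumed here; Table 1 only
    gives the endpoints.
    """
    ramp_begin = cfg.step1_iters + cfg.step2_iters
    if iteration < ramp_begin:
        return cfg.beta_start
    progress = min(1.0, (iteration - ramp_begin) / cfg.step3_iters)
    return cfg.beta_start + progress * (cfg.beta_final - cfg.beta_start)
```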
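
The PSNR and SSIM metrics cited in the Research Type row are standard image-quality measures. PSNR has a simple closed form, sketched below with NumPy; SSIM is more involved, and an off-the-shelf implementation such as `skimage.metrics.structural_similarity` would typically be used (a tooling suggestion on our part, not something the paper specifies).

```python
import numpy as np

def psnr(prediction: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a predicted frame and the ground truth.

    Conventional per-frame definition, PSNR = 10 * log10(MAX^2 / MSE); the
    paper reports PSNR but does not spell out its implementation.
    """
    mse = np.mean((prediction.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```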