Stochastic Video Generation with a Learned Prior
Authors: Emily Denton, Rob Fergus
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our SVG-FP and SVG-LP model on one synthetic video dataset (Stochastic Moving MNIST) and two real ones (KTH actions (Schuldt et al., 2004) and BAIR robot (Ebert et al., 2017)). We show quantitative comparisons by computing structural similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR) scores between ground truth and generated video sequences. |
| Researcher Affiliation | Collaboration | Emily Denton (New York University), Rob Fergus (New York University, Facebook AI Research). |
| Pseudocode | Yes | For a time step t during training, the generation is as follows, where the LSTM recurrence is omitted for brevity: µφ(t), σφ(t) = LSTMφ(h_t), where h_t = Enc(x_t); z_t ∼ N(µφ(t), σφ(t)); g_t = LSTMθ(h_{t−1}, z_t), where h_{t−1} = Enc(x_{t−1}); µθ(t) = Dec(g_t). |
| Open Source Code | Yes | Source code and trained models are available at https://github.com/edenton/svg. |
| Open Datasets | Yes | We evaluate our SVG-FP and SVG-LP model on one synthetic video dataset (Stochastic Moving MNIST) and two real ones (KTH actions (Schuldt et al., 2004) and BAIR robot (Ebert et al., 2017)). |
| Dataset Splits | No | The paper mentions training on datasets and evaluating on 'unseen test videos' and 'held out test sequences', but does not specify a separate validation dataset split or its size/proportion for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions the use of the ADAM optimizer and various network architectures (DCGAN, VGG16), but does not provide specific software version numbers for libraries or frameworks used (e.g., TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | We train all the models with the ADAM optimizer (Kingma & Ba, 2014) and learning rate η = 0.002. We set β = 1e-4 for KTH and BAIR and β = 1e-6 for Stochastic Moving MNIST. |
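The training-time recurrence quoted in the Pseudocode row, together with the β-weighted KL term from the Experiment Setup row, can be sketched as follows. This is a simplified illustration, not the paper's implementation: plain linear maps with a tanh stand in for the DCGAN/VGG encoder, decoder, and the two LSTMs, all dimensions are invented, and the KL is taken against a fixed N(0, I) prior (the SVG-FP variant).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions; the paper uses DCGAN/VGG architectures).
X, H, Z, G = 64, 128, 10, 128

# Placeholder linear maps; a plain tanh recurrence stands in for the LSTMs.
W_enc = rng.normal(0, 0.1, (X, H))
W_dec = rng.normal(0, 0.1, (G, X))
W_phi = rng.normal(0, 0.1, (H, 2 * Z))    # inference net -> (mu, log sigma^2)
W_theta = rng.normal(0, 0.1, (H + Z, G))  # prediction net

def step(x_prev, x_t):
    """One training-time step of the recurrence in the pseudocode cell."""
    h_t = np.tanh(x_t @ W_enc)          # h_t = Enc(x_t)
    h_prev = np.tanh(x_prev @ W_enc)    # h_{t-1} = Enc(x_{t-1})
    mu_phi, logvar = np.split(h_t @ W_phi, 2, axis=-1)
    # Reparameterised sample z_t ~ N(mu_phi(t), sigma_phi(t))
    z_t = mu_phi + rng.standard_normal(mu_phi.shape) * np.exp(0.5 * logvar)
    g_t = np.tanh(np.concatenate([h_prev, z_t], axis=-1) @ W_theta)
    mu_theta = g_t @ W_dec              # mu_theta(t) = Dec(g_t)
    return mu_theta, mu_phi, logvar

x_prev, x_t = rng.standard_normal((4, X)), rng.standard_normal((4, X))
pred, mu, logvar = step(x_prev, x_t)

# Per-step loss sketch: reconstruction + beta * KL(q(z_t|x) || N(0, I)),
# with beta as reported in the Experiment Setup row (1e-4 for KTH and BAIR).
beta = 1e-4
recon = np.mean((pred - x_t) ** 2)  # stand-in for the likelihood term
kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
loss = recon + beta * kl
```

The key design point carried over from the paper is the dataflow: the posterior parameters come from the frame being predicted (x_t), while the prediction model conditions on the previous frame (x_{t−1}) and the sampled z_t.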