Unsupervised Learning of Disentangled Representations from Video
Authors: Emily L. Denton, Vighnesh Birodkar
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on a range of synthetic and real videos, demonstrating the ability to coherently generate hundreds of steps into the future. We evaluate our model on both synthetic (MNIST, NORB, SUNCG) and real (KTH Actions) video datasets. We explore several tasks with our model: (i) the ability to cleanly factorize into content and pose components; (ii) forward prediction of video frames using the approach from Section 3.1; (iii) using the pose/content features for classification tasks. In Fig. 9 we attempt to quantify the fidelity of the generations by comparing our approach to MCNet [33] using a metric derived from the Inception score [26]. |
| Researcher Affiliation | Academia | Emily Denton, Department of Computer Science, New York University, denton@cs.nyu.edu; Vighnesh Birodkar, Department of Computer Science, New York University, vighneshbirodkar@nyu.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. The paper describes the model and loss functions using mathematical equations and diagrams, but not in a pseudocode format. |
| Open Source Code | Yes | Source code is available at https://github.com/edenton/drnet. |
| Open Datasets | Yes | We evaluate our model on both synthetic (MNIST, NORB, SUNCG) and real (KTH Actions) video datasets. |
| Dataset Splits | No | On MNIST, we train the model by observing 5 frames and predicting 10 frames. On KTH, we train the model by observing 10 frames and predicting 10 frames. The paper mentions training on MNIST, NORB, SUNCG, and KTH, but does not provide explicit train/validation/test splits (e.g., 80/10/10 percentages or sample counts). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were mentioned for running experiments. The paper describes model architectures but not the computational resources. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. The paper refers to optimizers (ADAM [13]) and model architectures (DCGAN [21], ResNet-18 [9], VGG16 [29]) but not specific software platforms or libraries with their versions. |
| Experiment Setup | Yes | We trained all our models with the ADAM optimizer [13] and learning rate η = 0.002. We used β = 0.1 for MNIST, NORB and SUNCG and β = 0.0001 for KTH experiments. We used α = 1 for all datasets. For future prediction experiments we train a two layer LSTM with 256 cells using the ADAM optimizer. |
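To make the Experiment Setup row above concrete, here is a minimal sketch of the reported training configuration, assuming PyTorch. The class and variable names (`PosePredictor`, `pose_dim`) are hypothetical stand-ins for illustration, not the authors' released code; the actual implementation is in the repository linked in the Open Source Code row.

```python
# Sketch of the reported setup: ADAM with lr = 0.002, and a two-layer LSTM
# with 256 cells for future prediction. All names here are assumptions.
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper's experiment setup
LR = 0.002      # ADAM learning rate (eta)
ALPHA = 1.0     # alpha = 1 for all datasets
BETA = 0.1      # beta for MNIST/NORB/SUNCG; beta = 0.0001 for KTH

class PosePredictor(nn.Module):
    """Two-layer LSTM with 256 cells, as used for future prediction.

    Note: in the paper the predictor is also conditioned on content
    features; that conditioning is omitted here for brevity.
    """
    def __init__(self, pose_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, 256, num_layers=2, batch_first=True)
        self.out = nn.Linear(256, pose_dim)

    def forward(self, pose_seq: torch.Tensor) -> torch.Tensor:
        # pose_seq: (batch, time, pose_dim) -> next-step pose predictions
        h, _ = self.lstm(pose_seq)
        return self.out(h)

predictor = PosePredictor(pose_dim=10)  # pose dimensionality is a placeholder
optimizer = torch.optim.Adam(predictor.parameters(), lr=LR)
```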
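The Research Type row also cites a comparison to MCNet in Fig. 9 using "a metric derived from the Inception score [26]". The paper's exact derivation is not quoted here; for reference, a minimal sketch of the standard Inception score, assuming softmax outputs from some pretrained classifier are supplied as input (the choice of classifier is an assumption):

```python
# Standard Inception score: IS = exp( E_x[ KL( p(y|x) || p(y) ) ] ).
# `class_probs` holds classifier softmax outputs on generated frames;
# the paper's Fig. 9 metric is a variant derived from this quantity.
import numpy as np

def inception_score(class_probs: np.ndarray, eps: float = 1e-12) -> float:
    """class_probs: (N, num_classes) softmax outputs for N generated samples."""
    marginal = class_probs.mean(axis=0, keepdims=True)  # p(y)
    kl = class_probs * (np.log(class_probs + eps) - np.log(marginal + eps))
    return float(np.exp(kl.sum(axis=1).mean()))

# Usage: probs = classifier(generated_frames); score = inception_score(probs)
```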