A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

Authors: Marco Fraccaro, Simon Kamronn, Ulrich Paquet, Ole Winther

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'The model is trained end-to-end on videos of a variety of simulated physical systems, and outperforms competing methods in generative and missing data imputation tasks.' and 'KVAEs are tested on videos of a variety of simulated physical systems in section 5.'
Researcher Affiliation | Collaboration | Technical University of Denmark; DeepMind
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | 'Further implementation details can be found in the supplementary material (appendix A) and in the Tensorflow [1] code released at github.com/simonkamronn/kvae.'
Open Datasets | No | The paper uses simulated data that it generates itself: 'We simulate 5000 sequences of 20 time steps each of a ball moving in a two-dimensional box, where each video frame is a 32x32 binary image.' and 'Training, validation and test set are formed by 500 sequences of 15 frames of 16x16 pixels.' No concrete access information (link, DOI, formal citation for a public dataset) is provided.
Dataset Splits | No | The paper mentions 'Training, validation and test set are formed by 500 sequences of 15 frames of 16x16 pixels.' but does not provide specific dataset split information (percentages, sample counts, or citations to predefined splits) to reproduce the data partitioning.
Hardware Specification | Yes | 'We thank NVIDIA Corporation for the donation of TITAN X GPUs.'
Software Dependencies | No | The paper mentions 'Tensorflow [1]' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | 'The minimum number of latent dimensions that the KVAE requires to model the ball's dynamics are a_t ∈ R^2 and z_t ∈ R^4, as at the very least the ball's position in the box's 2d plane has to be encoded in a_t, and z_t has to encode the ball's position and velocity. The dynamics parameter network uses K = 3 to interpolate three modes...' and 'We use a KVAE with a_t ∈ R^2, z_t ∈ R^3 and K = 2'
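
The Experiment Setup entry above mentions a dynamics parameter network that uses K = 3 to interpolate three modes, with a 4-dimensional latent state z_t. A minimal sketch of what such an interpolation could look like is given below, assuming the modes are K transition matrices blended by softmax weights. The mode matrices, the logits, and the helper names (softmax, mix_matrices, identity) are hypothetical placeholders chosen for illustration; they are not taken from the paper or from the released code, and in the actual model the weights would come from a learned network rather than fixed numbers.

```python
import math


def softmax(logits):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_matrices(weights, matrices):
    """Element-wise weighted sum of equally shaped matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]


def identity(n, scale=1.0):
    """Scaled n x n identity matrix, used here as a toy dynamics mode."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


K, Z_DIM = 3, 4  # K = 3 modes and z_t in R^4, as quoted in the setup above

# Hypothetical modes: three scaled identities standing in for learned matrices.
modes = [identity(Z_DIM, scale=s) for s in (1.0, 0.5, -1.0)]

# Hypothetical logits standing in for the dynamics parameter network output.
logits = [2.0, 0.0, -1.0]

alpha = softmax(logits)          # one interpolation weight per mode
A_t = mix_matrices(alpha, modes)  # blended transition matrix for this step
```

Because the weights sum to one, the blended matrix stays a convex combination of the modes; a network that shifts the weights over time would then move smoothly between dynamics regimes instead of switching abruptly.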