Disentangled Recurrent Wasserstein Autoencoder

Authors: Jun Han, Martin Renqiang Min, Ligong Han, Li Erran Li, Xuan Zhang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a variety of datasets show that our models outperform other baselines with the same settings in terms of disentanglement and unconditional video generation, both quantitatively and qualitatively.
Researcher Affiliation | Collaboration | Jun Han (PCG, Tencent, junhanjh@tencent.com); Martin Renqiang Min (NEC Laboratories America, renqiang@nec-labs.com); Ligong Han (Rutgers University, hanligong@gmail.com); Li Erran Li (Alexa AI, Amazon, erranlli@gmail.com); Xuan Zhang (Texas A&M University, floatlazer@gmail.com)
Pseudocode | Yes | Algorithm 1 R-WAE(GAN) and Algorithm 2 R-WAE(MMD) are explicitly provided in Appendix D (a minimal MMD sketch follows the table).
Open Source Code | No | The paper does not include an explicit statement or link confirming the release of its source code for the described methodology.
Open Datasets | Yes | We train our models on the Stochastic Moving MNIST (SM-MNIST), Sprites, and TIMIT datasets under a completely unsupervised setting. The number of actions (motions) is used as prior information for all methods on the MUG facial dataset.
Dataset Splits | Yes | We use 6 variants in each of 4 attribute categories (skin colors, tops, pants, and hair styles), giving 6^4 = 1296 unique characters in total; 1000 of them are used for training and the rest for testing. (Sprites dataset)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., specific GPU or CPU models).
Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma & Ba, 2015)" but does not specify version numbers for any software dependencies, libraries, or programming languages.
Experiment Setup | Yes | The penalty coefficients β1 and β2 are 5 and 20, respectively. The learning rate for the decoder is 5×10^-4, the learning rate for the encoder is 1×10^-4, and the learning rate for f_γ is 1×10^-4. The batch size is 60 on both the SM-MNIST and Sprites datasets, and the training video sequence length is T = 8. (Appendix G; a configuration sketch follows the table.)