Disentangled Recurrent Wasserstein Autoencoder
Authors: Jun Han, Martin Renqiang Min, Ligong Han, Li Erran Li, Xuan Zhang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a variety of datasets show that our models outperform other baselines with the same settings in terms of disentanglement and unconditional video generation both quantitatively and qualitatively. |
| Researcher Affiliation | Collaboration | Jun Han (PCG, Tencent, junhanjh@tencent.com); Martin Renqiang Min (NEC Laboratories America, renqiang@nec-labs.com); Ligong Han (Rutgers University, hanligong@gmail.com); Li Erran Li (Alexa AI, Amazon, erranlli@gmail.com); Xuan Zhang (Texas A&M University, floatlazer@gmail.com) |
| Pseudocode | Yes | Algorithm 1 R-WAE(GAN) and Algorithm 2 R-WAE(MMD) are explicitly provided in Appendix D (an illustrative MMD-penalty sketch follows this table). |
| Open Source Code | No | The paper does not include an explicit statement or link confirming the release of its source code for the described methodology. |
| Open Datasets | Yes | We train our models on the Stochastic Moving MNIST (SM-MNIST), Sprites, and TIMIT datasets under a completely unsupervised setting. The number of actions (motions) is used as prior information for all methods on the MUG facial dataset. |
| Dataset Splits | Yes | We use 6 variants in each of 4 attribute categories (skin colors, tops, pants and hair style) and there are 6⁴ = 1296 unique characters in total, where 1000 of them are used for training and the rest are used for testing. (Sprites dataset) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper mentions using 'Adam optimizer (Kingma & Ba, 2015)' but does not specify version numbers for any software dependencies, libraries, or programming languages. |
| Experiment Setup | Yes | The penalty coefficients β1 and β2 are, respectively, 5 and 20. The learning rate for the decoder is 5 × 10⁻⁴, the learning rate for the encoder is 1 × 10⁻⁴, and the learning rate for fγ is 1 × 10⁻⁴. The batch size on both the SM-MNIST and Sprites datasets is 60, and the length of the video sequence for training is T = 8. (Appendix G; a hedged optimizer sketch follows this table.) |
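The Pseudocode row cites Algorithm 2, R-WAE(MMD). Since no source code is released, the following is a minimal sketch of the MMD penalty such an algorithm minimizes between encoded latents and prior samples. The IMQ kernel choice, the `mmd_penalty` name, and the bandwidth heuristic are assumptions in the style of standard WAE implementations, not the paper's exact formulation.

```python
import torch

def mmd_penalty(z_q: torch.Tensor, z_p: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Unbiased MMD^2 estimate between posterior samples z_q and prior
    samples z_p (both of shape (batch, dim)), using an inverse
    multiquadratic (IMQ) kernel as is common in the WAE family."""
    n, d = z_q.shape
    c = 2.0 * d * scale  # IMQ bandwidth heuristic for a unit-variance prior

    def imq(a, b):
        # k(x, y) = c / (c + ||x - y||^2), built from pairwise squared distances
        return c / (c + torch.cdist(a, b) ** 2)

    k_qq, k_pp, k_qp = imq(z_q, z_q), imq(z_p, z_p), imq(z_q, z_p)
    # drop diagonal terms for the unbiased within-sample sums
    off_diag = 1.0 - torch.eye(n, device=z_q.device)
    return ((k_qq * off_diag).sum() + (k_pp * off_diag).sum()) / (n * (n - 1)) - 2.0 * k_qp.mean()
```

In a WAE-style objective, this term would be weighted by a penalty coefficient (β1 or β2 in the paper's notation) and added to the reconstruction loss.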
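The Experiment Setup row quotes the Appendix G hyperparameters; the sketch below translates them into an optimizer configuration. Only the learning rates, batch size, sequence length, and penalty coefficients come from the quoted text; the linear networks are hypothetical stand-ins for the paper's recurrent encoder, decoder, and critic fγ.

```python
import torch
import torch.nn as nn

# Values quoted from Appendix G. Note that β1 and β2 are the paper's
# penalty coefficients, not Adam's momentum parameters.
BETA_1, BETA_2 = 5.0, 20.0
BATCH_SIZE, SEQ_LEN = 60, 8  # SM-MNIST and Sprites settings, T = 8

# Placeholder networks: the actual models are recurrent video models;
# linear layers keep this sketch self-contained and runnable.
encoder = nn.Linear(64, 16)
decoder = nn.Linear(16, 64)
f_gamma = nn.Linear(16, 1)  # critic f_γ

# Adam optimizer (Kingma & Ba, 2015) with the per-module learning rates.
opt_decoder = torch.optim.Adam(decoder.parameters(), lr=5e-4)
opt_encoder = torch.optim.Adam(encoder.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(f_gamma.parameters(), lr=1e-4)
```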