ControlVAE: Controllable Variational Autoencoder
Authors: Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The framework is evaluated on three applications, namely language modeling, disentangled representation learning, and image generation. The results show that ControlVAE can achieve much better reconstruction quality than competitive methods for comparable disentanglement performance. For language modeling, it not only averts KL vanishing but also improves the diversity of generated text. Finally, we also demonstrate that ControlVAE improves reconstruction quality for image generation compared to the original VAE. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA. (2) AWS Deep Learning, CA, USA. (3) Alibaba Group, Seattle, WA, USA. |
| Pseudocode | Yes | We summarize the proposed PI control algorithm in Algorithm 1. Our PI algorithm updates the hyperparameter β(t) with feedback from the sampled KL-divergence at training step t. (A Python sketch of this controller follows the table.) |
| Open Source Code | Yes | Source code is publicly available at https://github.com/shj1987/ControlVAE-ICML2020.git |
| Open Datasets | Yes | Language modeling: 1) Penn Tree Bank (PTB) (Marcus et al., 1993): it consists of 42,068 training sentences, 3,370 validation sentences, and 3,761 testing sentences. 2) Switchboard (SW) (Godfrey & Holliman, 1997): it has 2,400 two-sided telephone conversations with manually transcribed speech and alignment. The data is randomly split into 2,316, 60, and 62 dialogs for training, validation, and testing. Disentangling: 1) 2D Shapes (Matthey et al., 2017): it has 737,280 binary 64×64 images of 2D shapes with five ground-truth factors (number of values): shape (3), scale (6), orientation (40), x-position (32), y-position (32) (Kim & Mnih, 2018). Image generation: 1) CelebA (cropped version) (Liu et al., 2015): it has 202,599 RGB 128×128×3 images of celebrity faces. The data is split into 192,599 and 10,000 images for training and testing. |
| Dataset Splits | Yes | Language modeling: 1) Penn Tree Bank (PTB) (Marcus et al., 1993): it consists of 42,068 training sentences, 3,370 validation sentences, and 3,761 testing sentences. 2) Switchboard (SW) (Godfrey & Holliman, 1997): it has 2,400 two-sided telephone conversations with manually transcribed speech and alignment. The data is randomly split into 2,316, 60, and 62 dialogs for training, validation, and testing. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions using Transformer as the decoder but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The detailed model configurations and hyperparameter settings for each model are presented in Appendix A. Specifically, for language modeling: following the PI tuning strategy in Section 3.1, we set Kp and Ki of the PI algorithm in (6) to 0.01 and 0.0001, respectively. In addition, βmin is set to 0 and the maximum value of β(t) is limited to 1. For disentangling: since β(t) > 1, we set βmin to 1 for the PI algorithm in (6). Following the PI tuning method above, the coefficients Kp and Ki are set to 0.01 and 0.001, respectively. For image generation: we use the same PI control algorithm and hyperparameters as for language modeling. (The training-loop sketch after the table uses these settings.) |
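The pseudocode row above describes Algorithm 1 only in prose: a PI controller that adjusts β(t) from the sampled KL-divergence each step. Below is a minimal Python sketch of such a controller. The sigmoid-shaped P-term, the anti-windup rule, and the error clamp are reconstructed from the paper's Eq. (6) as we understand it and should be read as assumptions, not a verbatim reimplementation of the authors' code.

```python
import math

class PIController:
    """Minimal sketch of the PI controller summarized in Algorithm 1.

    Reconstructed from the paper's Eq. (6): the P-term is squashed through
    a sigmoid so it stays bounded, and the integral term accumulates the
    negative error. Details beyond the quoted text are assumptions.
    """

    def __init__(self, kp=0.01, ki=0.0001, beta_min=0.0, beta_max=1.0):
        self.kp = kp
        self.ki = ki
        self.beta_min = beta_min
        self.beta_max = beta_max
        self.i_term = 0.0  # running integral contribution

    def step(self, desired_kl, observed_kl):
        # e(t): gap between the target KL and the KL sampled at step t.
        error = desired_kl - observed_kl
        # Clamp before exp() purely for numerical safety (an assumption).
        bounded = max(min(error, 30.0), -30.0)
        # Sigmoid-shaped P-term, bounded in (0, Kp).
        p_term = self.kp / (1.0 + math.exp(bounded))
        beta = p_term + self.i_term + self.beta_min
        # Freeze the integral when beta saturates (simple anti-windup).
        if self.beta_min <= beta <= self.beta_max:
            self.i_term -= self.ki * error
        return min(max(beta, self.beta_min), self.beta_max)
```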
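To show how the experiment-setup row translates into a training loop, here is a hedged usage sketch that reuses the `PIController` above with the per-task settings quoted in the table. The names `train_step`, `model`, and `target_kl` are hypothetical, `model` is assumed to return a `(recon_loss, kl)` pair of tensors, and the disentangling `beta_max` is an assumed cap since the quoted text only fixes `beta_min = 1`.

```python
# Per-task settings quoted in the table above; image generation reuses
# the language-modeling hyperparameters, as the paper states.
lm_controller  = PIController(kp=0.01, ki=0.0001, beta_min=0.0, beta_max=1.0)
dis_controller = PIController(kp=0.01, ki=0.001,  beta_min=1.0, beta_max=100.0)  # cap assumed
img_controller = PIController(kp=0.01, ki=0.0001, beta_min=0.0, beta_max=1.0)

def train_step(model, batch, controller, target_kl, optimizer):
    """Hypothetical step: `model(batch)` is assumed to return (recon_loss, kl)."""
    recon_loss, kl = model(batch)
    # Refresh beta(t) from the KL sampled at this training step.
    beta = controller.step(desired_kl=target_kl, observed_kl=kl.item())
    loss = recon_loss + beta * kl  # beta-weighted ELBO, as in ControlVAE
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), beta
```

Recomputing β(t) from the sampled KL each step, rather than following a fixed annealing schedule, is what lets the controller hold the KL term near its setpoint and avert KL vanishing in the language-modeling experiments.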