Weakly Supervised Disentanglement with Guarantees

Authors: Rui Shu, Yining Chen, Abhishek Kumar, Stefano Ermon, Ben Poole

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To address this issue, we provide a theoretical framework to assist in analyzing the disentanglement guarantees (or lack thereof) conferred by weak supervision when coupled with learning algorithms based on distribution matching. We empirically verify the guarantees and limitations of several weak supervision methods (restricted labeling, match-pairing, and rank-pairing), demonstrating the predictive power and usefulness of our theoretical framework.
Researcher Affiliation | Collaboration | Rui Shu, Yining Chen, Abhishek Kumar, Stefano Ermon & Ben Poole; Stanford University, Google Brain; {ruishu,cynnjjs,ermon}@stanford.edu, {abhishk,pooleb}@google.com
Pseudocode | No | The paper does not contain any blocks explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | Yes | Code available at https://github.com/google-research/google-research/tree/master/weak_disentangle
Open Datasets | Yes | We conducted experiments on five prominent datasets in the disentanglement literature: Shapes3D (Kim & Mnih, 2018), dSprites (Higgins et al., 2017), Scream-dSprites (Locatello et al., 2019), SmallNORB (LeCun et al., 2004), and Cars3D (Reed et al., 2015).
Dataset Splits | No | The paper mentions using several datasets for experiments but does not explicitly provide details about training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or cloud computing instance details used for running the experiments.
Software Dependencies | No | The paper mentions using "PyTorch" and "Keras" for initialization schemes but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | For all models, we use the Adam optimizer with β1 = 0.5, β2 = 0.999 and set the generator learning rate to 1e-3. We use a batch size of 64 and set the leaky ReLU negative slope to 0.2. Our results are collected over a broad range of hyperparameter configurations (see Appendix H for details).
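The quoted Experiment Setup maps onto a standard GAN-style training configuration. Below is a minimal PyTorch sketch of that configuration, not the authors' implementation (their released code lives at the repository linked above): the network shapes and latent dimensionality are placeholder assumptions, and only the Adam betas, the generator learning rate, the batch size, and the leaky ReLU slope come from the quoted text.

```python
# Minimal sketch of the reported training configuration (PyTorch).
# Only the optimizer settings, batch size, and LeakyReLU slope are taken from
# the quoted Experiment Setup; architectures and sizes are illustrative assumptions.
import torch
import torch.nn as nn

LATENT_DIM = 10        # assumption: latent size is not given in the quote
IMAGE_DIM = 64 * 64    # assumption: flattened image size
BATCH_SIZE = 64        # "We use a batch size of 64"

generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.LeakyReLU(0.2),          # "set the leaky ReLU negative slope to 0.2"
    nn.Linear(256, IMAGE_DIM),
)

discriminator = nn.Sequential(
    nn.Linear(IMAGE_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

# "Adam optimizer with β1 = 0.5, β2 = 0.999" and "generator learning rate to 1e-3"
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3, betas=(0.5, 0.999))
# The discriminator learning rate is not stated in the quote; reusing 1e-3 is an assumption.
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3, betas=(0.5, 0.999))

# One illustrative forward pass with a batch of 64 latent samples.
z = torch.randn(BATCH_SIZE, LATENT_DIM)
score = discriminator(generator(z))
```

The β1 = 0.5 setting is the common choice for adversarial distribution-matching objectives, where a lower first-moment decay tends to stabilize generator/discriminator training.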