Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Unsupervised Model Selection for Variational Disentangled Representation Learning

Authors: Sunny Duan, Loic Matthey, Andre Saraiva, Nick Watters, Chris Burgess, Alexander Lerchner, Irina Higgins

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that our approach performs comparably to the existing supervised alternatives across 5400 models from six state of the art unsupervised disentangled representation learning model classes. Furthermore, we show that the ranking produced by our approach correlates well with the final task performance on two different domains."
Researcher Affiliation | Industry | Sunny Duan (DeepMind, EMAIL); Loic Matthey (DeepMind, EMAIL); Andre Saraiva (DeepMind, EMAIL); Nick Watters (DeepMind, EMAIL); Chris Burgess (DeepMind, EMAIL); Alexander Lerchner (DeepMind, EMAIL); Irina Higgins (DeepMind, EMAIL)
Pseudocode | No | The paper describes the UDR method in four steps within the text and refers to Figure 4 as a "Schematic illustration of the UDR method". This figure is a diagram, not pseudocode or an algorithm block.
Open Source Code | Yes | "We have released the code for our method as part of disentanglement_lib"
Open Datasets | Yes | "We validate our proposed method on two datasets with fully known generative processes commonly used to evaluate the quality of disentangled representations: dSprites (Matthey et al., 2017) and 3D Shapes (Burgess & Kim, 2018)... dSprites: A commonly used unit test for evaluating disentangling is the dSprites dataset (Matthey et al., 2017). ... 3D Shapes: A more complex dataset for evaluating disentangling is the 3D Shapes dataset (Burgess & Kim, 2018)."
Dataset Splits | No | The paper references using "trained model checkpoints and supervised scores from Locatello et al. (2018)" and describes dataset splits for metric calculations (e.g., a "test set of 5000" for the β-VAE metric), but it does not explicitly provide the training, validation, and test splits for the primary VAE model training.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions software such as scikit-learn (for the Lasso regressors and logistic regression) and the Adam optimizer, but it does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | "Each model is trained with H = 6 different hyperparameter settings (detailed in Sec. A.4.1 in Supplementary Material), with S = 50 seeds per setting, and P = 50 pairwise comparisons." ... Table 5: Hyperparameters used for each model architecture. Table 6: Miscellaneous model details (Batch Size 64, Latent space dimension 10, Optimizer Adam, Adam: beta1 0.9, Adam: beta2 0.999, Adam: epsilon 1e-8, Adam: learning rate 0.0001, Decoder type Bernoulli).
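The Table 6 hyperparameters quoted above can be collected into a small configuration sketch. This is a hypothetical illustration, not the paper's released code: the `TRAIN_CONFIG` dictionary and `adam_kwargs` helper are invented names, and the keyword mapping assumes an Adam implementation with `tf.keras`-style argument names.

```python
# Hypothetical config sketch of the training details reported in Table 6.
# Names here (TRAIN_CONFIG, adam_kwargs) are illustrative, not from the paper.

TRAIN_CONFIG = {
    "batch_size": 64,          # Table 6: Batch Size 64
    "latent_dim": 10,          # Table 6: Latent space dimension 10
    "optimizer": "adam",       # Table 6: Optimizer Adam
    "adam_beta1": 0.9,         # Table 6: Adam: beta1 0.9
    "adam_beta2": 0.999,       # Table 6: Adam: beta2 0.999
    "adam_epsilon": 1e-8,      # Table 6: Adam: epsilon 1e-8
    "learning_rate": 1e-4,     # Table 6: Adam: learning rate 0.0001
    "decoder_type": "bernoulli",  # Table 6: Decoder type Bernoulli
}


def adam_kwargs(cfg):
    """Map the reported hyperparameters onto the keyword names used by many
    Adam implementations (e.g. tf.keras.optimizers.Adam)."""
    return {
        "learning_rate": cfg["learning_rate"],
        "beta_1": cfg["adam_beta1"],
        "beta_2": cfg["adam_beta2"],
        "epsilon": cfg["adam_epsilon"],
    }
```

Such a mapping makes the "Experiment Setup: Yes" judgment concrete: every value needed to instantiate the optimizer is stated in the paper, so only framework-specific argument names are left to the reimplementer.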