On the Transfer of Disentangled Representations in Realistic Settings

Authors: Andrea Dittadi, Frederik Träuble, Francesco Locatello, Manuel Wüthrich, Vaibhav Agrawal, Ole Winther, Stefan Bauer, Bernhard Schölkopf

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings and conduct a large-scale empirical study of disentangled representations on this dataset.
Researcher Affiliation | Academia | 1 Technical University of Denmark, 2 Max Planck Institute for Intelligent Systems, 3 ETH Zurich, Department of Computer Science, 4 Copenhagen University Hospital, 5 University of Copenhagen, 6 CIFAR Azrieli Global Scholar
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states that the *datasets* are made publicly available with a URL, but it does not provide an explicit statement or link for the *source code* of the methodology described in the paper.
Open Datasets | Yes | We propose a dataset consisting of simulated observations from a scene where a robotic arm interacts with a cube in a stage (see Fig. 1). ... Additionally, we recorded an annotated dataset under the same conditions in the real-world setup: we acquired 1,809 camera images from the same viewpoint and recorded the labels of the 7 underlying factors of variation. ... These datasets are made publicly available. Dataset URL: http://people.tuebingen.mpg.de/ei-datasets/iclr_transfer_paper/robot_finger_datasets.tar (6.18 GB)
Dataset Splits | No | The paper specifies training and testing set sizes for downstream tasks (10k and 5k images, respectively) but does not explicitly describe a separate dataset split for validation of the main models or for general model development. It mentions 'validation' in the context of evaluation metrics, but not as a dataset partition.
Hardware Specification | Yes | Training these models requires approximately 2.8 GPU years on NVIDIA Tesla V100 PCIe.
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2014) but does not provide specific version numbers for any software, libraries, or programming languages used in the implementation or experiments (e.g., Python version, PyTorch/TensorFlow version).
Experiment Setup | Yes | The hyperparameter sweep is defined as follows: We train the models using either unsupervised learning or weakly supervised learning (Locatello et al., 2020). ... We vary the parameter β in {1, 2, 4}, and use linear deterministic warm-up (Bowman et al., 2015; Sønderby et al., 2016) over the first {0, 10000, 50000} training steps. The latent space dimensionality is in {10, 25, 50}. Half of the models are trained with additive noise in the input image. ... Each of the 108 resulting configurations is trained with 10 random seeds. ... We use a batch size of 64 and train for 400k steps. The learning rate is initialized to 1e-4 and halved at 150k and 300k training steps. We clip the global gradient norm to 1.0 before each weight update.
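The Experiment Setup row fully specifies the sweep grid and the optimization schedule, so it can be written down compactly in code. Below is a minimal sketch of that configuration, assuming a PyTorch-style setup (the paper does not state its framework, as noted under Software Dependencies); `model`, `make_optimizer`, `beta_at_step`, and `training_step` are illustrative names, not taken from the paper, and the assumption that the linear warm-up is applied to the KL weight β follows common practice rather than an explicit statement in the quote.

```python
import itertools

import torch

# Hyperparameter grid quoted above: 2 x 3 x 3 x 3 x 2 = 108 configurations,
# each trained with 10 random seeds (1080 models in total).
SWEEP = list(itertools.product(
    ["unsupervised", "weakly_supervised"],  # training regime (Locatello et al., 2020)
    [1, 2, 4],                              # beta
    [0, 10_000, 50_000],                    # linear deterministic warm-up steps
    [10, 25, 50],                           # latent space dimensionality
    [False, True],                          # additive noise on the input image
))
assert len(SWEEP) == 108


def beta_at_step(step: int, beta_max: float, warmup_steps: int) -> float:
    """Assumed linear warm-up of the KL weight beta over the first `warmup_steps` steps."""
    if warmup_steps == 0:
        return beta_max
    return beta_max * min(step / warmup_steps, 1.0)


def make_optimizer(model: torch.nn.Module):
    """Adam at lr 1e-4, halved at 150k and 300k of the 400k training steps (batch size 64)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[150_000, 300_000], gamma=0.5)
    return optimizer, scheduler


def training_step(model, optimizer, scheduler, loss):
    """One update: backprop, clip the global gradient norm to 1.0, then step optimizer and schedule."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # milestones are given in steps, so the scheduler is stepped once per batch
```

The grid (2 training regimes × 3 β values × 3 warm-up lengths × 3 latent dimensionalities × 2 noise settings) reproduces exactly the 108 configurations quoted above; with 10 seeds each, the reported 2.8 GPU years correspond to roughly 22-23 V100-hours per model.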