On the Transfer of Disentangled Representations in Realistic Settings
Authors: Andrea Dittadi, Frederik Träuble, Francesco Locatello, Manuel Wüthrich, Vaibhav Agrawal, Ole Winther, Stefan Bauer, Bernhard Schölkopf
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings and conduct a large-scale empirical study of disentangled representations on this dataset. |
| Researcher Affiliation | Academia | 1 Technical University of Denmark, 2 Max Planck Institute for Intelligent Systems, 3 ETH Zurich, Department for Computer Science, 4 Copenhagen University Hospital, 5 University of Copenhagen, 6 CIFAR Azrieli Global Scholar |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that the *datasets* are made publicly available with a URL, but it does not provide an explicit statement or link for the *source code* of the methodology described in the paper. |
| Open Datasets | Yes | We propose a dataset consisting of simulated observations from a scene where a robotic arm interacts with a cube in a stage (see Fig. 1). ... Additionally, we recorded an annotated dataset under the same conditions in the real-world setup: we acquired 1,809 camera images from the same viewpoint and recorded the labels of the 7 underlying factors of variation. ... These datasets are made publicly available: http://people.tuebingen.mpg.de/ei-datasets/iclr_transfer_paper/robot_finger_datasets.tar (6.18 GB). (See the download sketch after the table.) |
| Dataset Splits | No | The paper specifies training and testing set sizes for downstream tasks (10k and 5k images respectively) but does not explicitly describe a separate dataset split for validation of the main models or for general model development. It mentions 'validation' in the context of evaluation metrics but not as a dataset partition. |
| Hardware Specification | Yes | Training these models requires approximately 2.8 GPU years on NVIDIA Tesla V100 PCIe. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' (Kingma & Ba, 2014) but does not provide specific version numbers for any software, libraries, or programming languages used in the implementation or experiments (e.g., Python version, PyTorch/TensorFlow version). |
| Experiment Setup | Yes | The hyperparameter sweep is defined as follows: We train the models using either unsupervised learning or weakly supervised learning (Locatello et al., 2020). ... We vary the parameter β in {1, 2, 4}, and use linear deterministic warm-up (Bowman et al., 2015; Sønderby et al., 2016) over the first {0, 10000, 50000} training steps. The latent space dimensionality is in {10, 25, 50}. Half of the models are trained with additive noise in the input image. ... Each of the 108 resulting configurations is trained with 10 random seeds. ... We use a batch size of 64 and train for 400k steps. The learning rate is initialized to 1e-4 and halved at 150k and 300k training steps. We clip the global gradient norm to 1.0 before each weight update. (See the configuration sketch after the table.) |
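
The Open Datasets row gives a direct URL for the 6.18 GB archive. Below is a minimal, hypothetical sketch for fetching and extracting it; the URL comes from the paper, while the local file names are assumptions and nothing is assumed about the archive's internal layout.

```python
import tarfile
import urllib.request

# URL and size are taken from the paper; the local paths below are assumptions.
DATASET_URL = (
    "http://people.tuebingen.mpg.de/ei-datasets/iclr_transfer_paper/"
    "robot_finger_datasets.tar"
)
ARCHIVE_PATH = "robot_finger_datasets.tar"  # ~6.18 GB download

# Download the archive to the local path.
urllib.request.urlretrieve(DATASET_URL, ARCHIVE_PATH)

# Extract everything; the internal layout of the archive is not described
# in the table above, so no file names inside it are assumed.
with tarfile.open(ARCHIVE_PATH) as archive:
    archive.extractall("robot_finger_datasets")
```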
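
The Experiment Setup row describes a grid over supervision mode, β, warm-up length, latent dimensionality, and input noise, plus a fixed optimization recipe. The following is a minimal sketch of that grid and the learning-rate schedule, written in Python for illustration; the variable names and structure are assumptions, not the authors' code.

```python
from itertools import product

# Sweep factors quoted in the table above (names are hypothetical).
supervision = ["unsupervised", "weakly_supervised"]  # Locatello et al., 2020
betas = [1, 2, 4]                                    # beta values
warmup_steps = [0, 10_000, 50_000]                   # linear deterministic warm-up
latent_dims = [10, 25, 50]                           # latent space dimensionality
input_noise = [False, True]                          # half the models use additive input noise

configs = list(product(supervision, betas, warmup_steps, latent_dims, input_noise))
assert len(configs) == 108   # 2 * 3 * 3 * 3 * 2 = 108 configurations
NUM_SEEDS = 10               # each configuration is trained with 10 random seeds

# Fixed optimization settings quoted in the table above.
BATCH_SIZE = 64
TOTAL_STEPS = 400_000
BASE_LR = 1e-4
LR_MILESTONES = (150_000, 300_000)  # learning rate halved at each milestone
GRAD_CLIP_NORM = 1.0                # global gradient norm clipped before each update


def learning_rate(step: int) -> float:
    """Piecewise-constant schedule: halve the base rate at each passed milestone."""
    return BASE_LR * 0.5 ** sum(step >= m for m in LR_MILESTONES)
```

The quoted factors multiply to 2 × 3 × 3 × 3 × 2 = 108 configurations, and with 10 seeds each that is 1,080 trained models; against the reported ~2.8 GPU-years on a Tesla V100, this works out to roughly 23 GPU-hours per model.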