Structure by Architecture: Structured Representations without Regularization

Authors: Felix Leeb, Giulia Lanzillotta, Yashas Annadani, Michel Besserve, Stefan Bauer, Bernhard Schölkopf

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate how these models learn a representation that improves results in a variety of downstream tasks including generation, disentanglement, and extrapolation using several challenging and natural image datasets. We train the proposed methods and baselines on two smaller disentanglement image datasets (where D = 64×64×3 and the true factors are independent): 3D-Shapes (Burgess & Kim, 2018) and the three variants (toy, sim, and real) of the MPI3D Disentanglement dataset (Gondal et al., 2019), as well as two larger, more realistic datasets (where D = 128×128×3): Celeb-A (Liu et al., 2015) and the Robot Finger Dataset (RFD) (Dittadi et al., 2020).
Researcher Affiliation | Academia | Felix Leeb, Giulia Lanzillotta, Yashas Annadani, Michel Besserve, Stefan Bauer, & Bernhard Schölkopf. Max Planck Institute for Intelligent Systems, Tübingen, Germany
Pseudocode | No | The paper describes the proposed architecture and methods in detail using prose and diagrams (e.g., Figure 1), but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | The exact sizes and connectivity of the models can be seen in the configuration files of the attached code.
Open Datasets | Yes | We train the proposed methods and baselines on two smaller disentanglement image datasets (where D = 64×64×3 and the true factors are independent): 3D-Shapes (Burgess & Kim, 2018) and the three variants (toy, sim, and real) of the MPI3D Disentanglement dataset (Gondal et al., 2019), as well as two larger, more realistic datasets (where D = 128×128×3): Celeb-A (Liu et al., 2015) and the Robot Finger Dataset (RFD) (Dittadi et al., 2020).
Dataset Splits | Yes | The models are trained with a standard 70-10-20 (train-val-test) split of the datasets, where the training objective uses the cross entropy loss (as well as the method-specific regularization terms for the baselines).
Hardware Specification | Yes | The models are implemented using Pytorch (Paszke et al., 2019) and were trained on the in-house computing cluster using Nvidia V100 32GB GPUs, so that training a single model takes about 3-4 hours on the smaller datasets and 7-10 hours for Celeb-A.
Software Dependencies | No | The paper mentions software like "Pytorch (Paszke et al., 2019)" and "MISH nonlinearity (Misra, 2019)", but it does not specify explicit version numbers for these software components, which is required for reproducible software dependencies.
Experiment Setup | Yes | All models used the same training hyperparameters, which included using an Adam optimizer with a learning rate of 0.0005 and momentum parameters of β1 = 0.9 and β2 = 0.999. For the smaller datasets (3D-Shapes, MPI3D) the models were trained for 100k iterations and a batch size of 128, while for Celeb-A and RFD the models were trained for 200k iterations and a batch size of 32. For the β-VAEs and β-TCVAEs, β ∈ {2, 4, 6, 8, 16} was tested, while γ ∈ {10, 20, 40, 80} was tested for the FVAE on 3D-Shapes and MPI3D; the model with the smallest loss on the validation set was used for subsequent analysis, which was β = 2 for the β-VAE, β = 4 for the β-TCVAE, and γ = 40 for the FVAE.
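The standard 70-10-20 (train-val-test) split reported above can be sketched with a small helper. `split_sizes` is our own illustrative function, not part of the paper's released code; it uses integer arithmetic and folds any rounding remainder into the training split.

```python
def split_sizes(n, percents=(70, 10, 20)):
    """Hypothetical helper (not from the paper's code): compute integer
    train/val/test sizes for a 70-10-20 split of n samples."""
    sizes = [n * p // 100 for p in percents]
    sizes[0] += n - sum(sizes)  # remainder goes to train so sizes sum to n
    return sizes
```

For example, a dataset of 1000 samples yields sizes of 700, 100, and 200, and the three sizes always sum to the dataset size even when it is not divisible by 10.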
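The training hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The function name and dictionary keys below are our own illustration, not the paper's actual configuration files; only the numeric values (optimizer settings, iteration counts, batch sizes) come from the report.

```python
def make_config(dataset):
    """Illustrative training config (names are ours, values from the paper):
    Adam with lr 0.0005 and betas (0.9, 0.999); 100k iterations at batch
    size 128 for the smaller datasets, 200k at batch size 32 otherwise."""
    small = {"3d-shapes", "mpi3d"}  # assumed identifiers for the small datasets
    cfg = {"optimizer": "Adam", "lr": 5e-4, "betas": (0.9, 0.999)}
    if dataset.lower() in small:
        cfg.update(iterations=100_000, batch_size=128)
    else:  # Celeb-A and RFD
        cfg.update(iterations=200_000, batch_size=32)
    return cfg
```

Keeping the shared optimizer settings in one place and branching only on the per-dataset schedule mirrors the report's statement that all models used the same training hyperparameters.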