Structure by Architecture: Structured Representations without Regularization

Authors: Felix Leeb, Giulia Lanzillotta, Yashas Annadani, Michel Besserve, Stefan Bauer, Bernhard Schölkopf

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate how these models learn a representation that improves results in a variety of downstream tasks including generation, disentanglement, and extrapolation using several challenging and natural image datasets. We train the proposed methods and baselines on two smaller disentanglement image datasets (where D = 64×64×3 and the true factors are independent): 3D-Shapes (Burgess & Kim, 2018) and the three variants (toy, sim, and real) of the MPI3D Disentanglement dataset (Gondal et al., 2019), as well as two larger, more realistic datasets (where D = 128×128×3): Celeb-A (Liu et al., 2015) and the Robot Finger Dataset (RFD) (Dittadi et al., 2020).
Researcher Affiliation | Academia | Felix Leeb, Giulia Lanzillotta, Yashas Annadani, Michel Besserve, Stefan Bauer, & Bernhard Schölkopf. Max Planck Institute for Intelligent Systems, Tübingen, Germany
Pseudocode | No | The paper describes the proposed architecture and methods in detail using prose and diagrams (e.g., Figure 1), but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | The exact sizes and connectivity of the models can be seen in the configuration files of the attached code.
Open Datasets | Yes | We train the proposed methods and baselines on two smaller disentanglement image datasets (where D = 64×64×3 and the true factors are independent): 3D-Shapes (Burgess & Kim, 2018) and the three variants (toy, sim, and real) of the MPI3D Disentanglement dataset (Gondal et al., 2019), as well as two larger, more realistic datasets (where D = 128×128×3): Celeb-A (Liu et al., 2015) and the Robot Finger Dataset (RFD) (Dittadi et al., 2020).
Dataset Splits | Yes | The models are trained with a standard 70-10-20 (train-val-test) split of the datasets, where the training objective uses the cross entropy loss (as well as the method-specific regularization terms for the baselines).
Hardware Specification | Yes | The models are implemented using Pytorch (Paszke et al., 2019) and were trained on the in-house computing cluster using Nvidia V100 32GB GPUs, so that training a single model takes about 3-4 hours on the smaller datasets and 7-10 hours for Celeb-A.
Software Dependencies | No | The paper mentions software like "Pytorch (Paszke et al., 2019)" and "MISH nonlinearity (Misra, 2019)", but it does not specify explicit version numbers for these software components, which is required for reproducible software dependencies.
Experiment Setup | Yes | All models used the same training hyperparameters, which included using an Adam optimizer with a learning rate of 0.0005 and momentum parameters of β1 = 0.9 and β2 = 0.999. For the smaller datasets (3D-Shapes, MPI3D) the models were trained for 100k iterations and a batch size of 128, while for Celeb-A and RFD the models were trained for 200k iterations and a batch size of 32. For the β-VAEs and β-TCVAEs, β ∈ {2, 4, 6, 8, 16} was tested, while γ ∈ {10, 20, 40, 80} was tested for the FVAE on 3D-Shapes and MPI3D; the model with the smallest loss on the validation set was used for subsequent analysis, which was β = 2 for the β-VAE, β = 4 for the β-TCVAE, and γ = 40 for the FVAE.
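The standard 70-10-20 (train-val-test) split reported above can be sketched with a small helper. `split_sizes` is our own illustrative function, not part of the paper's released code; it uses integer arithmetic and folds any rounding remainder into the training split.

```python
def split_sizes(n, percents=(70, 10, 20)):
    """Hypothetical helper (not from the paper's code): compute integer
    train/val/test sizes for a 70-10-20 split of n samples."""
    sizes = [n * p // 100 for p in percents]
    sizes[0] += n - sum(sizes)  # remainder goes to train so sizes sum to n
    return sizes
```

For example, a dataset of 1000 samples yields sizes of 700, 100, and 200, and the three sizes always sum to the dataset size even when it is not divisible by 10.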
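The training hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The function name and dictionary keys below are our own illustration, not the paper's actual configuration files; only the numeric values (optimizer settings, iteration counts, batch sizes) come from the report.

```python
def make_config(dataset):
    """Illustrative training config (names are ours, values from the paper):
    Adam with lr 0.0005 and betas (0.9, 0.999); 100k iterations at batch
    size 128 for the smaller datasets, 200k at batch size 32 otherwise."""
    small = {"3d-shapes", "mpi3d"}  # assumed identifiers for the small datasets
    cfg = {"optimizer": "Adam", "lr": 5e-4, "betas": (0.9, 0.999)}
    if dataset.lower() in small:
        cfg.update(iterations=100_000, batch_size=128)
    else:  # Celeb-A and RFD
        cfg.update(iterations=200_000, batch_size=32)
    return cfg
```

Keeping the shared optimizer settings in one place and branching only on the per-dataset schedule mirrors the report's statement that all models used the same training hyperparameters.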