The role of Disentanglement in Generalisation

Authors: Milton Llera Montero, Casimir JH Ludwig, Rui Ponte Costa, Gaurav Malhotra, Jeffrey Bowers

ICLR 2021

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
  "In this study, we systematically tested how the degree of disentanglement affects various forms of generalisation, including two forms of combinatorial generalisation that varied in difficulty. We trained three classes of variational autoencoders (VAEs) on two datasets on an unsupervised task by excluding combinations of generative factors during training. At test time we ask the models to reconstruct the missing combinations in order to measure generalisation performance."

Researcher Affiliation: Academia
  "Milton L. Montero¹,², Casimir J.H. Ludwig¹, Rui Ponte Costa², Gaurav Malhotra¹ & Jeffrey S. Bowers¹ — 1. School of Psychological Science; 2. Computational Neuroscience Unit, Department of Computer Science, University of Bristol, Bristol, United Kingdom. {m.lleramontero,c.ludwig,rui.costa,gaurav.malhotra,j.bowers}@bristol.ac.uk"

Pseudocode: No
  The paper does not contain any structured pseudocode or algorithm blocks.

Open Source Code: Yes
  "Working code for running these experiments and analyses can be downloaded at https://github.com/mmrl/disent-and-gen."

Open Datasets: Yes
  "We assessed combinatorial generalisation on two different datasets. The dSprites image dataset (Matthey et al., 2017) contains 2D images... The 3D Shapes dataset (Burgess & Kim, 2018) contains 3D images..."

Dataset Splits: No
  The paper describes excluding combinations from the training data for testing, but it does not specify a distinct validation split with percentages, counts, or an explicit cross-validation methodology.

Hardware Specification: No
  The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments.

Software Dependencies: No
  The paper mentions PyTorch (Paszke et al., 2019), the Ignite and Sacred frameworks (V. Fomin & Tejani, 2020; Klaus Greff et al., 2017), and the scikit-learn library (Pedregosa et al., 2011), but it does not give explicit version numbers for these dependencies.

Experiment Setup: Yes
  "We used a batch size of 64 and a learning rate of 5e-4 for the Adam optimizer... Training on the unsupervised tasks ran for 100 epochs for dSprites and 65 epochs for Shapes3D... The learning rate was fixed at 1e-4 and the batch size at 64. β values used were 1, 4, 8, 12, 16 on the full dSprites dataset. ... For the FactorVAE we used γ = 20, 50, 100 throughout. In the composition task the models were trained for 100 epochs with β = 1."
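The methodology above rests on two mechanics: holding out specific combinations of generative factors at training time, and optimising a β-weighted VAE objective. The sketch below illustrates both in NumPy. It is a minimal illustration, not the authors' implementation (which uses PyTorch and is available in the linked repository); the factor encoding, the excluded-combination format, and the Bernoulli reconstruction term are assumptions made for the example.

```python
import numpy as np


def training_mask(factors, excluded):
    """Boolean mask that is True for factor combinations kept for training.

    factors  : (N, F) integer array, one row per image and one column per
               generative factor (e.g. shape, scale, position).
    excluded : dict mapping factor column -> value; rows matching *all*
               entries are withheld and later used as the generalisation
               test set.
    """
    held_out = np.ones(len(factors), dtype=bool)
    for col, val in excluded.items():
        held_out &= factors[:, col] == val
    return ~held_out


def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """beta-VAE objective: Bernoulli reconstruction error plus beta * KL.

    beta = 1 recovers the standard VAE; larger values (e.g. 4, 8, 12, 16,
    the settings reported in the paper) push the approximate posterior
    N(mu, exp(logvar)) towards the factorised unit-Gaussian prior.
    """
    eps = 1e-8  # avoid log(0) for saturated reconstructions
    recon = -np.sum(x * np.log(x_recon + eps)
                    + (1.0 - x) * np.log(1.0 - x_recon + eps))
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon + beta * kl
```

For example, `training_mask(factors, {0: 2, 3: 5})` would drop every image whose first factor takes value 2 *and* whose fourth factor takes value 5, so the model never sees that combination during training and must reconstruct it compositionally at test time.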