Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Evaluating Disentanglement of Structured Representations

Authors: Raphaël Dang-Nhu

ICLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we demonstrate that viewing object compositionality as a disentanglement problem addresses several issues with prior visual metrics of object separation. As a core technical component, we present the first representation probing algorithm handling slot permutation invariance.
Researcher Affiliation | Academia | Raphaël Dang-Nhu. This work constitutes the public version of Raphaël Dang-Nhu's Master Thesis at ETH Zürich.
Pseudocode | Yes | Algorithm 1 Permutation-invariant representation probing
Open Source Code | No | The paper mentions using public PyTorch implementations of existing architectures (MONet, GENESIS) and a third-party implementation of IODINE, but it does not state that the code for its own novel metric or probing algorithm is open source, nor does it provide a link to it.
Open Datasets | Yes | We evaluate all models on CLEVR6 (Johnson et al., 2017) and Multi-dSprites (Matthey et al., 2017; Burgess et al., 2019), with the exception of IODINE that we restricted to Multi-dSprites for computational reasons, as CLEVR6 requires a week on 8 V100 GPUs per training.
Dataset Splits | Yes | Each group has 5000 samples, with a 4000/500/500 split for fitting, validation and evaluation of the factor predictor.
Hardware Specification | Yes | CLEVR6 requires a week on 8 V100 GPUs per training. Models were trained with one to four V100 GPUs. This work was granted access to the HPC resources of IDRIS under the allocation 2020-AD011012138 made by GENCI.
Software Dependencies | No | The paper mentions using 'public Pytorch implementations' and references third-party implementations for the models, but it does not specify version numbers for PyTorch or any other software dependencies (libraries, frameworks, or specific tools) needed to replicate the experiments.
Experiment Setup | Yes | We train MONet exactly as in Burgess et al. (2019), except that we set σfg = 0.1 and σbg = 0.06 which was shown to yield better results in (Greff et al., 2019). [...] For the temporary predictors in Algorithm 1 (inside the loop), we use a linear model with Ridge regularization. For the final predictor, we use a random forest with 10 trees, and a maximum depth of 15. [...] We train all models for 200 epochs.
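The Experiment Setup quote specifies the probing predictors concretely (a Ridge linear model inside the loop, a 10-tree random forest of maximum depth 15 as the final predictor, fit on the 4000-sample split). A minimal sketch of that configuration, assuming scikit-learn as the implementation library (the paper does not name one) and toy random arrays in place of the actual slot representations and factor labels:

```python
# Sketch of the quoted predictor configuration. Assumptions: scikit-learn
# estimators stand in for the paper's unspecified implementation, the Ridge
# regularization strength (alpha) is left at its default since the paper does
# not report it, and the data below is random, not the paper's.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Toy stand-ins sized like the quoted split: 4000 samples for fitting,
# 500 for evaluation; 32 is an arbitrary representation dimension.
X_fit, y_fit = rng.normal(size=(4000, 32)), rng.normal(size=4000)
X_eval = rng.normal(size=(500, 32))

# Temporary predictor used inside the probing loop: linear model with
# Ridge regularization (alpha is an assumption, not from the paper).
temp_predictor = Ridge(alpha=1.0).fit(X_fit, y_fit)

# Final factor predictor: random forest with 10 trees, maximum depth 15,
# matching the quoted hyperparameters.
final_predictor = RandomForestRegressor(n_estimators=10, max_depth=15,
                                        random_state=0).fit(X_fit, y_fit)
preds = final_predictor.predict(X_eval)  # one prediction per evaluation sample
```

This only illustrates the predictor hyperparameters the report extracted; the permutation-invariant probing loop itself (Algorithm 1) is not reproduced here.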