Unsupervised Learning of Compositional Energy Concepts

Authors: Yilun Du, Shuang Li, Yash Sharma, Josh Tenenbaum, Igor Mordatch

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We quantitatively and qualitatively show that COMET can recover the global factors of variation in an image in Section 5.1, as well as the local factors in an image in Section 5.2. Furthermore, we show that the components captured by COMET generalize well across separate modalities in Section 5.3. Finally, we evaluate the learned representations on disentanglement. In Falcor3D [40], each image corresponds to a combination of 7 factors of variation: lighting intensity, lighting x, y & z direction, and camera x, y & z position. We consider three commonly used metrics for evaluation: the BetaVAE metric [23], the Mutual Information Gap (MIG) [7], and the Mean Correlation Coefficient (MCC) [25]. (A minimal MCC sketch follows the table.)
Researcher Affiliation | Collaboration | Yilun Du (MIT CSAIL, yilundu@mit.edu); Shuang Li (MIT CSAIL, lishuang@mit.edu); Yash Sharma (University of Tübingen, yash.sharma@uni-tuebingen.de); Joshua B. Tenenbaum (MIT CSAIL, BCS, CBMM, jbt@mit.edu); Igor Mordatch (Google Brain, imordatch@google.com)
Pseudocode | Yes | We provide pseudocode for training our model in Algorithm 1. (A hedged sketch of such a training step follows the table.)
Open Source Code | Yes | Code and data available at https://energy-based-model.github.io/comet/
Open Datasets | Yes | We assess the ability of COMET to decompose global factors of variation in scenes consisting of lighting and camera illumination from Falcor3D (NVIDIA high-resolution disentanglement dataset) [40], scene factors of variation in CLEVR [29], and face attributes in real images from CelebA-HQ [30].
Dataset Splits | No | No explicit training/test/validation dataset splits (e.g., percentages, counts, or specific split methods) are provided in the paper.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or specific Python versions) are provided in the paper.
Experiment Setup | No | The paper mentions general experimental settings like a 'latent dimension of 64' or a 'small latent dimension (16)' and that 'additional training algorithm and model architecture details' are in the appendix, but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations in the main text.
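
The Pseudocode row refers to the paper's Algorithm 1, which trains COMET by splitting an image into K component latent codes, decoding them through gradient descent on a summed energy function, and minimizing the reconstruction error of the result. Below is a minimal sketch of such a training step, assuming PyTorch; the names (`encoder`, `energy_fn`) and hyperparameters (`n_latents`, `n_steps`, `step_size`) are illustrative placeholders, not the authors' released implementation.

```python
# Minimal sketch of a COMET-style training step (Algorithm 1 of the paper),
# assuming PyTorch. Names, shapes, and hyperparameter values below are
# illustrative placeholders, not the authors' released implementation.
import torch
import torch.nn.functional as F

def comet_training_step(encoder, energy_fn, x, optimizer,
                        n_latents=4, n_steps=10, step_size=100.0):
    """Infer K component latents, decode by gradient descent on the summed
    energy, and minimize the reconstruction error of the decoded image."""
    # Encoder splits the image into K component latent codes z_1..z_K.
    z = encoder(x)                                  # (batch, n_latents, dim)

    # Optimization-based decoding: start from noise and descend the energy.
    x_hat = torch.rand_like(x).requires_grad_(True)
    for _ in range(n_steps):
        # Total energy of the image is the sum of per-component energies.
        energy = sum(energy_fn(x_hat, z[:, k]) for k in range(n_latents)).sum()
        grad, = torch.autograd.grad(energy, x_hat, create_graph=True)
        x_hat = x_hat - step_size * grad            # one gradient-descent step

    # Reconstruction loss trains the encoder and energy function end to end.
    loss = F.mse_loss(x_hat, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Backpropagating through the unrolled gradient-descent decoding (`create_graph=True`) is what lets a single reconstruction loss train both the encoder and the energy function.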
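
The Research Type row cites three disentanglement metrics (BetaVAE [23], MIG [7], MCC [25]). As a concrete example of the simplest of these, here is a minimal sketch of the Mean Correlation Coefficient, assuming NumPy/SciPy; the function name and arguments are illustrative and not the evaluation code used in the paper.

```python
# Minimal sketch of the Mean Correlation Coefficient (MCC), one of the
# disentanglement metrics cited in the table (alongside BetaVAE and MIG).
# Assumes NumPy/SciPy; names are illustrative, not the authors' code.
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_correlation_coefficient(latents, factors):
    """latents: (n_samples, n_latents) inferred codes,
    factors: (n_samples, n_factors) ground-truth factors of variation."""
    n_latents, n_factors = latents.shape[1], factors.shape[1]
    # Absolute Pearson correlation between every latent and every factor.
    corr = np.zeros((n_latents, n_factors))
    for i in range(n_latents):
        for j in range(n_factors):
            corr[i, j] = abs(np.corrcoef(latents[:, i], factors[:, j])[0, 1])
    # Match latents to factors to maximize total correlation (Hungarian
    # algorithm), then average the correlations of the matched pairs.
    row, col = linear_sum_assignment(-corr)
    return corr[row, col].mean()
```

For Falcor3D, `factors` would contain the seven ground-truth factors per image (lighting intensity, lighting x/y/z direction, and camera x/y/z position), and `latents` the corresponding codes inferred by the model.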