Compositional Visual Generation with Energy Based Models

Authors: Yilun Du, Shuang Li, Igor Mordatch

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform empirical studies to answer the following questions: (1) Can EBMs exhibit concept compositionality (such as concept negation, conjunction, and disjunction) in generating images? (2) Can we take advantage of concept combinations to learn new concepts in a continual manner? (3) Does explicit factor decomposition enable generalization to novel combinations of factors? (4) Can we perform concept inference across multiple inputs? We perform experiments on 64x64 object scenes rendered in MuJoCo [27] (MuJoCo Scenes) and the 128x128 CelebA dataset.
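The composition rules named in this row (conjunction, disjunction, negation of concept energies) can be illustrated with a minimal, dependency-light sketch. The toy 2-D energy functions and the finite-difference Langevin sampler below are illustrative assumptions, not the paper's image models; only the composition formulas (summing energies for conjunction, a softmin for disjunction, subtraction for negation) follow the paper.

```python
import numpy as np

# Toy "concept" energies over 2-D points (illustrative only).
def e_near_origin(x):          # concept A: low energy near (0, 0)
    return 0.5 * np.sum(x ** 2)

def e_on_circle(x):            # concept B: low energy on a radius-2 circle
    return (np.linalg.norm(x) - 2.0) ** 2

def conjunction(x):            # "A AND B": sum of energies
    return e_near_origin(x) + e_on_circle(x)

def disjunction(x):            # "A OR B": -log(exp(-E_A) + exp(-E_B))
    return -np.logaddexp(-e_near_origin(x), -e_on_circle(x))

def negation(x, alpha=0.5):    # "B AND NOT A": E_B - alpha * E_A
    return e_on_circle(x) - alpha * e_near_origin(x)

def langevin_sample(energy, steps=200, step_size=0.01, rng=None):
    """Draw an approximate sample from p(x) ∝ exp(-energy(x))."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = rng.normal(size=2)
    for _ in range(steps):
        # Central finite-difference gradient keeps the sketch dependency-free.
        grad = np.array([
            (energy(x + eps) - energy(x - eps)) / 2e-4
            for eps in (np.array([1e-4, 0.0]), np.array([0.0, 1e-4]))
        ])
        x = x - step_size * grad + np.sqrt(2 * step_size) * rng.normal(size=2)
    return x

# Sampling from the conjunction lands near the compromise between both concepts.
sample = langevin_sample(conjunction)
```

The same sampler works for any composed energy, which is the point of the compositional framing: new concept combinations need no retraining, only a new energy expression.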
Researcher Affiliation | Collaboration | Yilun Du (MIT CSAIL, yilundu@mit.edu); Shuang Li (MIT CSAIL, lishuang@mit.edu); Igor Mordatch (Google Brain, imordatch@google.com)
Pseudocode | No | The paper describes its methods with mathematical equations (e.g., Equations 3, 5, 8, and 10) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor structured pseudocode.
Open Source Code | Yes | Code and data available at https://energy-based-model.github.io/compositional-generation-inference/
Open Datasets | Yes | We perform experiments on 64x64 object scenes rendered in MuJoCo [27] (MuJoCo Scenes) and the 128x128 CelebA dataset. Code and data available at https://energy-based-model.github.io/compositional-generation-inference/
Dataset Splits | No | No training, validation, or test split details (percentages, sample counts, or citations of standard splits) are provided for the main EBM experiments. A "test set" is mentioned only for a supervised evaluation classifier ("Our classifier obtains 99.3% accuracy for position and 99.9% for color on the test set."), so the split details needed for reproducibility are absent.
Hardware Specification | No | The paper mentions "1 GPU" and "8 GPUs" for training ("Models are trained on MuJoCo datasets for up to 1 day on 1 GPU and for 1 day on 8 GPUs for CelebA"), but does not specify a GPU model (e.g., NVIDIA A100 or V100) or other hardware details such as CPU, memory, or cloud instance types.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). It mentions "ImageNet32x32 architecture and ImageNet128x128 architecture from [3] with the Swish activation [22]", which describes model components, not the software stack.
Experiment Setup | No | The paper states, "More training details and model architecture can be found in the appendix," indicating that experimental setup details such as hyperparameters are not provided in the main text.