Generative Scene Graph Networks

Authors: Fei Deng, Zhuo Zhi, Donghun Lee, Sungjin Ahn

ICLR 2021

Reproducibility assessment. Each entry below lists the variable, the assessed result, and the supporting LLM response:

Research Type: Experimental
LLM response: "We evaluate GSGN on datasets of scenes containing multiple compositional objects, including a challenging Compositional CLEVR dataset that we have developed. We show that GSGN is able to infer the latent scene graph, generalize out of the training regime, and improve data efficiency in downstream tasks."

Researcher Affiliation: Academia
LLM response: Fei Deng (Rutgers University, fei.deng@rutgers.edu); Zhuo Zhi (University of California, San Diego, zzhi@ucsd.edu); Donghun Lee (ETRI, donghun@etri.re.kr); Sungjin Ahn (Rutgers University, sjn.ahn@gmail.com).

Pseudocode: Yes
LLM response: "We provide the implementation outline for computing q(z_fg | x) in Algorithm 1 and Algorithm 2."

Open Source Code: No
LLM response: The paper mentions developing and releasing the Compositional CLEVR dataset but does not explicitly state that the source code for their proposed method (GSGN) is open-source or provide a link to it.

Open Datasets: Yes
LLM response: "For evaluation, we develop two datasets of scenes containing multiple compositional 2D and 3D objects, respectively. These can be regarded as compositional versions of Multi-dSprites (Greff et al., 2019) and CLEVR (Johnson et al., 2017), two commonly used datasets for evaluating unsupervised object-level scene decomposition. For example, the compositional 3D objects in our dataset are made up of shapes similar to those in the CLEVR dataset, with variable sizes, colors, and materials. Hence, we name our 3D dataset the Compositional CLEVR dataset. ... (ii) we develop and release the Compositional CLEVR dataset to facilitate future research on object compositionality."

Dataset Splits: Yes
LLM response: "Each dataset consists of 128×128 color images, split into 64000 for training, 12800 for validation, and 12800 for testing."

Hardware Specification: No
LLM response: "We train GSGN on a single GPU, using Adam optimizer (Kingma & Ba, 2014) with a batch size of 64 and a learning rate of 3×10⁻⁴, for up to 500 epochs. ... GSGN-9 is trained on two GPUs, each taking a batch size of 32, using the same schedule." The paper mentions "GPU" but does not specify the model or any other hardware details.

Software Dependencies: No
LLM response: The paper mentions the use of the Adam optimizer and the Gumbel-Softmax trick, which are algorithms, but does not list any specific software libraries, frameworks, or their version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x).

Experiment Setup: Yes
LLM response: "We train GSGN on a single GPU, using Adam optimizer (Kingma & Ba, 2014) with a batch size of 64 and a learning rate of 3×10⁻⁴, for up to 500 epochs. We use gradient clipping to ensure that the infinity norm of the gradient does not exceed 1.0. The temperature for Gumbel-Softmax (Jang et al., 2016; Maddison et al., 2016) is exponentially annealed from 2.5 to 0.5 during the first 20 epochs. Similar to Slot Attention (Locatello et al., 2020), the learning rate is linearly increased from 0 to 3×10⁻⁴ during the first 10 epochs, and exponentially decayed to half of its value every 100 epochs. We set σ_fg = 0.3 and σ_bg = 0.1. On the Compositional CLEVR dataset, σ²_fg has an initial value of 0.15², and is linearly increased from 0.15² to 0.3² during epochs 20-40. Similar to SPACE (Lin et al., 2020b), the mixing weight m̂_r is fixed at the start of training. On the 2D Shapes dataset, we fix m̂_r = 1×10⁻⁵ for 1 epoch, while on the Compositional CLEVR dataset, we fix m̂_r = 0.1 for 2 epochs."
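
The quoted setup fully pins down the optimization schedule (warmup, decay, temperature annealing, and gradient clipping). Below is a minimal, hypothetical sketch of that training loop, assuming a PyTorch-style setup; the paper does not name its framework, and the model, data, and loss here are placeholders rather than the authors' GSGN implementation. Only the hyperparameters and schedules are taken from the quote above.

```python
import math
import torch
import torch.nn.functional as F

# Hyperparameters and schedules as quoted in the Experiment Setup entry.
LR = 3e-4                     # peak learning rate
WARMUP_EPOCHS = 10            # linear warmup from 0 to LR
HALF_LIFE = 100               # learning rate halves every 100 epochs
TAU_START, TAU_END = 2.5, 0.5
TAU_ANNEAL_EPOCHS = 20        # Gumbel-Softmax temperature annealing window

def learning_rate(epoch: int) -> float:
    """Linear warmup for 10 epochs, then exponential decay (halving every 100)."""
    if epoch < WARMUP_EPOCHS:
        return LR * epoch / WARMUP_EPOCHS
    return LR * 0.5 ** ((epoch - WARMUP_EPOCHS) / HALF_LIFE)

def gumbel_temperature(epoch: int) -> float:
    """Exponential annealing from 2.5 to 0.5 over the first 20 epochs."""
    t = min(epoch / TAU_ANNEAL_EPOCHS, 1.0)
    return TAU_START * (TAU_END / TAU_START) ** t

model = torch.nn.Linear(16, 4)   # placeholder standing in for the GSGN model
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

for epoch in range(500):
    # Apply the warmup/decay schedule to the optimizer.
    for group in optimizer.param_groups:
        group["lr"] = learning_rate(epoch)
    tau = gumbel_temperature(epoch)

    x = torch.randn(64, 16)              # stand-in batch of size 64
    logits = model(x)
    # Discrete latents would be sampled with the annealed temperature;
    # the real objective is the ELBO, replaced here by a dummy loss.
    y = F.gumbel_softmax(logits, tau=tau)
    loss = y.pow(2).mean()

    optimizer.zero_grad()
    loss.backward()
    # Clip the infinity norm of the gradient at 1.0, as stated in the paper.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0, norm_type=math.inf)
    optimizer.step()
```

Under this schedule the post-warmup learning rate is 3×10⁻⁴ × 0.5^((epoch − 10)/100), so it halves every 100 epochs as described; the σ_fg warmup and the fixed m̂_r phases would be handled analogously inside the (omitted) loss.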