SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

Authors: Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show through experiments on Atari and 3D-Rooms that SPACE achieves the above properties consistently in comparison to SPAIR, IODINE, and GENESIS. Results of our experiments can be found on our project website: https://sites.google.com/view/space-project-page
Researcher Affiliation | Academia | Zhixuan Lin (1, 2), Yi-Fu Wu (1), Skand Vishwanath Peri (1), Weihao Sun (1), Gautam Singh (1), Fei Deng (1), Jindong Jiang (1), Sungjin Ahn (1); 1 Rutgers University, 2 Zhejiang University
Pseudocode | Yes | Algorithm 1 and Algorithm 3 present SPACE's inference for the foreground and background. Algorithm 2 describes the details of the rescale_i function in Algorithm 1 that transforms the local shift z_i^shift to the global shift ẑ_i^shift (a sketch of this rescale step appears below the table). Algorithm 4 shows the details of the generation process of the background module.
Open Source Code | No | The paper mentions a project website: 'Results of our experiments can be found on our project website: https://sites.google.com/view/space-project-page'. However, it does not explicitly state that the source code for the methodology is available on this site or elsewhere.
Open Datasets | Yes | We evaluate our model on two datasets: 1) an Atari (Bellemare et al., 2013) dataset that consists of random images from a pretrained agent playing the games, and 2) a generated 3D-room dataset... Atari. For each game, we sample 60,000 random images from a pretrained agent (Wu et al., 2016).
Dataset Splits | Yes | We split the images into 50,000 for the training set, 5,000 for the validation set, and 5,000 for the testing set. ... We use a training set of 63,000 images, a validation set of 7,000 images, and a test set of 7,000 images.
Hardware Specification | Yes | IODINE is the slowest overall with its computationally expensive iterative inference procedure. Furthermore, both IODINE and GENESIS require storing data for each of the K components, so we were unable to run our experiments on 256 components or greater before running out of memory on our 22GB GPU.
Software Dependencies | No | The paper mentions using RMSProp, Adam, the Gumbel-Softmax distribution, Spatial Transformer, PyTorch Pixel Shuffle, Group Normalization, CELU, Batch Normalization, and ELU, along with citations to their respective papers. However, it does not specify version numbers for these software components or the programming language used.
Experiment Setup | Yes | For all experiments we use an image size of 128×128 and a batch size of 12 to 16 depending on memory usage. For the foreground module, we use the RMSProp (Tieleman & Hinton, 2012) optimizer with a learning rate of 1×10^-5, except for Figure 5, for which we use a learning rate of 1×10^-4 as in SPAIR. For the background module, we use the Adam (Kingma & Ba, 2014) optimizer with a learning rate of 1×10^-3. We use gradient clipping with a maximum norm of 1.0. For quantitative results, SPACE is trained for up to 160,000 steps. ... We list our hyperparameters for the 3D large dataset and joint training for 10 static Atari games below. Hyperparameters for other experiments are similar, but are fine-tuned for each dataset individually. (A minimal sketch of this optimizer setup appears below the table.)
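
The rescale step quoted in the Pseudocode row maps each grid cell's local shift to a shift in the global image frame. The paper's Algorithm 2 is not reproduced here, so the following is only a minimal sketch assuming a SPAIR-style G×G grid of cells and [-1, 1] spatial-transformer coordinates; the function name, argument names, and the half-cell shift range are assumptions for illustration.

```python
import torch

def rescale(z_shift_local, grid_size):
    """Hypothetical sketch of the rescale_i step: map per-cell local shifts
    in [-1, 1] to global shifts in the [-1, 1] image coordinate frame used
    by a spatial transformer, assuming a SPAIR-style grid parameterization.

    z_shift_local: (B, G*G, 2) tensor of local (x, y) offsets in [-1, 1].
    """
    G = grid_size
    # Integer (x, y) index of every cell in the G x G grid.
    ys, xs = torch.meshgrid(torch.arange(G), torch.arange(G), indexing="ij")
    cell_index = torch.stack([xs, ys], dim=-1).float().view(1, G * G, 2)
    # Cell center in [0, 1] image coordinates, displaced by the local shift
    # (in this sketch the shift moves the center by at most half a cell).
    center_01 = (cell_index + 0.5 + 0.5 * z_shift_local) / G
    # Convert [0, 1] coordinates to the [-1, 1] range expected by
    # torch.nn.functional.grid_sample.
    return center_01 * 2.0 - 1.0
```

With this parameterization, a zero local shift places every object proposal exactly at the center of its grid cell.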
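The Experiment Setup row specifies two optimizers (RMSProp at 1×10^-5 for the foreground module, Adam at 1×10^-3 for the background module) and gradient clipping with a maximum norm of 1.0. Below is a minimal, self-contained sketch of that configuration; the Foreground and Background stand-in modules and the loss are placeholders, not the SPACE architecture or ELBO.

```python
import torch
from torch import nn

# Stand-in modules (hypothetical): the paper only specifies the optimizers,
# learning rates, batch size, image size, and gradient clipping.
class Foreground(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 8, 3, padding=1)
    def forward(self, x):
        return self.net(x).mean()

class Background(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 8, 3, padding=1)
    def forward(self, x):
        return self.net(x).mean()

fg, bg = Foreground(), Background()
# RMSProp (lr 1e-5) for the foreground module, Adam (lr 1e-3) for the
# background module, as quoted from the paper.
fg_opt = torch.optim.RMSprop(fg.parameters(), lr=1e-5)
bg_opt = torch.optim.Adam(bg.parameters(), lr=1e-3)

images = torch.rand(16, 3, 128, 128)   # batch size 12-16, 128x128 images
for step in range(3):                   # the paper trains up to 160,000 steps
    loss = fg(images) + bg(images)      # placeholder for the SPACE objective
    fg_opt.zero_grad()
    bg_opt.zero_grad()
    loss.backward()
    # Gradient clipping with a maximum norm of 1.0.
    torch.nn.utils.clip_grad_norm_(
        list(fg.parameters()) + list(bg.parameters()), max_norm=1.0)
    fg_opt.step()
    bg_opt.step()
```

Keeping separate optimizers for the two modules allows the foreground and background learning rates quoted above to differ by two orders of magnitude.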