Object-Centric Slot Diffusion

Authors: Jindong Jiang, Fei Deng, Gautam Singh, Sungjin Ahn

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through experiments on various object-centric tasks, including the first application of the FFHQ dataset in this field, we demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders, particularly in more complex scenes, and exhibits superior unsupervised compositional generation quality."
Researcher Affiliation | Academia | Jindong Jiang (Rutgers University, jindong.jiang@rutgers.edu); Fei Deng (Rutgers University, fei.deng@rutgers.edu); Gautam Singh (Rutgers University, singh.gautam@rutgers.edu); Sungjin Ahn (KAIST, sungjin.ahn@kaist.ac.kr)
Pseudocode | No | The paper describes its procedures and mathematical formulations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | A project page is available at https://latentslotdiffusion.github.io
Open Datasets | Yes | "We evaluate our model on five datasets. Four of them are synthetic multi-object datasets CLEVR [37], CLEVRTex [40], MOVi-C, MOVi-E [24]. Furthermore, we explore the applicability of object-centric models to FFHQ [41], a dataset of high-quality face images."
Dataset Splits | Yes | "For CLEVR, we utilize the official split for training and validation sets. [...] For CLEVRTex, there is no official split provided, so we allocate 80% of the data for training, 10% for validation, and 10% for testing. Regarding MOVi-C and MOVi-E, we use 90% of the training set data for training and reserve 10% for validation. [...] For the FFHQ dataset, we use 86% of the dataset (~60K images) for training and 7% (~5K images) for validation." (A hedged split-reproduction sketch follows the table.)
Hardware Specification | Yes | "We train LSD on 2 NVIDIA RTX 6000 GPUs for 4.5 days, while SLATE and SLATE+ are trained in 1 day and 2.7 days using the same GPU setup."
Software Dependencies | No | The paper mentions software such as PyTorch and Stable Diffusion, and specific model versions (e.g., the "KL-8 version" of the auto-encoder), but does not provide version numbers for the general software dependencies (e.g., "PyTorch 1.9", "Python 3.8"). (A hedged checkpoint-pinning sketch follows the table.)
Experiment Setup | Yes | "We will provide an overview of the implementation details in this section. The hyperparameters used in our approach are listed in Table 6."
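
To make the quoted split protocol concrete, below is a minimal Python sketch of how the unofficial splits could be reproduced. The directory layout, file extension, seed, and the use of a shuffled random split are all assumptions; the paper does not state how its random splits were drawn.

```python
import random
from pathlib import Path

def split_dataset(image_dir, train_frac=0.8, val_frac=0.1, seed=0):
    """Deterministically split a directory of images into train/val/test.

    Fractions follow the CLEVRTex protocol quoted above (80/10/10).
    For MOVi-C/E use train_frac=0.9, val_frac=0.1 (no test split);
    for FFHQ use train_frac=0.86, val_frac=0.07 (the remainder unused).
    The seed and shuffling strategy are assumptions, not the paper's.
    """
    files = sorted(Path(image_dir).glob("*.png"))
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train = int(train_frac * n)
    n_val = int(val_frac * n)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset("clevrtex/images")
```

Fixing and reporting the seed is what makes such a split reproducible across runs.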
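
For the gap noted in the Software Dependencies row, the pre-trained auto-encoder at least can be pinned to an explicit checkpoint. Below is a minimal sketch using the Hugging Face diffusers library; the repo id stabilityai/sd-vae-ft-mse is an assumption, since the paper says only that the "KL-8 version" of the Stable Diffusion auto-encoder is used.

```python
import torch
from diffusers import AutoencoderKL

# Assumed checkpoint: a publicly released Stable Diffusion f=8 KL
# auto-encoder. The paper states only "KL-8 version", so the exact
# repo id here is an assumption.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

# Encode a batch of images (values in [-1, 1]) into the 4-channel
# latent space at 1/8 spatial resolution, as in latent diffusion.
images = torch.randn(2, 3, 256, 256)  # placeholder batch
with torch.no_grad():
    latents = vae.encode(images).latent_dist.sample()
print(latents.shape)  # expected: (2, 4, 32, 32)
```

Recording the torch and diffusers versions alongside the checkpoint id would close the dependency gap flagged in that row.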