Object-Centric Slot Diffusion
Authors: Jindong Jiang, Fei Deng, Gautam Singh, Sungjin Ahn
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on various object-centric tasks, including the first application of the FFHQ dataset in this field, we demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders, particularly in more complex scenes, and exhibits superior unsupervised compositional generation quality. |
| Researcher Affiliation | Academia | Jindong Jiang Rutgers University jindong.jiang@rutgers.edu Fei Deng Rutgers University fei.deng@rutgers.edu Gautam Singh Rutgers University singh.gautam@rutgers.edu Sungjin Ahn KAIST sungjin.ahn@kaist.ac.kr |
| Pseudocode | No | The paper describes procedures and mathematical formulations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page is available at https://latentslotdiffusion.github.io |
| Open Datasets | Yes | We evaluate our model on five datasets. Four of them are synthetic multi-object datasets CLEVR [37], CLEVRTex [40], MOVi-C, MOVi-E [24]. Furthermore, we explore the applicability of object-centric models to FFHQ [41], a dataset of high-quality face images. |
| Dataset Splits | Yes | For CLEVR, we utilize the official split for training and validation sets. [...] For CLEVRTex, there is no official split provided, so we allocate 80% of the data for training, 10% for validation, and 10% for testing. Regarding MOVi-C and MOVi-E, we use 90% of the training set data for training and reserve 10% for validation. [...] For the FFHQ dataset, we use 86% of the dataset (~60K images) for training and 7% (~5K images) for validation. |
| Hardware Specification | Yes | We train LSD on 2 NVIDIA RTX 6000 GPUs for 4.5 days, while SLATE and SLATE+ are trained in 1 day and 2.7 days using the same GPU setup. |
| Software Dependencies | No | The paper mentions software such as PyTorch and Stable Diffusion, and specific model versions (e.g., the 'KL-8 version' of the auto-encoder), but does not provide version numbers for the general software dependencies (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | We will provide an overview of the implementation details in this section. The hyperparameters used in our approach are listed in Table 6. |