Object-Centric Slot Diffusion

Authors: Jindong Jiang, Fei Deng, Gautam Singh, Sungjin Ahn

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through experiments on various object-centric tasks, including the first application of the FFHQ dataset in this field, we demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders, particularly in more complex scenes, and exhibits superior unsupervised compositional generation quality."
Researcher Affiliation | Academia | Jindong Jiang (Rutgers University, jindong.jiang@rutgers.edu); Fei Deng (Rutgers University, fei.deng@rutgers.edu); Gautam Singh (Rutgers University, singh.gautam@rutgers.edu); Sungjin Ahn (KAIST, sungjin.ahn@kaist.ac.kr)
Pseudocode | No | The paper describes its procedures and mathematical formulations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | A project page is available at https://latentslotdiffusion.github.io
Open Datasets | Yes | "We evaluate our model on five datasets. Four of them are synthetic multi-object datasets CLEVR [37], CLEVRTex [40], MOVi-C, MOVi-E [24]. Furthermore, we explore the applicability of object-centric models to FFHQ [41], a dataset of high-quality face images."
Dataset Splits | Yes | "For CLEVR, we utilize the official split for training and validation sets. [...] For CLEVRTex, there is no official split provided, so we allocate 80% of the data for training, 10% for validation, and 10% for testing. Regarding MOVi-C and MOVi-E, we use 90% of the training set data for training and reserve 10% for validation. [...] For the FFHQ dataset, we use 86% of the dataset (~60K images) for training and 7% (~5K images) for validation." (A hedged split-reproduction sketch follows the table.)
Hardware Specification | Yes | "We train LSD on 2 NVIDIA RTX 6000 GPUs for 4.5 days, while SLATE and SLATE+ are trained in 1 day and 2.7 days using the same GPU setup."
Software Dependencies | No | The paper mentions software such as PyTorch and Stable Diffusion, and specific model versions (e.g., the "KL-8 version" of the auto-encoder), but does not provide version numbers for the general software dependencies (e.g., "PyTorch 1.9", "Python 3.8"). (A hedged checkpoint-pinning sketch follows the table.)
Experiment Setup | Yes | "We will provide an overview of the implementation details in this section. The hyperparameters used in our approach are listed in Table 6."
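
To make the quoted split protocol concrete, below is a minimal Python sketch of how the unofficial splits could be reproduced. The directory layout, file extension, seed, and the use of a shuffled random split are all assumptions; the paper does not state how its random splits were drawn.

```python
import random
from pathlib import Path

def split_dataset(image_dir, train_frac=0.8, val_frac=0.1, seed=0):
    """Deterministically split a directory of images into train/val/test.

    Fractions follow the CLEVRTex protocol quoted above (80/10/10).
    For MOVi-C/E use train_frac=0.9, val_frac=0.1 (no test split);
    for FFHQ use train_frac=0.86, val_frac=0.07 (the remainder unused).
    The seed and shuffling strategy are assumptions, not the paper's.
    """
    files = sorted(Path(image_dir).glob("*.png"))
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train = int(train_frac * n)
    n_val = int(val_frac * n)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset("clevrtex/images")
```

Fixing and reporting the seed is what makes such a split reproducible across runs.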
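
For the gap noted in the Software Dependencies row, the pre-trained auto-encoder at least can be pinned to an explicit checkpoint. Below is a minimal sketch using the Hugging Face diffusers library; the repo id stabilityai/sd-vae-ft-mse is an assumption, since the paper says only that the "KL-8 version" of the Stable Diffusion auto-encoder is used.

```python
import torch
from diffusers import AutoencoderKL

# Assumed checkpoint: a publicly released Stable Diffusion f=8 KL
# auto-encoder. The paper states only "KL-8 version", so the exact
# repo id here is an assumption.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

# Encode a batch of images (values in [-1, 1]) into the 4-channel
# latent space at 1/8 spatial resolution, as in latent diffusion.
images = torch.randn(2, 3, 256, 256)  # placeholder batch
with torch.no_grad():
    latents = vae.encode(images).latent_dist.sample()
print(latents.shape)  # expected: (2, 4, 32, 32)
```

Recording the torch and diffusers versions alongside the checkpoint id would close the dependency gap flagged in that row.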