Bootstrapping Top-down Information for Self-modulating Slot Attention

Authors: Dongwon Kim, Seoyeon Kim, Suha Kwak

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments
Researcher Affiliation | Academia | Dept. of CSE, POSTECH; Graduate School of AI, POSTECH; {kdwon, syeonkim07, suha.kwak}@postech.ac.kr
Pseudocode | Yes | Algorithm 1: Self-modulating Slot Attention.
Open Source Code | No | "Code will be made public for open access after publication."
Open Datasets | Yes | "To verify the proposed method in diverse settings, including synthetic and authentic datasets, we considered four object-centric learning benchmarks: MOVI-C [15], MOVI-E [15], PASCAL VOC 2012 [14], and MS COCO 2017 [29]."
Dataset Splits | Yes | "MOVI-C contains 87,633 images for training and 6,000 images for evaluation, while MOVI-E contains 87,741 and 6,000, respectively. ... For the evaluation, we use the validation split containing 1,449 images. The COCO dataset consists of 118,287 training images and 5,000 images for evaluation."
Hardware Specification | Yes | "Full training of the model takes 26 hours using a single NVIDIA RTX 3090 GPU."
Software Dependencies | No | The paper mentions software components such as DINO, ViT, the Adam optimizer, a GRU, an MLP, and an autoregressive transformer decoder, but provides no version numbers for these dependencies and does not name the programming language used.
Experiment Setup | Yes | "The model is trained using an Adam optimizer [26] with an initial learning rate of 0.0004, while the encoder parameters are not trained. The number of slots K is set to 11, 24, 7, and 6 for MOVI-C, MOVI-E, COCO, and VOC, respectively. The codebook size E is set to 128 for synthetic datasets (MOVI-C and MOVI-E) and 512 for authentic datasets (COCO and VOC). The model is trained for 250K iterations on VOC and for 500K iterations on the others."
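The "Pseudocode" row confirms the paper ships Algorithm 1 (Self-modulating Slot Attention), but the algorithm itself is not reproduced in this report. As a point of reference, the following is a minimal NumPy sketch of the vanilla slot-attention iteration the method builds on (Locatello et al., 2020): it omits the paper's self-modulation and codebook entirely, and replaces the original GRU/MLP slot update with a plain attention-weighted mean. All names here are illustrative; this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(inputs, n_slots=7, n_iters=3, seed=0):
    """Simplified slot-attention pass (sketch only, no learned projections).

    inputs: (n_tokens, dim) array of encoder features.
    Returns the final slots (n_slots, dim) and the last attention map
    (n_tokens, n_slots). The softmax is taken over the SLOT axis, which
    is what makes slots compete for input tokens.
    """
    n_tokens, dim = inputs.shape
    rng = np.random.default_rng(seed)
    slots = rng.normal(size=(n_slots, dim))  # random slot initialization
    attn = None
    for _ in range(n_iters):
        # Dot-product attention, normalized across slots (competition).
        logits = inputs @ slots.T / np.sqrt(dim)      # (n_tokens, n_slots)
        attn = softmax(logits, axis=1)
        # Weighted mean of inputs per slot (weights renormalized over tokens);
        # the original algorithm feeds this update through a GRU + MLP instead.
        weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = weights.T @ inputs                    # (n_slots, dim)
    return slots, attn
```

The per-token softmax over the slot axis is the defining design choice: each input token distributes its attention mass across slots, so slots specialize on disjoint regions, which is what the paper's top-down modulation then refines.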
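The per-dataset hyperparameters quoted in the "Experiment Setup" row can be collected into a single configuration sketch for reproduction attempts. The values (slot count K, codebook size E, learning rate, iteration budgets, frozen encoder) are those reported in the paper; every key name and the dict layout itself are hypothetical, since the code had not been released at reporting time.

```python
# Per-dataset settings reported in the paper; key names are illustrative.
CONFIGS = {
    "movi_c": {"num_slots": 11, "codebook_size": 128, "iterations": 500_000},
    "movi_e": {"num_slots": 24, "codebook_size": 128, "iterations": 500_000},
    "coco":   {"num_slots": 7,  "codebook_size": 512, "iterations": 500_000},
    "voc":    {"num_slots": 6,  "codebook_size": 512, "iterations": 250_000},
}

# Settings shared across datasets, per the quoted setup.
COMMON = {
    "optimizer": "Adam",
    "learning_rate": 4e-4,   # initial learning rate 0.0004
    "train_encoder": False,  # "the encoder parameters are not trained"
}


def get_config(dataset: str) -> dict:
    """Merge shared and dataset-specific settings (hypothetical helper)."""
    return {**COMMON, **CONFIGS[dataset]}
```

Note the pattern the numbers suggest: the synthetic MOVi datasets use the smaller codebook (E = 128) while the real-image datasets use E = 512, and only VOC uses the shorter 250K-iteration schedule.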