Bootstrapping Top-down Information for Self-modulating Slot Attention
Authors: Dongwon Kim, Seoyeon Kim, Suha Kwak
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments |
| Researcher Affiliation | Academia | Dept. of CSE, POSTECH; Graduate School of AI, POSTECH; {kdwon, syeonkim07, suha.kwak}@postech.ac.kr |
| Pseudocode | Yes | Algorithm 1 Self-modulating Slot Attention. |
| Open Source Code | No | Code will be made public for open access after publication. |
| Open Datasets | Yes | To verify the proposed method in diverse settings, including synthetic and authentic datasets, we considered four object-centric learning benchmarks: MOVI-C [15], MOVI-E [15], PASCAL VOC 2012 [14], and MS COCO 2017 [29]. |
| Dataset Splits | Yes | MOVI-C contains 87,633 images for training and 6,000 images for evaluation, while MOVI-E contains 87,741 and 6,000, respectively. ... For the evaluation, we use the validation split containing 1,449 images. The COCO dataset consists of 118,287 training images and 5,000 images for evaluation. |
| Hardware Specification | Yes | Full training of the model takes 26 hours using a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions software components and frameworks such as DINO, ViT, the Adam optimizer, a GRU, an MLP, and an autoregressive transformer decoder, but provides no version numbers for these dependencies and does not state the programming language used. |
| Experiment Setup | Yes | The model is trained using an Adam optimizer [26] with an initial learning rate of 0.0004, while the encoder parameters are not trained. The number of slots K is set to 11, 24, 7, and 6 for MOVI-C, MOVI-E, COCO, and VOC, respectively. The codebook size E is set to 128 for synthetic datasets (MOVI-C and MOVI-E) and 512 for authentic datasets (COCO and VOC). The model is trained for 250K iterations on VOC and for 500K iterations on the others. |
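The Experiment Setup row above can be collected into a single configuration for reference. This is a minimal sketch assembled from the hyperparameters quoted in the table; the key names and the `make_config` helper are illustrative assumptions, not taken from the paper's code.

```python
# Per-dataset hyperparameters quoted in the Experiment Setup row.
# Key names are hypothetical; values come from the reported setup.
DATASET_CONFIG = {
    "MOVI-C": {"num_slots": 11, "codebook_size": 128, "iterations": 500_000},
    "MOVI-E": {"num_slots": 24, "codebook_size": 128, "iterations": 500_000},
    "COCO":   {"num_slots": 7,  "codebook_size": 512, "iterations": 500_000},
    "VOC":    {"num_slots": 6,  "codebook_size": 512, "iterations": 250_000},
}

def make_config(dataset: str) -> dict:
    """Merge the shared optimizer settings with per-dataset values."""
    cfg = {
        "optimizer": "Adam",        # Adam optimizer [26]
        "learning_rate": 4e-4,      # initial learning rate 0.0004
        "train_encoder": False,     # encoder parameters are frozen
    }
    cfg.update(DATASET_CONFIG[dataset])
    return cfg
```

For example, `make_config("VOC")` yields 6 slots, a codebook of 512 entries, and a 250K-iteration schedule, matching the reported setup for PASCAL VOC.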