Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery

Authors: Anand Gopalakrishnan, Aleksandar Stanić, Jürgen Schmidhuber, Michael C. Mozer

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate models on three datasets from the multi-object suite [32], namely Tetrominoes, dSprites, and CLEVR, used by prior work in object-centric learning [33–35]. For CLEVR, we use a filtered version [35] which consists of images containing less than seven objects. In all experiments we use image resolutions identical to Emami et al. [35], i.e., 35x35 for Tetrominoes, 64x64 for dSprites, and 96x96 for CLEVR (center crop of 192x192 resized to 96x96). In Tetrominoes and dSprites the number of training images is 60K, whereas in CLEVR it is 50K. All three datasets have 320 test images on which we report the evaluation metrics. Table 1 compares the performance of recent state-of-the-art synchrony-based baselines (CAE, CAE++, CtCAE, and RF) against ours (SynCx) on the unsupervised object discovery task. (An illustrative preprocessing sketch for the stated resolutions appears after the table.)
Researcher Affiliation | Collaboration | Anand Gopalakrishnan¹, Aleksandar Stanić², Jürgen Schmidhuber¹,³, Michael Curtis Mozer²; ¹The Swiss AI Lab, IDSIA, USI & SUPSI, Lugano, Switzerland; ²Google DeepMind; ³AI Initiative, KAUST, Thuwal, Saudi Arabia
Pseudocode | No | The paper describes the model's architecture and training process in detail and provides network specifications in tables, but it does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | Official code repository: https://github.com/agopal42/syncx
Open Datasets | Yes | We evaluate models on three datasets from the multi-object suite [32], namely Tetrominoes, dSprites, and CLEVR, used by prior work in object-centric learning [33–35].
Dataset Splits | No | In Tetrominoes and dSprites the number of training images is 60K, whereas in CLEVR it is 50K. All three datasets have 320 test images on which we report the evaluation metrics. There is no explicit mention of a validation set or specific percentage splits.
Hardware Specification | Yes | To train our model for 40k steps on 35x35 resolution images from Tetrominoes took 1.7 hours on an NVIDIA Tesla P100 GPU. To train our model for 100k steps on 64x64 resolution images from dSprites took 4.25 hours on an NVIDIA Tesla P100 GPU. To train our model for 100k steps on 96x96 resolution images from CLEVR took 17.87 hours on an NVIDIA Tesla V100-SXM2 GPU.
Software Dependencies | No | The paper mentions using the scikit-learn and umap-learn libraries for certain functionalities but does not specify their version numbers (e.g., 'We use the t-SNE implementation in the scikit-learn library [71] (sklearn.manifold.TSNE)'). (An illustrative usage sketch for these libraries appears after the table.)
Experiment Setup | Yes | Table 10: Training hyperparameters for SynCx (also transcribed into a config sketch after the table):
Hyperparameter | Tetrominoes | dSprites | CLEVR
Training Steps | 40,000 | 100,000 | 100,000
Batch size | 64 | 16 | 32
Learning rate | 5e-4 | 5e-4 | 5e-4
Gradient Norm Clipping | 1.0 | 1.0 | 1.0
Number of iterations | 3 | 3 | 4
Phase initialization | von-Mises | von-Mises | von-Mises
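
The image resolutions quoted under Research Type (35x35 for Tetrominoes, 64x64 for dSprites, and a 192x192 center crop of CLEVR resized to 96x96) can be reproduced with standard image transforms. The following is a minimal sketch assuming torchvision; the pipeline is illustrative only and is not the authors' implementation (see the official repository for the exact preprocessing).

```python
# Illustrative preprocessing for the resolutions stated in the paper.
# Assumes torchvision; NOT the authors' code -- see https://github.com/agopal42/syncx.
from torchvision import transforms

# CLEVR: center crop of 192x192, then resize to 96x96 (as stated in the paper).
clevr_transform = transforms.Compose([
    transforms.CenterCrop(192),
    transforms.Resize((96, 96)),
    transforms.ToTensor(),
])

# Tetrominoes (35x35) and dSprites (64x64): resize to the stated resolutions.
tetrominoes_transform = transforms.Compose([transforms.Resize((35, 35)), transforms.ToTensor()])
dsprites_transform = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
```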
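
For the dependencies named under Software Dependencies, here is a minimal usage sketch of scikit-learn's sklearn.manifold.TSNE and umap-learn's UMAP, the two libraries the paper mentions (versions unspecified). The array shape and variable names are hypothetical, purely for illustration.

```python
# Minimal usage sketch of the two libraries named in the paper (versions unspecified).
import numpy as np
from sklearn.manifold import TSNE
import umap  # provided by the umap-learn package

features = np.random.rand(320, 64)  # hypothetical per-image feature matrix (320 test images)
tsne_embedding = TSNE(n_components=2).fit_transform(features)       # 2-D t-SNE embedding
umap_embedding = umap.UMAP(n_components=2).fit_transform(features)  # 2-D UMAP embedding
```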
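
Finally, the Table 10 hyperparameters under Experiment Setup, transcribed into a plain Python dictionary for quick reference. Only the values listed in the table are included; the optimizer and any other unlisted settings are not specified here and should be taken from the official repository.

```python
# Table 10 training hyperparameters for SynCx, transcribed verbatim.
# Settings not listed in the table (e.g., optimizer) are intentionally omitted.
TRAINING_HPARAMS = {
    "tetrominoes": {"training_steps": 40_000,  "batch_size": 64, "learning_rate": 5e-4,
                    "grad_norm_clip": 1.0, "num_iterations": 3, "phase_init": "von-Mises"},
    "dsprites":    {"training_steps": 100_000, "batch_size": 16, "learning_rate": 5e-4,
                    "grad_norm_clip": 1.0, "num_iterations": 3, "phase_init": "von-Mises"},
    "clevr":       {"training_steps": 100_000, "batch_size": 32, "learning_rate": 5e-4,
                    "grad_norm_clip": 1.0, "num_iterations": 4, "phase_init": "von-Mises"},
}
```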