Multi-Object Representation Learning via Feature Connectivity and Object-Centric Regularization

Authors: Alex Foo, Wynne Hsu, Mong Li Lee

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on simulated, real-world, complex-texture and common-object images demonstrate a substantial improvement in the quality of discovered objects compared to state-of-the-art methods, as well as the sample efficiency and generalizability of our approach. We also show that the discovered object-centric representations can accurately predict key object properties in downstream tasks, highlighting the potential of our method to advance the field of multi-object representation learning.
Researcher Affiliation | Academia | Alex Foo, Wynne Hsu, Mong Li Lee, School of Computing, National University of Singapore, {alexfoo,whsu,leeml}@comp.nus.edu.sg
Pseudocode | No | The paper describes the methodology and algorithms in prose and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code or a direct link to a code repository for the methodology described.
Open Datasets | Yes | Table 1: Summary of dataset characteristics (Multi-dSprites, Tetrominoes-NM, SVHN, IDRiD, CLEVRTEX, CLEVRTEX-OOD, Flowers, Birds, COCO). Multi-dSprites [23] and Tetrominoes-NM: the former consists of multiple oval-, heart- or square-shaped sprites with some occlusions, while the latter is a subset of the original Tetrominoes dataset [23] in which images whose ground-truth segmentation requires knowledge of the object shapes are filtered out.
Dataset Splits | Yes | Following [35, 11], we use the first 60K samples in Multi-dSprites, Tetrominoes-NM and SVHN for training and hold out the next 320 samples for testing. For IDRiD, we split this dataset into 54 images for training and 27 images for testing. For CLEVRTEX, we use the first 40K samples for training and the last 5K samples for testing. For CLEVRTEX-OOD, we use 10K samples for testing. For Flowers, we use the first 6K samples for training and the last 1K samples for testing. For Birds, we use the first 10K samples for training and the last 1K samples for testing. For COCO, we use the first 10K samples for training and the last 2K samples for testing.
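Since the paper's code is not released, the quoted splits are purely index-based and easy to reproduce. The sketch below shows one way to realize them; the loader variables (e.g. multi_dsprites, clevrtex) are hypothetical placeholders and not part of the authors' code.

```python
# Minimal sketch of the index-based splits quoted above, assuming each dataset
# is already available as a torch Dataset. Loader names are placeholders.
from torch.utils.data import Subset

def split_by_index(dataset, n_train, n_test):
    """First n_train samples for training, the next n_test samples for testing."""
    train = Subset(dataset, range(0, n_train))
    test = Subset(dataset, range(n_train, n_train + n_test))
    return train, test

# e.g. Multi-dSprites / Tetrominoes-NM / SVHN: first 60K train, next 320 test
# train_set, test_set = split_by_index(multi_dsprites, n_train=60_000, n_test=320)

# CLEVRTEX uses the *last* 5K samples for testing, so the test split slices from the end:
# test_set = Subset(clevrtex, range(len(clevrtex) - 5_000, len(clevrtex)))
```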
Hardware Specification | Yes | Training on 64-by-64 images from Multi-dSprites on a single V100 GPU with 32GB of RAM takes about 10 minutes.
Software Dependencies | No | The paper mentions using "Adam [28]" as an optimizer and cites "PyTorch [39]", but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We train OC-Net for 1000 iterations with a batch size of 64 using Adam [28] with a learning rate of 1×10⁻³. We carried out an initial experiment to choose the clustering threshold; the results show that its value can range from 0.2 to 2.0 without affecting the performance of OC-Net. As such, we set the threshold to 0.7 so that two pixels belong to the same object if their normalized feature similarity is more than 50%. If a pixel is assigned to multiple objects, we assign it to the mask of the first object in that list and ignore its membership in the other objects. For all methods, we set the maximum number of foreground objects to 6 and 4 for Multi-dSprites and Tetrominoes respectively. Training is carried out for 300,000 iterations with a batch size of 64, using the Adam optimizer with a base learning rate of 4×10⁻⁴. We set the size of the latent space to D = 64 for all models. For SVHN and COCO, the number of objects is set to 6. For IDRiD, the number of objects is set to 20 and training runs for 100,000 iterations. For CLEVRTEX and CLEVRTEX-OOD, the number of objects is set to 11. For Flowers and Birds, the number of objects is set to 2.
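The overlap-resolution rule quoted above ("if a pixel is assigned to multiple objects, we assign it to the mask of the first object in that list") is the one setup detail that is procedural rather than a hyperparameter, so a short sketch is given below. It only illustrates that rule; the feature-connectivity clustering itself is the paper's contribution and is not reproduced here, and candidate_masks is an assumed input of shape (K, H, W) holding boolean per-object masks.

```python
# Hedged sketch of the stated tie-breaking rule: a pixel claimed by several
# candidate objects is kept only in the earliest object's mask.
import torch

def resolve_overlaps(candidate_masks: torch.Tensor) -> torch.Tensor:
    """candidate_masks: bool tensor of shape (K, H, W); returns disjoint masks."""
    K, H, W = candidate_masks.shape
    resolved = torch.zeros_like(candidate_masks)
    claimed = torch.zeros(H, W, dtype=torch.bool)
    for k in range(K):
        keep = candidate_masks[k] & ~claimed  # pixels not yet assigned to an earlier object
        resolved[k] = keep
        claimed |= keep
    return resolved

# Optimizer settings as reported (values from the quote, model is a placeholder):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # OC-Net, 1000 iterations
# optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)   # main training, 300K iterations
```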