Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion

Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our method to recent techniques including Panoptic Lifting [48] on standard 3D instance segmentation benchmarks, viz. ScanNet [13], Replica [50], and Hypersim [46]. To better demonstrate the scalability of our method to a very large number of object instances, we introduce a semi-realistic Messy Rooms dataset featuring scenes with up to 500 objects. Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well as on our newly created Messy Rooms dataset, demonstrating the effectiveness and scalability of our slow-fast clustering method.
Researcher Affiliation | Academia | Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi; Visual Geometry Group, University of Oxford; {yashsb,iro,joao,az,vedaldi}@robots.ox.ac.uk
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions that the Messy Rooms dataset can be accessed via a link, but it does not explicitly state that the source code for the Contrastive Lift method is open-source, nor does it provide a link to it.
Open Datasets | Yes | We train and evaluate our proposed method on challenging scenes from the ScanNet [13], Hypersim [46], and Replica [50] datasets. To better demonstrate the scalability of our method to a very large number of object instances, we introduce a semi-realistic Messy Rooms dataset featuring scenes with up to 500 objects. The full Messy Rooms dataset introduced in this work can be accessed at this link: https://figshare.com/s/b195ce8bd8eafe79762b.
Dataset Splits | Yes | We follow PanopLi [48] for the data preprocessing steps and train-test splits for each scene from these datasets. To determine an optimal value, we perform a hyperparameter sweep using 10% of the training data, which includes training viewpoints and associated segments from the 2D segmenter.
Hardware Specification | Yes | Table 6 compares the training speed, measured on an NVIDIA A40 GPU, between PanopLi and our method, showing that PanopLi iterations become slower as K increases.
Software Dependencies | No | The paper mentions software components such as Mask2Former, Detic, TensoRF, and HDBSCAN, but it does not specify their version numbers, which is required for reproducibility.
Experiment Setup | Yes | We train our neural field model for 400k iterations on all scenes. The RGB reconstruction loss, semantic segmentation loss, instance embedding loss, and segment consistency loss are balanced using weights of 1.0, 0.1, 0.1, and 1.0 respectively. A learning rate of 5e-4 is used for all MLPs and 0.01 for the grids. A batch-size of 2048 is used to train all models.
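The "slow-fast" clustering named in the Research Type row can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the actual method trains neural-field embeddings end to end, whereas here `ema_update` only mimics the slow (momentum-averaged) field and `slow_fast_contrastive_loss` mimics an InfoNCE-style pull between fast and slow embeddings of pixels that share a 2D segment label. All function names, shapes, and the temperature value are illustrative.

```python
import numpy as np

def ema_update(slow_params, fast_params, momentum=0.99):
    """Exponential moving average: the slow field trails the fast field."""
    return [momentum * s + (1 - momentum) * f
            for s, f in zip(slow_params, fast_params)]

def slow_fast_contrastive_loss(fast_emb, slow_emb, labels, temperature=0.1):
    """Toy contrastive objective: pull fast embeddings toward slow embeddings
    of pixels in the same 2D segment, push apart those in different segments.

    fast_emb, slow_emb: (N, D) arrays of per-pixel embeddings.
    labels: (N,) integer segment ids from the 2D segmenter.
    """
    fast = fast_emb / np.linalg.norm(fast_emb, axis=1, keepdims=True)
    slow = slow_emb / np.linalg.norm(slow_emb, axis=1, keepdims=True)
    logits = fast @ slow.T / temperature            # (N, N) similarity matrix
    positives = labels[:, None] == labels[None, :]  # pairs sharing a segment
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability over positive pairs.
    return -(log_prob * positives).sum() / positives.sum()
```

As a sanity check, embeddings that cluster by segment label yield a lower loss than the same embeddings paired with shuffled labels, which is the behaviour a contrastive pull of this kind should exhibit.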
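The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. Only the numeric values come from the row above; the dictionary keys and the `total_loss` helper are illustrative names, not the paper's code.

```python
# Hyperparameters quoted in the Experiment Setup row (names are illustrative).
CONFIG = {
    "iterations": 400_000,  # training iterations per scene
    "batch_size": 2048,     # rays per batch
    "lr_mlp": 5e-4,         # learning rate for all MLPs
    "lr_grid": 0.01,        # learning rate for the grids
}

# Loss-term weights, in the order the row lists them: RGB reconstruction,
# semantic segmentation, instance embedding, segment consistency.
LOSS_WEIGHTS = {"rgb": 1.0, "semantic": 0.1, "instance": 0.1, "segment": 1.0}

def total_loss(losses, weights=LOSS_WEIGHTS):
    """Weighted sum of the individual loss terms."""
    return sum(weights[k] * v for k, v in losses.items())
```

For example, `total_loss({"rgb": 2.0, "semantic": 1.0, "instance": 1.0, "segment": 0.5})` evaluates to 2.0 + 0.1 + 0.1 + 0.5 = 2.7. Keeping separate learning rates for the MLPs and the grids would typically be realised as two optimizer parameter groups.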