Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion
Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our method to recent techniques including Panoptic Lifting [48] on standard 3D instance segmentation benchmarks, viz. ScanNet [13], Replica [50], and Hypersim [46]. To better demonstrate the scalability of our method to a very large number of object instances, we introduce a semi-realistic Messy Rooms dataset featuring scenes with up to 500 objects. Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well as on our newly created Messy Rooms dataset, demonstrating the effectiveness and scalability of our slow-fast clustering method. |
| Researcher Affiliation | Academia | Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi; Visual Geometry Group, University of Oxford; {yashsb,iro,joao,az,vedaldi}@robots.ox.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. (An illustrative sketch of the slow-fast contrastive mechanism it describes in prose appears below the table.) |
| Open Source Code | No | The paper states that the Messy Rooms dataset can be accessed via a link, but it does not state that the source code for the Contrastive Lift method is open source, nor does it provide a link to it. |
| Open Datasets | Yes | We train and evaluate our proposed method on challenging scenes from the ScanNet [13], Hypersim [46], and Replica [50] datasets. To better demonstrate the scalability of our method to a very large number of object instances, we introduce a semi-realistic Messy Rooms dataset featuring scenes with up to 500 objects. The full Messy Rooms dataset introduced in this work can be accessed at this link: https://figshare.com/s/b195ce8bd8eafe79762b. |
| Dataset Splits | Yes | We follow PanopLi [48] for the data preprocessing steps and train-test splits for each scene from these datasets. To determine an optimal value, we perform a hyperparameter sweep using 10% of the training data, which includes training viewpoints and associated segments from the 2D segmenter. |
| Hardware Specification | Yes | Table 6 compares the training speed, measured on an NVIDIA A40 GPU, between PanopLi and our method, showing that PanopLi iterations become slower as K increases. |
| Software Dependencies | No | The paper mentions software components such as Mask2Former, Detic, TensoRF, and HDBSCAN, but it does not specify their version numbers, which reproducibility would require. (A usage sketch for HDBSCAN appears below the table.) |
| Experiment Setup | Yes | We train our neural field model for 400k iterations on all scenes. The RGB reconstruction loss, semantic segmentation loss, instance embedding loss, and segment consistency loss are balanced using weights of 1.0, 0.1, 0.1, and 1.0 respectively. A learning rate of 5e-4 is used for all MLPs and 0.01 for the grids. A batch size of 2048 is used to train all models. (A configuration sketch using these values appears below the table.) |
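The slow-fast mechanism is only described in prose in the paper, so the following is a minimal, hedged sketch of what such a loss could look like: a momentum (EMA) copy of the embedding network supplies stable targets, and pixels sharing a 2D segment ID count as positives. Every name here (`ema_update`, `slow_fast_contrastive_loss`, the tensor shapes) is a hypothetical reconstruction, not the authors' code.

```python
# Illustrative sketch (not the authors' implementation) of a slow-fast
# contrastive loss over per-pixel instance embeddings.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(slow: torch.nn.Module, fast: torch.nn.Module, momentum: float = 0.999):
    """Keep the slow network as an exponential moving average of the fast one."""
    for p_slow, p_fast in zip(slow.parameters(), fast.parameters()):
        p_slow.mul_(momentum).add_(p_fast, alpha=1.0 - momentum)

def slow_fast_contrastive_loss(fast_emb, slow_emb, segment_ids, temperature=0.1):
    """fast_emb, slow_emb: (N, D) embeddings for N sampled pixels/rays.
    segment_ids: (N,) 2D instance-segment labels from the image segmenter.
    Pixels sharing a segment ID are positives; all other pairs are negatives."""
    fast_emb = F.normalize(fast_emb, dim=-1)
    slow_emb = F.normalize(slow_emb.detach(), dim=-1)  # no gradient to the slow net
    logits = fast_emb @ slow_emb.t() / temperature     # (N, N) pairwise similarities
    positives = segment_ids[:, None] == segment_ids[None, :]
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Average log-likelihood of the positives per anchor (supervised-contrastive style).
    loss = -(log_prob * positives).sum(1) / positives.sum(1).clamp(min=1)
    return loss.mean()
```

After each optimizer step on the fast network, `ema_update(slow, fast)` would refresh the targets; the `detach` above keeps gradients from flowing into the slow copy.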
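The paper names HDBSCAN among its components; a plausible use, assumed here rather than confirmed by released code, is clustering rendered per-pixel embeddings into discrete instance IDs at test time, via the standalone `hdbscan` package:

```python
# Illustrative sketch: turning rendered embeddings into instance labels.
import numpy as np
import hdbscan

def embeddings_to_instances(rendered_emb: np.ndarray, min_cluster_size: int = 50):
    """rendered_emb: (H*W, D) per-pixel embeddings rendered from the field.
    Returns an (H*W,) array of instance labels; -1 marks unclustered pixels."""
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    return clusterer.fit_predict(rendered_emb)
```

`min_cluster_size=50` is a guess at a reasonable knob, not a value taken from the paper.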
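The hyperparameters quoted in the Experiment Setup row map directly onto a training configuration. The sketch below shows one way to wire the four loss weights and the two learning rates in PyTorch; the module names are placeholders, since the paper does not publish code:

```python
import torch

# Hypothetical stand-ins for the field's MLP heads and TensoRF-style grids.
mlp_head = torch.nn.Linear(64, 32)
feature_grid = torch.nn.Parameter(torch.zeros(16, 128, 128))

# Loss weights as reported: RGB 1.0, semantic 0.1, instance 0.1, segment consistency 1.0.
loss_weights = {"rgb": 1.0, "semantic": 0.1, "instance": 0.1, "segment_consistency": 1.0}

def total_loss(losses: dict) -> torch.Tensor:
    """losses maps the keys above to scalar loss tensors."""
    return sum(loss_weights[k] * v for k, v in losses.items())

# Separate learning rates as reported: 5e-4 for the MLPs, 0.01 for the grids.
optimizer = torch.optim.Adam([
    {"params": mlp_head.parameters(), "lr": 5e-4},
    {"params": [feature_grid], "lr": 1e-2},
])
# Training loop (not shown): batch size 2048 rays, 400k iterations per scene.
```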