OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Authors: Ayca Takmaz, Elisabetta Fedele, Robert Sumner, Marc Pollefeys, Federico Tombari, Francis Engelmann
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments and ablation studies on ScanNet200 and Replica show that OpenMask3D outperforms other open-vocabulary methods, especially on the long-tail distribution. Qualitative experiments further showcase OpenMask3D's ability to segment object properties based on free-form queries describing geometry, affordances, and materials. |
| Researcher Affiliation | Collaboration | Ayça Takmaz¹, Elisabetta Fedele¹, Robert W. Sumner¹, Marc Pollefeys¹,², Federico Tombari³, Francis Engelmann¹,³ (¹ETH Zürich, ²Microsoft, ³Google) |
| Pseudocode | Yes | Algorithm 1: 2D mask selection algorithm (a hedged code sketch follows the table) |
| Open Source Code | Yes | openmask3d.github.io |
| Open Datasets | Yes | We conduct our experiments using the ScanNet200 [57] and Replica [61] datasets. |
| Dataset Splits | Yes | We report our ScanNet200 results on the validation set consisting of 312 scenes, and evaluate for the 3D instance segmentation task using the closed vocabulary of 200 categories from the ScanNet200 annotations. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided. The paper only mentions "a single GPU" for computation time. |
| Software Dependencies | No | The paper mentions using specific models and tools like CLIP [55], SAM [36], and Mask3D [58] but does not provide specific version numbers for these software components or any other ancillary software. |
| Experiment Setup | Yes | OpenMask3D implementation details. We use posed RGB-depth pairs for both the ScanNet200 and Replica datasets, and we process 1 frame in every 10 frames in the RGB-D sequences. To compute image features on the mask crops, we use the CLIP [55] visual encoder from the ViT-L/14 model pre-trained at a 336-pixel resolution, which has a feature dimensionality of 768. For the visibility score computation, we use k_threshold = 0.2, and for top-view selection we use k_view = 5. In all experiments with multi-scale crops, we use L = 3 levels. In the 2D mask selection algorithm based on SAM [36], we repeat the process for k_rounds = 10 rounds, and sample k_sample = 5 points at each iteration. (Both settings are sketched in code below.) |
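
The experiment-setup row translates naturally into a small configuration object plus a multi-scale CLIP crop encoder. Below is a minimal sketch, assuming the OpenAI `clip` package; the names `OpenMask3DConfig` and `multiscale_crop_feature`, and the per-level crop enlargement factor, are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass

import clip   # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image


@dataclass
class OpenMask3DConfig:
    frame_stride: int = 10    # process 1 in every 10 RGB-D frames
    k_threshold: float = 0.2  # visibility-score threshold
    k_view: int = 5           # number of top views kept per 3D mask
    num_levels: int = 3       # L: multi-scale crop levels
    k_rounds: int = 10        # SAM point-sampling rounds (Algorithm 1)
    k_sample: int = 5         # points sampled per round (Algorithm 1)


device = "cuda" if torch.cuda.is_available() else "cpu"
# ViT-L/14 pre-trained at 336 px; encode_image yields 768-d features.
model, preprocess = clip.load("ViT-L/14@336px", device=device)


def multiscale_crop_feature(image: Image.Image, bbox, cfg: OpenMask3DConfig) -> torch.Tensor:
    """Average CLIP features over L progressively enlarged crops around a mask's 2D bounding box."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    w, h = x1 - x0, y1 - y0
    feats = []
    for level in range(cfg.num_levels):
        scale = 1.0 + 0.5 * level  # ASSUMED enlargement per level; not specified in the row above
        half_w, half_h = scale * w / 2.0, scale * h / 2.0
        crop = image.crop((
            int(max(0, cx - half_w)), int(max(0, cy - half_h)),
            int(min(image.width, cx + half_w)), int(min(image.height, cy + half_h)),
        ))
        with torch.no_grad():
            feats.append(model.encode_image(preprocess(crop).unsqueeze(0).to(device)))
    feat = torch.cat(feats).mean(dim=0)
    return feat / feat.norm()  # unit-normalised 768-d descriptor
```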
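
The pseudocode row points to Algorithm 1, the SAM-based 2D mask selection. A minimal sketch follows, assuming the `segment_anything` package; the helper name `select_2d_mask`, the checkpoint path, and the per-round bookkeeping (keep the best-scoring mask across rounds) are assumptions layered on the stated k_rounds = 10 rounds of k_sample = 5 sampled points, not the authors' code.

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Checkpoint path is a placeholder; download the official SAM weights separately.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)


def select_2d_mask(image: np.ndarray, projected_points: np.ndarray,
                   k_rounds: int = 10, k_sample: int = 5, seed: int = 0) -> np.ndarray:
    """Pick the highest-scoring SAM mask over k_rounds rounds of k_sample point prompts.

    projected_points: (N, 2) pixel coordinates of the 3D instance visible in this view.
    """
    rng = np.random.default_rng(seed)
    predictor.set_image(image)  # image: HxWx3 uint8, RGB
    best_mask, best_score = None, -np.inf
    for _ in range(k_rounds):
        idx = rng.choice(len(projected_points),
                         size=min(k_sample, len(projected_points)), replace=False)
        coords = projected_points[idx].astype(np.float32)
        labels = np.ones(len(idx), dtype=np.int32)  # all prompts are foreground points
        masks, scores, _ = predictor.predict(
            point_coords=coords, point_labels=labels, multimask_output=True)
        if scores.max() > best_score:
            best_score = float(scores.max())
            best_mask = masks[scores.argmax()]
    return best_mask
```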