OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

Authors: Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate OpenShape on zero-shot 3D classification benchmarks and demonstrate its superior capabilities for open-world recognition. Specifically, OpenShape achieves a zero-shot accuracy of 46.8% on the 1,156-category Objaverse-LVIS benchmark, compared to less than 10% for existing methods. OpenShape also achieves an accuracy of 85.3% on ModelNet40, outperforming previous zero-shot baseline methods by 20% and performing on par with some fully-supervised methods.
Researcher Affiliation | Collaboration | Minghua Liu¹, Ruoxi Shi², Kaiming Kuang¹, Yinhao Zhu³, Xuanlin Li¹, Shizhong Han³, Hong Cai³, Fatih Porikli³ — ¹UC San Diego, ²Shanghai Jiao Tong University, ³Qualcomm AI Research
Pseudocode | No | The paper does not include any section or figure explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Project Website: https://colin97.github.io/OpenShape/
Open Datasets | Yes | we ensemble four currently-largest public 3D datasets for training as shown in Figure 2 (a), resulting in 876k training shapes. Among these four datasets, ShapeNetCore [8], 3D-FUTURE [16] and ABO [11] are three popular datasets used by prior works. They contain human-verified high-quality 3D shapes, but only cover a limited number of shapes and dozens of categories. The Objaverse [12] dataset is a more recent dataset
Dataset Splits | No | The paper describes the datasets used for training (ensembling ShapeNetCore, 3D-FUTURE, ABO, and Objaverse) and the test splits of evaluation benchmarks (ModelNet40 and ScanObjectNN), but it does not specify explicit train/validation splits or a cross-validation setup for its own model training and hyperparameter tuning.
Hardware Specification | Yes | We train the model on a single A100 GPU with a batch size of 200.
Software Dependencies | No | The paper mentions using 'OpenCLIP ViT-bigG-14', 'BLIP', 'Azure cognition services', and 'GPT-4' but does not specify their version numbers or other software dependencies with version numbers.
Experiment Setup | Yes | We train the model on a single A100 GPU with a batch size of 200. ... For the 32.3M version of PointBERT, we utilize a learning rate of 5e-4; for the 72.1M version of PointBERT, we utilize a learning rate of 4e-4; and for other models, we utilize a learning rate of 1e-3. For hard-negative mining, the number of seed shapes s is set to 40, the number of neighbors m is set to 5 per shape, and the threshold δ is set to 0.1.
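The hyperparameters quoted above can be collected into a small sketch. This is not the authors' code: the function names (`get_learning_rate`, `mine_batch`) are illustrative, and the exact role of the threshold δ in hard-negative mining is an assumption (here it filters out near-duplicate neighbors); only the numeric values come from the paper.

```python
import random

# Values quoted from the paper.
BATCH_SIZE = 200
NUM_SEEDS = 40      # s: seed shapes per mining round
NUM_NEIGHBORS = 5   # m: neighbors kept per seed
DELTA = 0.1         # delta: distance threshold (filtering rule assumed below)

def get_learning_rate(backbone: str, params_millions: float) -> float:
    """Learning rate per the paper: 5e-4 for the 32.3M-parameter PointBERT,
    4e-4 for the 72.1M-parameter PointBERT, 1e-3 for other backbones."""
    if backbone == "pointbert":
        return 5e-4 if params_millions <= 32.3 else 4e-4
    return 1e-3

def mine_batch(embeddings: dict) -> list:
    """Illustrative hard-negative mining: sample s seed shapes, then add each
    seed's m nearest neighbors (Euclidean distance) that lie farther than
    delta, skipping near-duplicates. The paper's exact filtering rule may
    differ; this is one plausible reading."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    ids = list(embeddings)
    seeds = random.sample(ids, min(NUM_SEEDS, len(ids)))
    batch = list(seeds)
    for seed in seeds:
        ranked = sorted(
            (i for i in ids
             if i != seed and dist(embeddings[i], embeddings[seed]) > DELTA),
            key=lambda i: dist(embeddings[i], embeddings[seed]),
        )
        batch.extend(ranked[:NUM_NEIGHBORS])
    return batch[:BATCH_SIZE]
```

For example, `get_learning_rate("pointbert", 72.1)` returns 4e-4, matching the larger PointBERT configuration quoted above.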