3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset
Authors: Junjie Zhang, Tianci Hu, Xiaoshui Huang, Yongshun Gong, Dan Zeng
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Thorough experiments evaluating trending MLLMs, comparisons against existing datasets, and variations of training protocols demonstrate the superiority of 3DBench, offering valuable insights into current limitations and potential research directions. |
| Researcher Affiliation | Collaboration | Junjie Zhang¹, Tianci Hu¹, Xiaoshui Huang², Yongshun Gong³ and Dan Zeng¹ (¹Shanghai University, ²Shanghai AI Laboratory, ³Shandong University) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/Inshsang/3DBench. |
| Open Datasets | Yes | PointLLM [Xu et al., 2023] utilizes Cap3D [Luo et al., 2023], a 3D object captioning dataset derived from Objaverse [Deitke et al., 2023]. Point-Bind & Point-LLM [Guo et al., 2023] is trained using ULIP [Xue et al., 2023], constructed based on ShapeNet [Chang et al., 2015]. ... During the initial step, we extract comprehensive metadata from the ProcTHOR simulation framework [Deitke et al., 2022]. ... utilizing 224,000 for training and 8,000 for evaluation. |
| Dataset Splits | Yes | We construct a dataset of more than 0.23 million instruction-tuning samples, utilizing 224,000 for training and 8,000 for evaluation. ... The distribution of our instruction-tuning dataset is depicted in Table 1. (A minimal train/eval split sketch is given after this table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | LAMM Settings: To ensure a fair comparison of the point cloud understanding abilities among three models, we conduct tests using the 7B versions for all models. During the retraining experiment with LAMM, we identify biases in various evaluation metrics related to output text lengths. As a result, we adjust the target length for different tasks, aiming to reveal the optimal performance of each model on the respective dataset. PointLLM & Point-LLM Settings: We maintain the default model parameter settings for both models. Following their guidelines, we uniformly sample point clouds to a fixed quantity (a minimal sampling sketch follows the table). |
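
For orientation, the reported split (224,000 training and 8,000 evaluation samples out of roughly 0.23 million instruction-tuning pairs) corresponds to a plain held-out split. The snippet below is a minimal sketch of how such a split could be reproduced; the file name `3dbench_instructions.json`, the record layout, and the fixed seed are assumptions for illustration and are not taken from the released code.

```python
import json
import random

# Minimal sketch of a 224k / 8k train-eval split, as described in the paper.
# Input file name and record layout are assumptions for illustration only.
TRAIN_SIZE, EVAL_SIZE = 224_000, 8_000

with open("3dbench_instructions.json") as f:
    samples = json.load(f)  # list of instruction-tuning records

random.seed(0)  # fixed seed so the split is reproducible
random.shuffle(samples)

train_set = samples[:TRAIN_SIZE]
eval_set = samples[TRAIN_SIZE:TRAIN_SIZE + EVAL_SIZE]

with open("train_split.json", "w") as f:
    json.dump(train_set, f)
with open("eval_split.json", "w") as f:
    json.dump(eval_set, f)
```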
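
The experiment setup also notes that point clouds are uniformly sampled to a fixed quantity before being fed to PointLLM and Point-LLM. Below is a minimal sketch of such uniform downsampling with NumPy; the target of 8,192 points is an assumption, since the exact count follows each model's default preprocessing.

```python
import numpy as np

def uniform_sample(points: np.ndarray, num_points: int = 8192) -> np.ndarray:
    """Uniformly sample a point cloud of shape (N, C) to a fixed number of points.

    If the cloud has fewer points than requested, points are repeated
    (sampling with replacement); otherwise a uniform subset is drawn.
    """
    n = points.shape[0]
    replace = n < num_points
    idx = np.random.choice(n, num_points, replace=replace)
    return points[idx]

# Example: a synthetic cloud with XYZ + RGB channels
cloud = np.random.rand(50_000, 6).astype(np.float32)
fixed = uniform_sample(cloud, 8192)
print(fixed.shape)  # (8192, 6)
```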