3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset
Authors: Junjie Zhang, Tianci Hu, Xiaoshui Huang, Yongshun Gong, Dan Zeng
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Thorough experiments evaluating trending MLLMs, comparisons against existing datasets, and variations of training protocols demonstrate the superiority of 3DBench, offering valuable insights into current limitations and potential research directions. |
| Researcher Affiliation | Collaboration | Junjie Zhang¹, Tianci Hu¹, Xiaoshui Huang², Yongshun Gong³ and Dan Zeng¹ (¹Shanghai University, ²Shanghai AI Laboratory, ³Shandong University) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/Inshsang/3DBench. |
| Open Datasets | Yes | PointLLM [Xu et al., 2023] utilizes Cap3D [Luo et al., 2023], a 3D object captioning dataset derived from Objaverse [Deitke et al., 2023]. Point-Bind & Point-LLM [Guo et al., 2023] is trained using ULIP [Xue et al., 2023], constructed based on ShapeNet [Chang et al., 2015]. ... During the initial step, we extract comprehensive metadata from the ProcTHOR simulation framework [Deitke et al., 2022]. ... utilizing 224,000 for training and 8,000 for evaluation. |
| Dataset Splits | Yes | We construct a dataset of more than 0.23 million instruction-tuning samples, utilizing 224,000 for training and 8,000 for evaluation. ... The distribution of our instruction-tuning dataset is depicted in Table 1. (A minimal train/eval split sketch is given after this table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | LAMM Settings: To ensure a fair comparison of the point cloud understanding abilities among three models, we conduct tests using the 7B versions for all models. During the retraining experiment with LAMM, we identify biases in various evaluation metrics related to output text lengths. As a result, we adjust the target length for different tasks, aiming to reveal the optimal performance of each model on the respective dataset. PointLLM & Point-LLM Settings: We maintain the default model parameter settings for both models. Following their guidelines, we uniformly sample point clouds to a fixed quantity (a minimal sampling sketch follows the table). |
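
For orientation, the reported split (224,000 training and 8,000 evaluation samples out of roughly 0.23 million instruction-tuning pairs) corresponds to a plain held-out split. The snippet below is a minimal sketch of how such a split could be reproduced; the file name `3dbench_instructions.json`, the record layout, and the fixed seed are assumptions for illustration and are not taken from the released code.

```python
import json
import random

# Minimal sketch of a 224k / 8k train-eval split, as described in the paper.
# Input file name and record layout are assumptions for illustration only.
TRAIN_SIZE, EVAL_SIZE = 224_000, 8_000

with open("3dbench_instructions.json") as f:
    samples = json.load(f)  # list of instruction-tuning records

random.seed(0)  # fixed seed so the split is reproducible
random.shuffle(samples)

train_set = samples[:TRAIN_SIZE]
eval_set = samples[TRAIN_SIZE:TRAIN_SIZE + EVAL_SIZE]

with open("train_split.json", "w") as f:
    json.dump(train_set, f)
with open("eval_split.json", "w") as f:
    json.dump(eval_set, f)
```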
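
The experiment setup also notes that point clouds are uniformly sampled to a fixed quantity before being fed to PointLLM and Point-LLM. Below is a minimal sketch of such uniform downsampling with NumPy; the target of 8,192 points is an assumption, since the exact count follows each model's default preprocessing.

```python
import numpy as np

def uniform_sample(points: np.ndarray, num_points: int = 8192) -> np.ndarray:
    """Uniformly sample a point cloud of shape (N, C) to a fixed number of points.

    If the cloud has fewer points than requested, points are repeated
    (sampling with replacement); otherwise a uniform subset is drawn.
    """
    n = points.shape[0]
    replace = n < num_points
    idx = np.random.choice(n, num_points, replace=replace)
    return points[idx]

# Example: a synthetic cloud with XYZ + RGB channels
cloud = np.random.rand(50_000, 6).astype(np.float32)
fixed = uniform_sample(cloud, 8192)
print(fixed.shape)  # (8192, 6)
```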