PointGPT: Auto-regressively Generative Pre-training from Point Clouds

Authors: Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In particular, our approach achieves classification accuracies of 94.9% on the ModelNet40 dataset and 93.4% on the ScanObjectNN dataset, outperforming all other transformer models. Furthermore, our method also attains new state-of-the-art accuracies on all four few-shot learning benchmarks. Codes are available at https://github.com/CGuangyan-BIT/PointGPT.
Researcher Affiliation | Academia | Guangyan Chen¹, Meiling Wang¹, Yi Yang¹, Kai Yu¹, Li Yuan², Yufeng Yue¹ (¹Beijing Institute of Technology, ²Peking University)
Pseudocode | No | The paper describes the model architecture and processes using natural language and diagrams, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Codes are available at https://github.com/CGuangyan-BIT/PointGPT.
Open Datasets | Yes | Data: PointGPT-S is pre-trained on the ShapeNet [6] dataset without subsequent post-pre-training. This is in line with the previous SSL methods [39; 66; 68; 26] to allow for a direct comparison with these prior approaches. ShapeNet contains over 50,000 unique 3D models across 55 object categories. Additionally, two datasets are collected to support the training of high-capacity PointGPT models (PointGPT-B and PointGPT-L): (I) an unlabeled hybrid dataset (UHD) for self-supervised pre-training, which collects point clouds from various datasets [52; 35; 6; 53; 3; 60; 17], such as ShapeNet [6], S3DIS [3] for indoor scenes, and Semantic3D [17] for outdoor scenes, etc. In total, the UHD contains approximately 300K point clouds; (II) a labeled hybrid dataset (LHD) for supervised post-pre-training, which aligns the label semantics of different datasets [52; 35; 6; 53; 3; 60], with 87 categories and approximately 200K point clouds in total.
Dataset Splits | No | The paper mentions well-known datasets such as ModelNet40 and ScanObjectNN, which typically have standard splits, and it describes few-shot learning in w-way, s-shot settings (a minimal episode-sampling sketch is given below). However, it does not explicitly provide train/validation/test split percentages or sample counts for all datasets used, nor does it cite a standard split for every dataset in the context of its specific experiments.
Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU models, CPU models, or cloud computing specifications.
Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and the 'ViT-S configuration', but it does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | Pre-training setups: The input point clouds are obtained by sampling 1024 points from each raw point cloud. Afterward, each point cloud is partitioned into 64 point patches, with each patch consisting of 32 points. The PointGPT model is pre-trained for 300 epochs using an AdamW optimizer [31] with a batch size of 128, an initial learning rate of 0.001, and a weight decay of 0.05. Additionally, based on our empirical results, cosine learning rate decay [30] is employed.
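
To make the quoted pre-training recipe concrete, below is a minimal PyTorch-style sketch of that setup: 1024 points per cloud, 64 patches of 32 points each, AdamW with a 0.001 learning rate and 0.05 weight decay, batch size 128, and cosine learning rate decay over 300 epochs. The patch-grouping helper, the DummyPointGPT stand-in (including its 384-dim width), and the reconstruction loss are illustrative assumptions, not the authors' implementation.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hyperparameters quoted from the paper's pre-training setup.
NUM_POINTS = 1024    # points sampled from each raw point cloud
NUM_PATCHES = 64     # point patches per cloud
PATCH_SIZE = 32      # points per patch
BATCH_SIZE = 128
EPOCHS = 300
LR = 1e-3
WEIGHT_DECAY = 0.05


def group_patches(points: torch.Tensor) -> torch.Tensor:
    """Partition (B, NUM_POINTS, 3) clouds into (B, NUM_PATCHES, PATCH_SIZE, 3) patches.

    Hypothetical helper: picks patch centers (randomly here; the paper's pipeline
    uses farthest point sampling) and gathers each patch with k-nearest neighbours.
    """
    B, N, _ = points.shape
    center_idx = torch.stack([torch.randperm(N)[:NUM_PATCHES] for _ in range(B)])
    centers = torch.gather(points, 1, center_idx.unsqueeze(-1).expand(-1, -1, 3))
    dists = torch.cdist(centers, points)                      # (B, NUM_PATCHES, N)
    knn_idx = dists.topk(PATCH_SIZE, largest=False).indices   # (B, NUM_PATCHES, PATCH_SIZE)
    return torch.gather(
        points.unsqueeze(1).expand(-1, NUM_PATCHES, -1, -1),
        2,
        knn_idx.unsqueeze(-1).expand(-1, -1, -1, 3),
    )


class DummyPointGPT(torch.nn.Module):
    """Stand-in for the PointGPT transformer: embeds each patch and regresses it
    back so the loop below runs end to end. The 384-dim width mirrors ViT-S."""

    def __init__(self, dim: int = 384):
        super().__init__()
        self.embed = torch.nn.Linear(PATCH_SIZE * 3, dim)
        self.head = torch.nn.Linear(dim, PATCH_SIZE * 3)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        B, G, S, _ = patches.shape
        tokens = self.embed(patches.reshape(B, G, S * 3))
        recon = self.head(tokens).reshape(B, G, S, 3)
        return torch.nn.functional.mse_loss(recon, patches)


model = DummyPointGPT()
optimizer = AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)  # cosine learning rate decay

clouds = torch.randn(BATCH_SIZE, NUM_POINTS, 3)  # stand-in for one ShapeNet batch
for epoch in range(EPOCHS):
    patches = group_patches(clouds)
    loss = model(patches)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

In the released repository, the placeholder model would be replaced by the actual PointGPT extractor-generator transformer and the random center selection by farthest point sampling; only the quoted hyperparameters above are taken from the paper.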
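
For the w-way, s-shot settings referenced in the Dataset Splits row, an episode is built by sampling w classes, drawing s support examples per class, and evaluating on held-out examples of those same classes. Below is a minimal sketch; the default of 20 query samples per class follows the common ModelNet40 few-shot benchmark and, like the `sample_episode` helper itself, is an assumption rather than a detail quoted in this assessment.

```python
import random
from collections import defaultdict


def sample_episode(dataset, w=5, s=10, queries_per_class=20, seed=0):
    """Build one w-way, s-shot few-shot episode.

    `dataset` is assumed to be a sequence of (points, label) pairs; returns
    (support, query) lists with labels remapped to 0..w-1.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for points, label in dataset:
        by_class[label].append(points)

    support, query = [], []
    for new_label, cls in enumerate(rng.sample(sorted(by_class), w)):
        picked = rng.sample(by_class[cls], s + queries_per_class)
        support += [(x, new_label) for x in picked[:s]]   # s training shots per class
        query += [(x, new_label) for x in picked[s:]]     # held-out evaluation samples
    return support, query
```

The four few-shot benchmarks mentioned in the abstract correspond to the 5-way/10-way and 10-shot/20-shot combinations, each averaged over several independently sampled episodes.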