PointMamba: A Simple State Space Model for Point Cloud Analysis

Authors: Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Xiang Bai

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations demonstrate that PointMamba achieves superior performance across multiple datasets while significantly reducing GPU memory usage and FLOPs. This work underscores the potential of SSMs in 3D vision-related tasks and presents a simple yet effective Mamba-based baseline for future research. The code is available at https://github.com/LMD0311/PointMamba.
Researcher Affiliation | Collaboration | 1Huazhong University of Science & Technology, 2Baidu Inc.
Pseudocode | No | The paper includes architectural diagrams and mathematical formulations but does not contain explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | The code is available at https://github.com/LMD0311/PointMamba.
Open Datasets | Yes | For the pre-training, we utilize ShapeNetCore [5] as the dataset, following previous methods [60, 38, 6].
Dataset Splits | No | The paper uses datasets such as ShapeNetCore, ModelNet40, ScanObjectNN, and ShapeNetPart for pre-training and downstream tasks. It mentions "mask modeling visualization on ShapeNet validation set" in Appendix D.1, implying the use of a validation set. However, explicit numerical percentages or counts for training, validation, and test splits are not provided in the main text or tables; the paper instead relies on the standard splits of these benchmark datasets.
Hardware Specification | Yes | To fully explore the potential of processing the long point tokens (sequence), we gradually increase the sequence length until the GPU (NVIDIA A800 80GB) memory explodes.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with their version numbers, such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | Pre-training Details. The ShapeNetCore dataset [5] is used for pre-training, comprising 51K clean 3D samples across 55 categories. The 1,024 input points are divided into 64 point patches, with each patch consisting of 32 points. The pre-training process runs for 300 epochs with a batch size of 128. More details can be found in Tab. 10.

Table 10: Implementation details for pre-training and downstream tasks.

Configuration | Pre-training | Classification | Classification | Segmentation
Optimizer | AdamW | AdamW | AdamW | AdamW
Learning rate | 1e-3 | 3e-4 | 5e-4 | 2e-4
Weight decay | 5e-2 | 5e-2 | 5e-2 | 5e-2
Learning rate scheduler | cosine | cosine | cosine | cosine
Training epochs | 300 | 300 | 300 | 300
Warmup epochs | 10 | 10 | 10 | 10
Batch size | 128 | 32 | 32 | 16
Num. of encoder layers N | 12 | 12 | 12 | 12
Num. of decoder layers | 4 | - | - | -
Input points M | 1024 | 1024 | 2048 | 2048
Num. of patches n | 64 | 64 | 128 | 128
Patch size k | 32 | 32 | 32 | 32
Augmentation | Scale&Trans | Scale&Trans | Rotation | -
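As a concrete illustration of the schedule reported for pre-training (base learning rate 1e-3, cosine scheduler, 10 warmup epochs, 300 total epochs), the per-epoch learning rate can be sketched as below. This is a minimal reconstruction that assumes linear warmup and decay toward zero; the official repository may implement the warmup ramp and minimum learning rate differently.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=10, total_epochs=300, min_lr=0.0):
    """Linear warmup for `warmup_epochs`, then cosine decay to `min_lr`.

    Defaults mirror the pre-training column of Table 10; `min_lr` and the
    exact warmup shape are assumptions, not taken from the paper.
    """
    if epoch < warmup_epochs:
        # Ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(lr_at_epoch(0))    # first warmup step: base_lr / 10 = 1e-4
print(lr_at_epoch(10))   # warmup finished: full base_lr = 1e-3
print(lr_at_epoch(299))  # end of training: learning rate near zero
```

The same function covers the downstream settings by swapping in the corresponding base learning rate (e.g. 3e-4 or 5e-4 for classification, 2e-4 for segmentation), since all columns share the cosine scheduler and 10 warmup epochs.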