Multi-Modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

Authors: Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of our proposed method, which outperforms the state-of-the-art by 4% to 6% mIoU. Codes are released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.
Researcher Affiliation | Academia | (1) School of Software, Beihang University; (2) College of Computer Science and Technology, Zhejiang University; (3) Department of Computer Science, The University of Hong Kong. {ZY2121108,ZY2121121,zhang jing,qianyu,lsheng}@buaa.edu.cn, tianyizhang0213@zju.edu.cn, dongxu@hku.hk
Pseudocode | No | The paper describes the proposed method in the 'Methodology' section using prose and a pipeline diagram (Figure 1), but it does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | Yes | Codes are released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.
Open Datasets | Yes | We evaluate the proposed approach MMA on two benchmarks, the ScanNet (Dai et al. 2017) and S3DIS (Armeni et al. 2017) datasets. ScanNet is a commonly-used indoor 3D point cloud dataset for semantic segmentation. It contains 1513 training scenes (1201 scenes for training, 312 scenes for validation) and 100 test scenes, annotated with 20 classes. S3DIS is also an indoor 3D point cloud dataset, which contains 6 indoor areas and has 13 classes.
Dataset Splits | Yes | ScanNet is a commonly-used indoor 3D point cloud dataset for semantic segmentation. It contains 1513 training scenes (1201 scenes for training, 312 scenes for validation) and 100 test scenes, annotated with 20 classes. Following previous work, we use Area 5 as the test data. (A split-configuration sketch follows the table.)
Hardware Specification | Yes | The model is trained on a 3090 GPU with a batch size of 8 for 300 epochs.
Software Dependencies | No | The paper mentions using the "AdamW optimizer" and "PointNet++" as the backbone, but it does not specify version numbers for these or other software libraries/frameworks (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The RGB-appended input point clouds are a set of 10-dimensional vectors, including coordinates (x, y, z), color (R, G, B), surface normal, and height, while the pure geometric input is produced by masking out the RGB values with 0. The model is trained on a 3090 GPU with a batch size of 8 for 300 epochs. We use the AdamW optimizer with an initial learning rate of 0.0014, decayed by half at 160 epochs and 180 epochs. All hyper-parameters are tuned based on the validation set. (A training-setup sketch follows the table.)
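
For reference, the dataset splits quoted in the table can be summarized as a small configuration sketch. The scene counts, class counts, and the Area-5 test split come from the quoted paper text; the dictionary layout, key names, and the assumption that the remaining S3DIS areas are used for training are illustrative, not taken from the released code.

```python
# Split summary reconstructed from the quoted paper text (layout is illustrative).
SCANNET_SPLITS = {
    "train_scenes": 1201,   # training scenes
    "val_scenes": 312,      # validation scenes (1201 + 312 = 1513 "training scenes")
    "test_scenes": 100,     # held-out test scenes
    "num_classes": 20,
}

S3DIS_SPLITS = {
    "train_areas": [1, 2, 3, 4, 6],  # remaining indoor areas (assumed, standard protocol)
    "test_areas": [5],               # "we use Area 5 as the test data"
    "num_classes": 13,
}
```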
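The experiment-setup row can likewise be read as a short PyTorch sketch. The batch size, epoch count, initial learning rate, halving milestones, and the 10-dimensional input with RGB masking are quoted from the paper; the MultiStepLR scheduler, the channel ordering, the point count per scene, and the placeholder backbone are assumptions made only for illustration.

```python
# Minimal sketch of the quoted training setup; not the authors' implementation.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 20)  # stand-in for the PointNet++ backbone

# 10-D per-point features: xyz (3) + RGB (3) + surface normal (3) + height (1)
points_rgb = torch.randn(8, 4096, 10)   # RGB-appended input, batch size 8 (4096 points assumed)
points_geo = points_rgb.clone()
points_geo[..., 3:6] = 0.0              # pure geometric input: RGB channels masked with 0

optimizer = AdamW(model.parameters(), lr=0.0014)
scheduler = MultiStepLR(optimizer, milestones=[160, 180], gamma=0.5)  # halve LR at epochs 160 and 180

for epoch in range(300):
    # ... one pass over the training data (forward, loss, backward) would go here ...
    optimizer.step()
    scheduler.step()
```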