Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding

Authors: Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our novel representation achieves state-of-the-art performance on 3D classification, shape retrieval, and robust 3D part segmentation on standard benchmarks (ScanObjectNN, ShapeNetCore55, and ShapeNet Parts). (Section 4: Experiments)
Researcher Affiliation | Academia | King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; {abdullah.hamdi, silvio.giancola, bernard.ghanem}@kaust.edu.sa
Pseudocode | No | The paper describes operations like VointMax and VointConv in mathematical and textual form but does not provide structured pseudocode or algorithm blocks (a hypothetical sketch of these operations is given after the table).
Open Source Code | Yes | The code is available at https://github.com/ajhamdi/vointcloud
Open Datasets | Yes | We benchmark VointNet on the challenging and realistic ScanObjectNN dataset for 3D point cloud classification (Uy et al., 2019). For the shape retrieval task, we benchmark on ShapeNetCore55, a subset of ShapeNet (Chang et al., 2015). For the task of shape part segmentation, we test on ShapeNet Parts (Yi et al., 2016), also a subset of ShapeNet (Chang et al., 2015). For occlusion robustness, we follow MVTN (Hamdi et al., 2021) and test on ModelNet40 (Wu et al., 2015).
Dataset Splits | Yes | The training, validation, and test sets consist of 35764, 5133, and 10265 shapes, respectively.
Hardware Specification | Yes | The pipeline is trained with one NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions software such as PyTorch3D, PyTorch, ViT-B (via the TIMM library), and DeepLabV3, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | For the 2D backbone C, we use ViT-B (Dosovitskiy et al., 2021) (with pretrained weights from the TIMM library (Wightman, 2019)) for classification and DeepLabV3 (Chen et al., 2018) for segmentation. The feature dimension of the VointNet architectures is d = 64, and the depth is l_V = 4 layers in h_V. We train our pipeline in two stages: the 2D backbone is first trained on the 2D projected labels of the points, and the entire pipeline is then trained end-to-end while focusing the training on the VointNet part. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 0.0005 and a step learning rate schedule of 33.3% every 12 epochs for 40 epochs (a sketch of this optimizer setup is given after the table).
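Since the paper provides no pseudocode, the following is a minimal sketch of what view-wise aggregation operations such as VointMax and VointConv could look like in PyTorch. The tensor layout (batch, points, views, features), the module names, and the choice of a shared linear layer are assumptions for illustration, not the authors' reference implementation, which may additionally condition on view directions.

```python
# Hypothetical sketch of view-wise aggregation over a voint cloud.
# Assumed layout: x has shape (B, N, M, d) -- batch, points, views, features.
import torch
import torch.nn as nn

class VointMax(nn.Module):
    """Max-pools each point's per-view features over the view dimension."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, M, d) -> (B, N, d)
        return x.max(dim=2).values

class VointConv(nn.Module):
    """Shared MLP applied to every (point, view) feature, then view-pooled."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.mlp(x)             # (B, N, M, d_out), weights shared across points/views
        return y.max(dim=2).values  # aggregate views -> (B, N, d_out)

# Usage: 2 shapes, 1024 points, 8 views, 64-dim features (d = 64 as in the paper).
feats = torch.randn(2, 1024, 8, 64)
pooled = VointConv(64, 64)(feats)   # (2, 1024, 64)
```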
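The reported optimization setup maps directly onto standard PyTorch components. Below is a minimal sketch assuming torch.optim.AdamW and a StepLR schedule with gamma = 0.333 and step_size = 12 to match the stated "33.3% every 12 epochs"; the placeholder model and any unreported hyperparameters (e.g., weight decay, batch size) are assumptions.

```python
# Sketch of the reported schedule: AdamW, lr = 5e-4, lr scaled to 33.3%
# every 12 epochs, trained for 40 epochs total.
import torch

model = torch.nn.Linear(64, 40)  # placeholder for the actual VointNet pipeline
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=12, gamma=0.333)

for epoch in range(40):
    # ... one training epoch over voint cloud batches goes here ...
    optimizer.step()   # would normally follow loss.backward() per batch
    scheduler.step()   # lr drops to 0.333x at epochs 12, 24, and 36
```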