Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding

Authors: Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our novel representation achieves state-of-the-art performance on 3D classification, shape retrieval, and robust 3D part segmentation on standard benchmarks (ScanObjectNN, ShapeNetCore55, and ShapeNet Parts). (Section 4: Experiments)
Researcher Affiliation | Academia | King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; {abdullah.hamdi, silvio.giancola, bernard.ghanem}@kaust.edu.sa
Pseudocode | No | The paper describes operations like VointMax and VointConv in mathematical and textual form but does not provide structured pseudocode or algorithm blocks (a hypothetical sketch of these operations is given after the table).
Open Source Code | Yes | The code is available at https://github.com/ajhamdi/vointcloud
Open Datasets | Yes | We benchmark VointNet on the challenging and realistic ScanObjectNN dataset for 3D point cloud classification (Uy et al., 2019). For the shape retrieval task, we benchmark on ShapeNetCore55, a subset of ShapeNet (Chang et al., 2015). For the task of shape part segmentation, we test on ShapeNet Parts (Yi et al., 2016), also a subset of ShapeNet (Chang et al., 2015). For occlusion robustness, we follow MVTN (Hamdi et al., 2021) and test on ModelNet40 (Wu et al., 2015).
Dataset Splits | Yes | The training, validation, and test sets consist of 35764, 5133, and 10265 shapes, respectively.
Hardware Specification | Yes | The pipeline is trained with one NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions software such as PyTorch3D, PyTorch, ViT-B (via the TIMM library), and DeepLabV3, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | For the 2D backbone C, we use ViT-B (Dosovitskiy et al., 2021) (with pretrained weights from the TIMM library (Wightman, 2019)) for classification and DeepLabV3 (Chen et al., 2018) for segmentation. The feature dimension of the VointNet architectures is d = 64, and the depth is l_V = 4 layers in h_V. We train our pipeline in two stages: the 2D backbone is first trained on the 2D projected labels of the points, and the entire pipeline is then trained end-to-end while focusing the training on the VointNet part. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 0.0005 and a step learning rate schedule of 33.3% every 12 epochs for 40 epochs (a sketch of this optimizer setup is given after the table).
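Since the paper provides no pseudocode, the following is a minimal sketch of what view-wise aggregation operations such as VointMax and VointConv could look like in PyTorch. The tensor layout (batch, points, views, features), the module names, and the choice of a shared linear layer are assumptions for illustration, not the authors' reference implementation, which may additionally condition on view directions.

```python
# Hypothetical sketch of view-wise aggregation over a voint cloud.
# Assumed layout: x has shape (B, N, M, d) -- batch, points, views, features.
import torch
import torch.nn as nn

class VointMax(nn.Module):
    """Max-pools each point's per-view features over the view dimension."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, M, d) -> (B, N, d)
        return x.max(dim=2).values

class VointConv(nn.Module):
    """Shared MLP applied to every (point, view) feature, then view-pooled."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.mlp(x)             # (B, N, M, d_out), weights shared across points/views
        return y.max(dim=2).values  # aggregate views -> (B, N, d_out)

# Usage: 2 shapes, 1024 points, 8 views, 64-dim features (d = 64 as in the paper).
feats = torch.randn(2, 1024, 8, 64)
pooled = VointConv(64, 64)(feats)   # (2, 1024, 64)
```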
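The reported optimization setup maps directly onto standard PyTorch components. Below is a minimal sketch assuming torch.optim.AdamW and a StepLR schedule with gamma = 0.333 and step_size = 12 to match the stated "33.3% every 12 epochs"; the placeholder model and any unreported hyperparameters (e.g., weight decay, batch size) are assumptions.

```python
# Sketch of the reported schedule: AdamW, lr = 5e-4, lr scaled to 33.3%
# every 12 epochs, trained for 40 epochs total.
import torch

model = torch.nn.Linear(64, 40)  # placeholder for the actual VointNet pipeline
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=12, gamma=0.333)

for epoch in range(40):
    # ... one training epoch over voint cloud batches goes here ...
    optimizer.step()   # would normally follow loss.backward() per batch
    scheduler.step()   # lr drops to 0.333x at epochs 12, 24, and 36
```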