Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding
Authors: Abdullah Hamdi, Silvio Giancola, Bernard Ghanem
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our novel representation achieves state-of-the-art performance on 3D classification, shape retrieval, and robust 3D part segmentation on standard benchmarks (ScanObjectNN, ShapeNetCore55, and ShapeNet Parts). (Section 4, Experiments) |
| Researcher Affiliation | Academia | King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia {abdullah.hamdi, silvio.giancola, bernard.ghanem}@kaust.edu.sa |
| Pseudocode | No | The paper describes operations like VointMax and VointConv in mathematical and textual form but does not provide structured pseudocode or algorithm blocks (a hedged sketch of VointMax follows this table). |
| Open Source Code | Yes | The code is available at https://github.com/ajhamdi/vointcloud |
| Open Datasets | Yes | We benchmark VointNet on the challenging and realistic ScanObjectNN dataset for 3D point cloud classification (Uy et al., 2019). For the shape retrieval task, we benchmark on ShapeNetCore55, a subset of ShapeNet (Chang et al., 2015). For the task of shape part segmentation, we test on ShapeNet Parts (Yi et al., 2016), a subset of ShapeNet (Chang et al., 2015). For occlusion robustness, we follow MVTN (Hamdi et al., 2021) and test on ModelNet40 (Wu et al., 2015). |
| Dataset Splits | Yes | The training, validation, and test sets consist of 35,764, 5,133, and 10,265 shapes, respectively. |
| Hardware Specification | Yes | The pipeline is trained with one NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions software such as PyTorch3D, PyTorch, ViT-B (TIMM library), and DeepLabV3, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For the 2D backbone C, we use ViT-B (Dosovitskiy et al., 2021) (with pretrained weights from the TIMM library (Wightman, 2019)) for classification and DeepLabV3 (Chen et al., 2018) for segmentation. The feature dimension of the VointNet architectures is d = 64, and the depth is l_V = 4 layers in h_V. We train our pipeline in two stages: we first train the 2D backbone on the 2D projected labels of the points, then train the entire pipeline end-to-end while focusing the training on the VointNet part. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 0.0005 and a step learning rate schedule of 33.3% every 12 epochs, for 40 epochs (a minimal configuration sketch follows this table). |
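
The paper defines VointMax only mathematically, as a max pooling over each point's per-view features. Below is a minimal PyTorch sketch of that idea; the function name `voint_max`, the (batch, points, views, channels) tensor layout, and the visibility-mask handling are our assumptions for illustration, not the authors' implementation.

```python
import torch

def voint_max(voints: torch.Tensor, view_mask: torch.Tensor) -> torch.Tensor:
    """Sketch of VointMax: max-pool voint features over the view axis.

    voints:    (B, N, M, d) per-point, per-view features (a voint cloud).
    view_mask: (B, N, M) boolean, True where point n is visible in view m.
    Returns:   (B, N, d) view-aggregated point features.
    """
    # Hide features from views where the point is not visible,
    # so they cannot win the max.
    masked = voints.masked_fill(~view_mask.unsqueeze(-1), float("-inf"))
    pooled = masked.max(dim=2).values
    # A point visible in no view would pool to -inf; zero it out instead.
    return torch.where(torch.isinf(pooled), torch.zeros_like(pooled), pooled)


# Toy usage: 2 shapes, 1024 points, 12 views, 64-dim features.
feats = torch.randn(2, 1024, 12, 64)
mask = torch.rand(2, 1024, 12) > 0.3
point_feats = voint_max(feats, mask)  # -> (2, 1024, 64)
```

A VointConv-style operator would apply a learned per-view transformation before pooling; we omit that here since the paper gives only its mathematical form.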
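
The reported optimization setup maps onto standard PyTorch components. The sketch below assumes that "a step learning rate schedule of 33.3% every 12 epochs" means multiplying the learning rate by 0.333 every 12 epochs; the model here is a stand-in placeholder, not the VointNet pipeline.

```python
import torch

# Stand-in model; the real pipeline is VointNet on top of a 2D backbone.
model = torch.nn.Linear(64, 40)

# AdamW with the reported initial learning rate of 0.0005.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

# Step schedule: multiply the learning rate by 0.333 every 12 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=12, gamma=0.333)

for epoch in range(40):
    # ... one training epoch over the data goes here ...
    scheduler.step()
```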