Not All Voxels Are Equal: Semantic Scene Completion from the Point-Voxel Perspective

Authors: Jiaxiang Tang, Xiaokang Chen, Jingbo Wang, Gang Zeng

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our model surpasses state-of-the-arts on two benchmarks by a large margin, with only depth images as the input. ... We evaluate the proposed method and compare it with state-of-the-art methods on two public datasets, NYU and NYUCAD. ... We provide some visualizations on the NYUCAD dataset in Figure 5. ... We conduct ablation studies to verify the effectiveness of each aforementioned component. All the results are tested on the NYUCAD test set.
Researcher Affiliation | Academia | Key Lab. of Machine Perception (MoE), School of AI, Peking University; Chinese University of Hong Kong
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | The NYU dataset (Silberman et al. 2012) consists of 1,449 realistic indoor RGB-D scenes captured via a Kinect sensor (Song et al. 2017). ... To solve this problem, the high-quality synthetic NYUCAD dataset is proposed by (Firman et al. 2016), where the depth maps are projected from the ground truth annotations and thus avoid the misalignments.
Dataset Splits | Yes | Following previous works (Song et al. 2017; Chen et al. 2020a; Garbade et al. 2019), we choose NYU and NYUCAD to evaluate our method. ... All the results are tested on the NYUCAD test set.
Hardware Specification | Yes | We use the PyTorch framework with two Nvidia Titan Xp GPUs to conduct our experiments.
Software Dependencies | No | The paper mentions using the "PyTorch framework" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Mini-batch SGD with momentum of 0.9 is adopted to train our network. The initial learning rate is 0.05, batch size is 8 and the weight decay is 0.0005. We employ a poly learning rate decay policy where the initial learning rate is multiplied by (1 − now_iter / max_iter)^0.9. We train our network for 1000 epochs on the NYUCAD dataset and the NYU dataset. The radius of the ellipsoidal receptive field is set to 0.09 for the major axis and 0.03 for the minor axes. The balancing factor λ in Equation 7 is set to 0.5.
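The poly learning-rate decay policy quoted above can be sketched as a small helper. This is a minimal illustration of the formula lr = base_lr × (1 − now_iter / max_iter)^0.9 using the paper's stated initial learning rate of 0.05; the function name `poly_lr` and the iteration counts are hypothetical, not from the paper.

```python
def poly_lr(base_lr: float, now_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Poly decay: scale base_lr by (1 - now_iter / max_iter) ** power."""
    return base_lr * (1.0 - now_iter / max_iter) ** power

# Paper's setup: initial learning rate 0.05, exponent 0.9.
print(poly_lr(0.05, 0, 1000))    # start of training: full base LR, 0.05
print(poly_lr(0.05, 500, 1000))  # halfway: decayed below the base LR
print(poly_lr(0.05, 1000, 1000)) # end of training: decays to 0.0
```

In a PyTorch training loop this rule would typically be applied each iteration, e.g. via `torch.optim.lr_scheduler.LambdaLR` with a matching lambda.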