Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Authors: Siqi Li, Changqing Zou, Yipeng Li, Xibin Zhao, Yue Gao (pp. 11402–11409)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset, and the results show that our method achieves gains of 2.5% and 2.6% on these two datasets, respectively, against the state-of-the-art method.
Researcher Affiliation | Collaboration | Siqi Li¹, Changqing Zou², Yipeng Li³, Xibin Zhao¹, Yue Gao¹. ¹BNRist, KLISS, School of Software, Tsinghua University, China; ²Huawei Noah's Ark Lab; ³Department of Automation, Tsinghua University, China.
Pseudocode | No | The paper describes the network architecture and processing flow in detail with text and diagrams (Figures 2 and 3), but it does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology, and it provides no link to a code repository.
Open Datasets | Yes | We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset. NYUv2 (Silberman et al. 2012) is a real scene dataset consisting of 1449 indoor scenes. SUNCG-RGBD, a synthetic dataset proposed by Liu et al. (2018), is a subset of the SUNCG dataset (Song et al. 2017).
Dataset Splits | Yes | NYUv2 (Silberman et al. 2012) is a real scene dataset consisting of 1449 indoor scenes. The dataset is divided into 795 training and 654 testing samples, with each scene associated with RGB-D images. SUNCG-RGBD [...] consists of 13011 training samples and 499 testing samples. (A split-summary sketch follows the table.)
Hardware Specification | No | The paper does not specify the hardware used for running the experiments, such as particular GPU or CPU models, or memory configurations.
Software Dependencies | No | The paper mentions a cross-entropy loss and an SGD optimizer with specific parameters, but it does not list any software dependencies or library versions (e.g., Python, PyTorch, TensorFlow, or CUDA versions) that would be needed for replication.
Experiment Setup | Yes | The training procedure consists of two steps. We first pre-train the 2D segmentation network with the supervision of 2D semantic segmentation ground truth, and then train the whole model end-to-end. We use cross-entropy loss and an SGD optimizer with a momentum of 0.9, a weight decay of 5e-4, and a batch size of 1. The learning rates of the 2D segmentation network and the 3D scene completion network are 0.001 and 0.01, respectively. (A hedged optimizer sketch follows the table.)
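
For reference, here is a minimal Python sketch that records the split sizes quoted above so a re-implementation can sanity-check its data loaders. The `DATASET_SPLITS` table and `check_split` helper are illustrative names introduced here, not artifacts from the paper.

```python
# Split sizes as reported in the paper; the helper below is a hypothetical
# convenience for verifying a loaded dataset against the published splits.
DATASET_SPLITS = {
    "NYUv2":      {"train": 795,   "test": 654},   # Silberman et al. 2012
    "SUNCG-RGBD": {"train": 13011, "test": 499},   # Liu et al. 2018
}

def check_split(name: str, train_samples: int, test_samples: int) -> None:
    """Sanity-check a loaded dataset against the published split sizes."""
    expected = DATASET_SPLITS[name]
    assert train_samples == expected["train"], f"{name}: unexpected train size"
    assert test_samples == expected["test"], f"{name}: unexpected test size"

check_split("NYUv2", train_samples=795, test_samples=654)
```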
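
And a hedged sketch of the reported training configuration, assuming PyTorch (the paper does not name its framework). The module names `seg2d` and `completion3d` and the function `build_optimizer` are hypothetical placeholders; the learning rates, momentum, weight decay, and loss are the values quoted above.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
# Step 1 pre-trains the 2D segmentation network on 2D segmentation ground
# truth; step 2 trains the whole model end-to-end with the settings below.
import torch
import torch.nn as nn

def build_optimizer(seg2d: nn.Module, completion3d: nn.Module) -> torch.optim.SGD:
    """SGD with the two per-network learning rates reported in the paper."""
    return torch.optim.SGD(
        [
            {"params": seg2d.parameters(), "lr": 1e-3},         # 2D segmentation net
            {"params": completion3d.parameters(), "lr": 1e-2},  # 3D completion net
        ],
        momentum=0.9,       # momentum from the paper
        weight_decay=5e-4,  # weight decay from the paper
    )

criterion = nn.CrossEntropyLoss()  # cross-entropy loss; reported batch size is 1
```

Using two parameter groups in a single SGD optimizer is one straightforward way to honor the paper's different learning rates for the 2D and 3D sub-networks; separate optimizers per sub-network would be an equally plausible reading.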