Structure-Aware Residual Pyramid Network for Monocular Depth Estimation

Authors: Xiaotian Chen, Xuejin Chen, Zheng-Jun Zha

IJCAI 2019

Reproducibility Assessment (Variable / Result / LLM Response)
Research Type: Experimental
  LLM Response: "Experiment results on the challenging NYU-Depth v2 dataset demonstrate that our proposed approach achieves state-of-the-art performance in both qualitative and quantitative evaluation." "4 Experiments: To demonstrate the effectiveness of the proposed approach, we evaluate our approach on the challenging NYUD v2 dataset [Silberman et al., 2012]. We compare our approach with a couple of state-of-the-art approaches and show the superiority of the proposed method on both quantitative and qualitative evaluations."
Researcher Affiliation: Academia
  LLM Response: "Xiaotian Chen, Xuejin Chen and Zheng-Jun Zha, National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China. ustcxt@mail.ustc.edu.cn, {xjchen99, zhazj}@ustc.edu.cn"
Pseudocode: No
  LLM Response: The paper describes the network architecture with diagrams (Figure 2, Figure 3) and textual explanations, but it does not include structured pseudocode or algorithm blocks.
Open Source Code: Yes
  LLM Response: "The code is available at https://github.com/Xt-Chen/SARPN."
Open Datasets: Yes
  LLM Response: "To demonstrate the effectiveness of the proposed approach, we evaluate our approach on the challenging NYUD v2 dataset [Silberman et al., 2012]." "The NYU-Depth v2 dataset [Silberman et al., 2012] contains 464 video sequences of indoor scenes captured with Microsoft Kinect."
Dataset Splits: No
  LLM Response: "To train our network, we use the training dataset which contains 50K RGB-D images, selected and then augmented in the same way as [Hu et al., 2019]. Each image is downsampled to 320 x 240 using bilinear interpolation, and then center-cropped to 304 x 228." The paper mentions a training dataset and a testing dataset but does not explicitly specify a separate validation split with percentages or counts.
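The quoted preprocessing (bilinear downsampling to 320 x 240 followed by a 304 x 228 center crop) can be sketched as follows. The helper below is a hypothetical illustration of the crop geometry, not code from the authors' repository:

```python
def center_crop_box(src_w, src_h, crop_w, crop_h):
    """Compute the (left, top, right, bottom) box for a center crop.

    Hypothetical helper illustrating the paper's preprocessing;
    offsets are split evenly between the two sides (integer division).
    """
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)

# After downsampling each NYU-Depth v2 frame to 320 x 240,
# the paper center-crops it to 304 x 228.
box = center_crop_box(320, 240, 304, 228)
print(box)  # (8, 6, 312, 234)
```

The resulting box trims 8 pixels from each side horizontally and 6 vertically, matching the 304 x 228 input resolution reported in the paper.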
Hardware Specification: No
  LLM Response: The paper mentions implementing the model using PyTorch but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running experiments.
Software Dependencies: No
  LLM Response: The paper states "We implement the proposed model using PyTorch [Paszke et al., 2017]" but does not specify the version number for PyTorch or any other software dependencies.
Experiment Setup: Yes
  LLM Response: "Each image is downsampled to 320 x 240 using bilinear interpolation, and then center-cropped to 304 x 228. The predicted depth maps are in a resolution of 152 x 114. For testing, the predicted depth maps are upsampled to match the size of the corresponding ground truth using bilinear interpolation. We implement the proposed model using PyTorch [Paszke et al., 2017]. The encoder, SENet, is initialized by a model pretrained on ImageNet [Deng et al., 2009]. The other layers in our network are randomly initialized. We use a step learning rate decay policy with the Adam optimizer, starting from an initial learning rate of 10^-4, which is reduced to 10% every 5 epochs. We use β1 = 0.9, β2 = 0.999, and a weight decay of 10^-4. The proposed network was trained for 20 epochs with a batch size of 6."
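The step decay policy described in the quote (initial learning rate 10^-4, multiplied by 0.1 every 5 epochs over 20 training epochs) can be sketched as follows. The function is a hypothetical reimplementation for illustration, not the authors' code:

```python
def step_lr(epoch, init_lr=1e-4, gamma=0.1, step=5):
    """Learning rate under a step decay policy: the initial rate is
    multiplied by `gamma` once every `step` epochs.

    Hypothetical sketch of the schedule quoted from the paper.
    """
    return init_lr * (gamma ** (epoch // step))

# Over the 20 reported epochs, the rate steps down four times:
# epochs 0-4 use about 1e-4, epochs 5-9 about 1e-5,
# epochs 10-14 about 1e-6, and epochs 15-19 about 1e-7
# (up to floating-point rounding).
schedule = [step_lr(e) for e in range(20)]
```

In PyTorch, this corresponds to pairing `torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-4)` with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)`, matching the hyperparameters quoted above.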