Structure-Aware Residual Pyramid Network for Monocular Depth Estimation
Authors: Xiaotian Chen, Xuejin Chen, Zheng-Jun Zha
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on the challenging NYU-Depth v2 dataset demonstrate that our proposed approach achieves state-of-the-art performance in both qualitative and quantitative evaluation. To demonstrate the effectiveness of the proposed approach, we evaluate our approach on the challenging NYUD v2 dataset [Silberman et al., 2012]. We compare our approach with a couple of state-of-the-art approaches and show the superiority of the proposed method on both quantitative and qualitative evaluations. |
| Researcher Affiliation | Academia | Xiaotian Chen, Xuejin Chen and Zheng-Jun Zha, National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China; ustcxt@mail.ustc.edu.cn, {xjchen99, zhazj}@ustc.edu.cn |
| Pseudocode | No | The paper describes the network architecture with diagrams (Figure 2, Figure 3) and textual explanations, but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/Xt-Chen/SARPN. |
| Open Datasets | Yes | To demonstrate the effectiveness of the proposed approach, we evaluate our approach on the challenging NYUD v2 dataset [Silberman et al., 2012]. The NYU-Depth v2 dataset [Silberman et al., 2012] contains 464 video sequences of indoor scenes captured with Microsoft Kinect. |
| Dataset Splits | No | To train our network, we use the training dataset, which contains 50K RGB-D images, selected and then augmented in the same way as [Hu et al., 2019]. Each image is downsampled to 320 x 240 using bilinear interpolation, and then center-cropped to 304 x 228. The paper mentions a training dataset and a testing dataset but does not explicitly specify a separate validation dataset split with percentages or counts. |
| Hardware Specification | No | The paper mentions implementing the model using PyTorch but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running experiments. |
| Software Dependencies | No | The paper states 'We implement the proposed model using PyTorch [Paszke et al., 2017]' but does not specify the version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Each image is downsampled to 320 x 240 using bilinear interpolation, and then center-cropped to 304 x 228. The predicted depth maps are in a resolution of 152 x 114. For testing, the predicted depth maps are upsampled to match the size of the corresponding ground truth using bilinear interpolation. We implement the proposed model using PyTorch [Paszke et al., 2017]. The encoder, SENet, is initialized by a model pretrained on ImageNet [Deng et al., 2009]. The other layers in our network are randomly initialized. We use a step learning rate decay policy with the Adam optimizer, starting from an initial learning rate of l_init = 10^-4, which is reduced to 10% every 5 epochs. We use β1 = 0.9, β2 = 0.999, and a weight decay of 10^-4. The proposed network was trained for 20 epochs with a batch size of 6. |
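The learning-rate policy quoted in the Experiment Setup row implies a simple step-decay schedule. A minimal sketch in plain Python (the function name `lr_at_epoch` and its 0-indexed epochs are illustrative assumptions, not from the paper; in PyTorch this policy corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)` on an Adam optimizer built with `lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-4`):

```python
def lr_at_epoch(epoch, lr_init=1e-4, decay_every=5, gamma=0.1):
    """Learning rate under the paper's stated policy: start at 1e-4
    and reduce to 10% of the current value every 5 epochs."""
    return lr_init * gamma ** (epoch // decay_every)

# Over the 20 training epochs reported in the paper:
schedule = [lr_at_epoch(e) for e in range(20)]
# epochs 0-4 use 1e-4, epochs 5-9 use 1e-5,
# epochs 10-14 use 1e-6, epochs 15-19 use 1e-7
```

Since the paper does not report a hardware setup or PyTorch version, this sketch only pins down the schedule arithmetic, which is version-independent.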