Patch-Wise Attention Network for Monocular Depth Estimation

Authors: Sihaeng Lee, Janghyeon Lee, Byungju Kim, Eojindl Yi, Junmo Kim (pp. 1873-1881)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on two challenging datasets, KITTI and NYU Depth V2, demonstrate that the proposed method achieves significant performance. Furthermore, our method outperforms other state-of-the-art methods on the KITTI depth estimation benchmark. We conducted extensive experiments to compare our method with state-of-the-art approaches on two datasets: NYU Depth V2 (Silberman et al. 2012) and KITTI (Geiger et al. 2013).
Researcher Affiliation | Collaboration | Sihaeng Lee (1), Janghyeon Lee (2), Byungju Kim (2,3), Eojindl Yi (2), Junmo Kim (1,2); 1: Division of Future Vehicle, KAIST, Daejeon, South Korea; 2: School of Electrical Engineering, KAIST, Daejeon, South Korea; 3: Mathpresso Inc., Seoul, South Korea
Pseudocode | No | The paper describes the proposed method using text and mathematical equations but does not include any pseudocode blocks or algorithms.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | We conducted extensive experiments to compare our method with state-of-the-art approaches on two datasets: NYU Depth V2 (Silberman et al. 2012) and KITTI (Geiger et al. 2013). We used ResNeXt101 (Xie et al. 2017), DenseNet161 (Huang et al. 2017), and MobileNetV2 (Sandler et al. 2018), which were pretrained on image classification using the ImageNet-1K dataset (Russakovsky et al. 2015), as the backbone networks. (See the backbone-loading sketch after the table.)
Dataset Splits | Yes | The NYU Depth V2 dataset (Silberman et al. 2012) contains 464 indoor scenes, comprising 120K images and paired depth maps with a resolution of 640 × 480. As in previous studies, we divided the dataset into 249 scenes for training and 215 scenes (654 images) for testing. In the experiments on the KITTI Eigen split, we followed the common data split proposed by Eigen, Puhrsch, and Fergus (2014) for comparison with previous studies. For the online KITTI depth prediction benchmark, we used the official benchmark split (Uhrig et al. 2017). (See the split-loading sketch after the table.)
Hardware Specification | No | The paper mentions that 'All experiments were implemented on PyTorch (Paszke et al. 2019)' but does not specify any hardware details such as GPU models, CPU types, or memory used for these experiments.
Software Dependencies | Yes | All experiments were implemented on PyTorch (Paszke et al. 2019).
Experiment Setup | Yes | For network training, we used mini-batch sizes of 8 and 16 on the NYU Depth V2 and KITTI datasets, respectively. We adopted the ADAM optimizer with β1 = 0.9, β2 = 0.999, and ϵ = 10⁻⁸. The learning rate started from 0.0005 on MobileNetV2 and 0.0001 on ResNeXt101 and DenseNet161. We used a polynomial decay schedule with power p = 0.9 for the learning rate, and we trained our networks for 10 epochs. Random horizontal flipping was applied in all experiments. A random rotation of the inputs was then applied in ranges of [-5, 5] and [-1, 1] degrees for the NYU Depth V2 and KITTI datasets, respectively. Furthermore, we randomly changed the brightness, contrast, and saturation of the input images in the range of [0.8, 1.2], and the hue in the range of [0.9, 1.1]. All data augmentations were performed with 50% probability. (See the training-setup sketch after the table.)
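
No code was released for the paper, but the backbones quoted in the Open Datasets row are all standard torchvision architectures. The following is a minimal sketch, assuming torchvision >= 0.13 for the `weights` argument and the 32x8d ResNeXt variant (the paper only says ResNeXt101); the `build_backbone` helper name is ours, not the authors'.

```python
# Minimal sketch (not the authors' code): loading the three ImageNet-1K-pretrained
# backbones named in the paper via torchvision. Assumes torchvision >= 0.13.
import torchvision.models as models

def build_backbone(name: str):
    if name == "resnext101":
        # 32x8d variant assumed; the paper does not specify which ResNeXt101 variant.
        return models.resnext101_32x8d(weights="IMAGENET1K_V1")
    if name == "densenet161":
        return models.densenet161(weights="IMAGENET1K_V1")
    if name == "mobilenet_v2":
        return models.mobilenet_v2(weights="IMAGENET1K_V1")
    raise ValueError(f"unknown backbone: {name}")

encoder = build_backbone("densenet161")  # DenseNet161 is one of the reported encoders
```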
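The scene-level NYU Depth V2 split and the KITTI Eigen split quoted in the Dataset Splits row are normally distributed as fixed file lists. Below is a minimal sketch of consuming such lists; the filenames and the "rgb_path depth_path" line format are assumptions for illustration, not part of the paper.

```python
# Minimal sketch: reading fixed split lists (scene-level NYU split, KITTI Eigen split).
# The list filenames and line format are assumptions; in practice the split files
# distributed by prior work are used verbatim.
from pathlib import Path

def load_split(list_file: str):
    """Return (rgb_path, depth_path) pairs, one per non-empty line."""
    pairs = []
    for line in Path(list_file).read_text().splitlines():
        if line.strip():
            rgb, depth = line.split()[:2]
            pairs.append((rgb, depth))
    return pairs

train_pairs = load_split("nyu_train_list.txt")  # frames from the 249 training scenes
test_pairs = load_split("nyu_test_list.txt")    # 654 images from the 215 test scenes
```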
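The Experiment Setup row maps almost directly onto standard PyTorch components. The sketch below wires up Adam with the stated betas and epsilon, polynomial learning-rate decay with power 0.9 over 10 epochs, and the flip/rotation/color-jitter augmentations, each applied with 50% probability. The model, loader, and loss are placeholders (the paper's patch-wise attention network is not reproduced here), and the hue jitter is omitted because the paper's multiplicative [0.9, 1.1] range has no direct counterpart in torchvision's additive hue shift.

```python
# Minimal sketch of the reported optimization and augmentation settings; the
# toy model, loader, and loss are placeholders, not the authors' network.
import random
import torch
import torchvision.transforms.functional as TF

def augment(rgb, depth, max_rot_deg=5.0):  # 5 deg for NYU Depth V2, 1 deg for KITTI
    if random.random() < 0.5:              # random horizontal flip, p = 0.5
        rgb, depth = TF.hflip(rgb), TF.hflip(depth)
    if random.random() < 0.5:              # small random rotation, p = 0.5
        angle = random.uniform(-max_rot_deg, max_rot_deg)
        rgb, depth = TF.rotate(rgb, angle), TF.rotate(depth, angle)
    if random.random() < 0.5:              # photometric jitter, p = 0.5
        rgb = TF.adjust_brightness(rgb, random.uniform(0.8, 1.2))
        rgb = TF.adjust_contrast(rgb, random.uniform(0.8, 1.2))
        rgb = TF.adjust_saturation(rgb, random.uniform(0.8, 1.2))
        # Hue jitter omitted: the paper's multiplicative [0.9, 1.1] range does not
        # map onto torchvision's additive hue argument.
    return rgb, depth

model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)    # stand-in for the depth network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,  # 5e-4 for MobileNetV2
                             betas=(0.9, 0.999), eps=1e-8)

epochs = 10
# Polynomial decay with power 0.9: lr(epoch) = lr0 * (1 - epoch / epochs) ** 0.9
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda e: (1.0 - e / epochs) ** 0.9)

# Toy loader: one synthetic batch so the loop runs end to end
# (batch size 8 for NYU Depth V2, 16 for KITTI in the paper).
train_loader = [(torch.rand(8, 3, 480, 640), torch.rand(8, 1, 480, 640))]

for epoch in range(epochs):
    for rgb, depth in train_loader:
        rgb, depth = augment(rgb, depth)
        loss = torch.nn.functional.l1_loss(model(rgb), depth)  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```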