Depth-Relative Self Attention for Monocular Depth Estimation
Authors: Kyuhong Shim, Jiyoung Kim, Gusang Lee, Byonghyo Shim
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the proposed model achieves competitive results in monocular depth estimation benchmarks and is less biased to RGB information. In addition, we propose a novel monocular depth estimation benchmark that limits the observable depth range during training in order to evaluate the robustness of the model for unseen depths. |
| Researcher Affiliation | Academia | Kyuhong Shim, Jiyoung Kim, Gusang Lee and Byonghyo Shim, Department of Electrical and Computer Engineering, Seoul National University, Korea {khshim, jykim, gslee, bshim}@islab.snu.ac.kr |
| Pseudocode | No | The paper describes the architecture and process of RED-T, including 'The detailed process of each cycle is as follows:', but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | NYU-v2 [Silberman et al., 2012] dataset includes pairs of RGB images and depth maps on 464 indoor scenes, which are separated into 120K training samples from 249 scenes and 654 testing samples from 215 scenes. The range of depth labels is up to 10 meters. We train our model on 50K subset following previous work [Yuan et al., 2022]. KITTI [Geiger et al., 2013] dataset consists of paired RGB images and corresponding depth maps obtained by a 3D laser scanner on 61 outdoor scenes while driving. The range of depth annotations is up to 80 meters. |
| Dataset Splits | Yes | First, following the Eigen split setting [Eigen et al., 2014], we train our model with about 26K samples from 32 scenes and test on 687 samples from 29 scenes. Second, for the online depth prediction configuration [Geiger et al., 2012], we use 72K training samples, 6K validation samples, and 500 testing samples. |
| Hardware Specification | Yes | We train our model with a batch size of 16 for 24 epochs on 8 NVIDIA A5000 24GB GPUs. |
| Software Dependencies | No | The paper mentions using 'AdamW optimizer [Kingma and Ba, 2014]' and 'Swin Transformer (Swin) [Liu et al., 2021]' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We use AdamW optimizer [Kingma and Ba, 2014] with a learning rate of 1e-4, (β1, β2) of (0.9, 0.999), and a weight decay of 0.1. The learning rate starts at 4e-6, increases to the maximum value for 25% of the total iterations, and then decreases to 1e-6. We train our model with a batch size of 16 for 24 epochs on 8 NVIDIA A5000 24GB GPUs. The gradient is accumulated every 2 batches and clipped to the maximum gradient norm of 0.1. (A hedged sketch of this setup follows the table.) |
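
The sketch below shows how the quoted training recipe could be wired up in PyTorch. It is a minimal illustration, not the authors' code: the model is a placeholder for RED-T, `total_iters` is an assumed value (the paper specifies 24 epochs, not an iteration count), and the linear shape of the warmup/decay schedule is also an assumption, since the paper only states the start, peak, and end learning rates. The hyperparameters (lr 1e-4, betas (0.9, 0.999), weight decay 0.1, warmup over 25% of iterations, accumulation every 2 batches, gradient-norm clipping at 0.1) follow the quoted text.

```python
# Hypothetical reproduction sketch of the quoted "Experiment Setup" row.
# The model, data, and total_iters are placeholders; only the hyperparameters
# are taken from the paper's description.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 1)  # placeholder for the RED-T depth network
optimizer = AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=0.1)

total_iters = 10_000                    # assumed number of optimizer updates (paper gives 24 epochs)
warmup_iters = int(0.25 * total_iters)  # learning rate rises for 25% of total iterations

def lr_lambda(step: int) -> float:
    """Warmup 4e-6 -> 1e-4, then decay to 1e-6; returned as a multiplier of the base lr (1e-4)."""
    base, start, end = 1e-4, 4e-6, 1e-6
    if step < warmup_iters:
        return (start + (base - start) * step / warmup_iters) / base
    frac = (step - warmup_iters) / max(1, total_iters - warmup_iters)
    return (base - (base - end) * frac) / base

scheduler = LambdaLR(optimizer, lr_lambda)

accum_steps = 2  # gradient accumulated every 2 batches
dummy_loader = [(torch.randn(16, 8), torch.randn(16, 1))] * 4  # stand-in for the real dataloader
for step, (x, y) in enumerate(dummy_loader):
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()
```

Note that with accumulation every 2 batches, each optimizer update aggregates two batches; whether the stated batch size of 16 is per accumulation step or per update is not spelled out in the quoted text, so the dummy loader above simply uses 16 samples per batch.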