Deep Digging into the Generalization of Self-Supervised Monocular Depth Estimation

Authors: Jinwoo Bae, Sungho Moon, Sunghoon Im

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first evaluate state-of-the-art models on diverse public datasets, which have never been seen during the network training. Next, we investigate the effects of texture-biased and shape-biased representations using the various texture-shifted datasets that we generated. Extensive experiments show that the proposed method achieves state-of-the-art performance with various public datasets. Our method also shows the best generalization ability among the competitive methods.
Researcher Affiliation | Academia | Jinwoo Bae, Sungho Moon, and Sunghoon Im, Department of Electrical Engineering and Computer Science, DGIST, Daegu, Korea
Pseudocode | No | The paper provides architectural diagrams and mathematical formulations for its components, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We evaluate state-of-the-art models trained on KITTI using six public depth datasets (SUN3D, RGBD, MVS, Scenes11, ETH3D, and Oxford Robotcar). We use the KITTI Eigen split (Geiger et al. 2013; Eigen and Fergus 2015) consisting of 39,810 training, 4,424 validation, and 697 test images. We test the models using public depth datasets consisting of indoor scenes (SUN3D (Xiao, Owens, and Torralba 2013), RGBD (Sturm et al. 2012)), synthetic scenes from graphics tools (Scenes11 (Ummenhofer et al. 2017)), outdoor building-focused scenes (MVS (Ummenhofer et al. 2017)), and night driving scenes (Oxford Robotcar (Maddern et al. 2016)). We also use ETH3D (Schops et al. 2017), which contains both indoor and outdoor scenes.
Dataset Splits | Yes | We use the KITTI Eigen split (Geiger et al. 2013; Eigen and Fergus 2015) consisting of 39,810 training, 4,424 validation, and 697 test images (see the split-loading sketch after the table).
Hardware Specification | No | The paper describes the datasets used and experimental settings, but does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used to run the experiments.
Software Dependencies | No | The paper mentions various models and architectures (e.g., 'ResNet50', 'Transformers', 'ViT'), but it does not specify any software dependencies (like programming languages, libraries, or frameworks) with their version numbers required to reproduce the experiments.
Experiment Setup | Yes | We use an input image size of 640 x 192. We use ResNet50 (He et al. 2016) as the CNN backbone (E(θ) in Fig. 1), followed by L Transformer layers; in this work, we set L = 4.
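The quoted setup fixes only the backbone (ResNet50), the number of Transformer layers (L = 4), and the 640 x 192 input resolution. The sketch below is a minimal PyTorch/torchvision reading of that description, not the authors' implementation: the embedding width, head count, and the way backbone features are flattened into tokens are assumptions made for illustration.

```python
# Minimal sketch of the reported setup (assumptions noted inline), not the paper's code.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class DepthEncoderSketch(nn.Module):
    def __init__(self, num_transformer_layers: int = 4,  # L = 4, as stated in the paper
                 embed_dim: int = 256, num_heads: int = 8):  # width/heads are assumptions
        super().__init__()
        backbone = resnet50(weights=None)  # random init; torchvision >= 0.13 API
        # Keep the convolutional stages only; drop average-pool and classifier.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H/32, W/32)
        self.proj = nn.Conv2d(2048, embed_dim, kernel_size=1)      # project to Transformer width
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_transformer_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.proj(self.cnn(x))             # (B, C, h, w)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, h*w, C) token sequence
        tokens = self.transformer(tokens)          # L Transformer layers
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    model = DepthEncoderSketch()
    x = torch.randn(1, 3, 192, 640)  # 640 x 192 input (width x height)
    print(model(x).shape)            # torch.Size([1, 256, 6, 20])
```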
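As a minimal illustration of the reported KITTI Eigen split sizes, the sketch below assumes the split is distributed as plain-text file lists (one sample per line); the directory layout and file names (e.g., train_files.txt) are assumptions for illustration, not taken from the paper.

```python
# Sanity-check split file lists against the sizes reported in the paper.
from pathlib import Path

# Split sizes quoted above for the KITTI Eigen split.
EXPECTED = {"train": 39_810, "val": 4_424, "test": 697}


def load_split(split_dir: str) -> dict:
    """Read one file list per split and check its size against the reported counts."""
    splits = {}
    for name, expected in EXPECTED.items():
        # Assumed file naming: <split>_files.txt with one sample path per line.
        lines = Path(split_dir, f"{name}_files.txt").read_text().splitlines()
        samples = [line for line in lines if line.strip()]
        assert len(samples) == expected, (
            f"{name}: expected {expected} samples, got {len(samples)}")
        splits[name] = samples
    return splits
```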