Self-supervised Monocular Depth and Visual Odometry Learning with Scale-consistent Geometric Constraints

Authors: Mingkang Xiong, Zhenghong Zhang, Weilin Zhong, Jinsheng Ji, Jiyuan Liu, Huilin Xiong

IJCAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive evaluations on KITTI and Make3D datasets demonstrate that, i) by incorporating the proposed constraints as supervision, the depth estimation model can achieve state-of-the-art (SOTA) performance among the self-supervised methods, and ii) it is effective to use the proposed training framework to obtain a uniform global scale VO model.
Researcher Affiliation Academia Shanghai Key Laboratory of Intelligent Sensing and Recognition, Shanghai Jiao Tong University, Shanghai, China {mkxiong, art zzh, zhongweilin, jinshengji, liujiyuan, hlxiong}@sjtu.edu.cn
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes Our models are mainly trained on KITTI datasets [Geiger et al., 2012]. For monocular depth estimation, we use the Eigen split [Eigen et al., 2014] of the Raw data for a fair comparison with previous methods. The split selects 697 images as the test set for monocular depth estimation and the others are used for training. The ground truth depth maps of the test set are obtained from lidar sensors. The original image size is 1242 × 375 and we resize it to 416 × 128 or 832 × 256 to formulate the training data. For pose estimation, we train our networks on the KITTI Odometry dataset [Geiger et al., 2012], which contains 11 sequences with public ground truth poses. Sequences 00-08 are used for training and 09-10 are test sets. The ground truth poses are not used in our training framework. ... The Make3D datasets [Saxena et al., 2009] are used to evaluate the generalization ability of the depth estimation model. ...134 images are used for evaluation.
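The quoted KITTI Odometry partition (sequences 00-08 for training, 09-10 for testing) is simple enough to sketch. The sequence IDs come from the quote above; the helper name is illustrative, not from the paper:

```python
# Minimal sketch of the KITTI Odometry split quoted above: sequences 00-08
# train, 09-10 test. The function name is illustrative, not from the paper.
KITTI_ODOM_SEQUENCES = [f"{i:02d}" for i in range(11)]  # "00" .. "10"

def odometry_split(sequences):
    """Partition sequences into train (00-08) and test (09-10) sets."""
    train = [s for s in sequences if int(s) <= 8]
    test = [s for s in sequences if int(s) in (9, 10)]
    return train, test

train_seqs, test_seqs = odometry_split(KITTI_ODOM_SEQUENCES)
print(len(train_seqs), test_seqs)  # 9 ['09', '10']
```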
Dataset Splits No The paper describes training/test splits but does not specify a distinct validation set.
Hardware Specification Yes We use PyTorch [Paszke et al., 2019] to implement our framework and train it with a TITAN XP GPU.
Software Dependencies No The paper mentions 'PyTorch [Paszke et al., 2019]' but does not provide specific version numbers for PyTorch or any other software dependencies crucial for replication.
Experiment Setup Yes The Adam optimizer is adopted with parameters β1 = 0.9 and β2 = 0.999. In Eqs. 12 and 13, the combination [λph, λS, λsm, λd, λpo] = [0.15, 0.85, 0.1, 0.001, 0.1] is used. We use three sequential frames as a training sample and set the batch size to 4. The learning rate is set to 10⁻⁴. Our models are trained for 100 epochs, and we randomly select 1000 samples in every epoch. We pre-train the model on Cityscapes and then fine-tune on the KITTI datasets. The data is augmented with random brightness, contrast, and saturation.
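For quick reference, the hyper-parameters quoted above can be gathered into a single configuration. This is a hedged sketch: the values come from the quote, while the dict structure and key names are illustrative assumptions:

```python
# Hedged sketch: the training hyper-parameters reported in the paper,
# collected into one dict. Key names are illustrative; values are quoted.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "betas": (0.9, 0.999),      # beta1, beta2
    "lr": 1e-4,
    "snippet_length": 3,        # three sequential frames per sample
    "batch_size": 4,
    "epochs": 100,
    "samples_per_epoch": 1000,  # randomly selected each epoch
    "loss_weights": {           # [lambda_ph, lambda_S, lambda_sm, lambda_d, lambda_po]
        "photometric": 0.15,
        "ssim": 0.85,
        "smoothness": 0.1,
        "depth_consistency": 0.001,
        "pose_consistency": 0.1,
    },
}
```

Note that the photometric and SSIM weights (0.15 + 0.85) sum to 1.0, a common convention for blending the two appearance terms.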