Self-supervised Monocular Depth and Visual Odometry Learning with Scale-consistent Geometric Constraints

Authors: Mingkang Xiong, Zhenghong Zhang, Weilin Zhong, Jinsheng Ji, Jiyuan Liu, Huilin Xiong

IJCAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive evaluations on KITTI and Make3D datasets demonstrate that, i) by incorporating the proposed constraints as supervision, the depth estimation model can achieve state-of-the-art (SOTA) performance among the self-supervised methods, and ii) it is effective to use the proposed training framework to obtain a uniform global scale VO model.
Researcher Affiliation Academia Shanghai Key Laboratory of Intelligent Sensing and Recognition, Shanghai Jiao Tong University, Shanghai, China {mkxiong, art zzh, zhongweilin, jinshengji, liujiyuan, hlxiong}@sjtu.edu.cn
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes Our models are mainly trained on KITTI datasets [Geiger et al., 2012]. For monocular depth estimation, we use the Eigen split [Eigen et al., 2014] of the Raw data for a fair comparison with previous methods. The split selects 697 images as the test set for monocular depth estimation and the others are used for training. The ground truth depth maps of the test set are obtained from lidar sensors. The original image size is 1242 × 375 and we resize it to 416 × 128 or 832 × 256 to formulate the training data. For pose estimation, we train our networks on the KITTI Odometry dataset [Geiger et al., 2012], which contains 11 sequences with public ground truth poses. Sequences 00-08 are used for training and 09-10 are test sets. The ground truth poses are not used in our training framework. ... The Make3D datasets [Saxena et al., 2009] are used to evaluate the generalization ability of the depth estimation model. ...134 images are used for evaluation.
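The quoted KITTI Odometry partition (sequences 00-08 for training, 09-10 for testing) is simple enough to sketch. The sequence IDs come from the quote above; the helper name is illustrative, not from the paper:

```python
# Minimal sketch of the KITTI Odometry split quoted above: sequences 00-08
# train, 09-10 test. The function name is illustrative, not from the paper.
KITTI_ODOM_SEQUENCES = [f"{i:02d}" for i in range(11)]  # "00" .. "10"

def odometry_split(sequences):
    """Partition sequences into train (00-08) and test (09-10) sets."""
    train = [s for s in sequences if int(s) <= 8]
    test = [s for s in sequences if int(s) in (9, 10)]
    return train, test

train_seqs, test_seqs = odometry_split(KITTI_ODOM_SEQUENCES)
print(len(train_seqs), test_seqs)  # 9 ['09', '10']
```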
Dataset Splits No The paper describes training/test splits but does not specify a distinct validation set.
Hardware Specification Yes We use PyTorch [Paszke et al., 2019] to implement our framework and train it with a TITAN XP GPU.
Software Dependencies No The paper mentions 'PyTorch [Paszke et al., 2019]' but does not provide specific version numbers for PyTorch or any other software dependencies crucial for replication.
Experiment Setup Yes The Adam optimizer is adopted with parameters β1 = 0.9 and β2 = 0.999. In Eqs. 12 and 13, the combination [λph, λS, λsm, λd, λpo] = [0.15, 0.85, 0.1, 0.001, 0.1] is used. We use three sequential frames as a training sample and set the batch size to 4. The learning rate is set to 10⁻⁴. Our models are trained for 100 epochs, and we randomly select 1000 samples in every epoch. We pre-train the model on Cityscapes and then fine-tune on the KITTI datasets. The data is augmented with random brightness, contrast, and saturation.
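For quick reference, the hyper-parameters quoted above can be gathered into a single configuration. This is a hedged sketch: the values come from the quote, while the dict structure and key names are illustrative assumptions:

```python
# Hedged sketch: the training hyper-parameters reported in the paper,
# collected into one dict. Key names are illustrative; values are quoted.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "betas": (0.9, 0.999),      # beta1, beta2
    "lr": 1e-4,
    "snippet_length": 3,        # three sequential frames per sample
    "batch_size": 4,
    "epochs": 100,
    "samples_per_epoch": 1000,  # randomly selected each epoch
    "loss_weights": {           # [lambda_ph, lambda_S, lambda_sm, lambda_d, lambda_po]
        "photometric": 0.15,
        "ssim": 0.85,
        "smoothness": 0.1,
        "depth_consistency": 0.001,
        "pose_consistency": 0.1,
    },
}
```

Note that the photometric and SSIM weights (0.15 + 0.85) sum to 1.0, a common convention for blending the two appearance terms.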