Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video
Authors: Jiawang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluation results demonstrate that our depth estimator achieves the state-of-the-art performance on the KITTI dataset. Moreover, we show that our ego-motion network is able to predict a globally scale-consistent camera trajectory for long video sequences, and the resulting visual odometry accuracy is competitive with the recent model that is trained using stereo videos. We conduct detailed ablation studies that clearly demonstrate the efficacy of the proposed approach. |
| Researcher Affiliation | Collaboration | Jia-Wang Bian¹,², Zhichao Li³, Naiyan Wang³, Huangying Zhan¹,², Chunhua Shen¹,², Ming-Ming Cheng⁴, Ian Reid¹,². ¹University of Adelaide, Australia; ²Australian Centre for Robotic Vision, Australia; ³TuSimple, China; ⁴Nankai University, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code related to the described methodology. |
| Open Datasets | Yes | For the depth network, we train and test models on the KITTI raw dataset [15] using Eigen's [5] split, the same as in related works [10, 9, 11, 7]. Following [7], we use a snippet of three sequential video frames as a training sample... Also, we pre-train the network on Cityscapes [30] and finetune on KITTI [15], each for 200 epochs. For the pose network, following Zhan et al. [17], we evaluate visual odometry results on the KITTI odometry dataset [15], where sequences 00-08/09-10 are used for training/testing. |
| Dataset Splits | Yes | For the depth network, we train and test models on the KITTI raw dataset [15] using Eigen's [5] split, the same as in related works [10, 9, 11, 7]. Following [7], we use a snippet of three sequential video frames as a training sample... Also, we pre-train the network on Cityscapes [30] and finetune on KITTI [15], each for 200 epochs. For the pose network, following Zhan et al. [17], we evaluate visual odometry results on the KITTI odometry dataset [15], where sequences 00-08/09-10 are used for training/testing. |
| Hardware Specification | Yes | We compare with CC [11], and both methods are trained on a single 16GB Tesla V100 GPU. |
| Software Dependencies | No | The proposed learning framework is implemented using the PyTorch library [28]. This mentions PyTorch but does not specify a version number, and no other software dependencies are listed with versions. |
| Experiment Setup | Yes | We use the ADAM [29] optimizer, and set the batch size to 4 and the learning rate to 10⁻⁴. During training, we adopt α = 1.0, β = 0.1, and γ = 0.5 in Eqn. 1. We train the network for 200 epochs with 1000 randomly sampled batches per epoch, and validate the model once per epoch. Also, we pre-train the network on Cityscapes [30] and finetune on KITTI [15], each for 200 epochs. We follow Eigen et al.'s [5] evaluation metrics for depth evaluation. |
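
As a concrete reading of the Dataset Splits row, the sketch below encodes the quoted KITTI odometry partition (sequences 00-08 for training, 09-10 for testing). The constant and function names are hypothetical, not from the paper.

```python
# Hypothetical helper reflecting the quoted split: KITTI odometry
# sequences 00-08 train the pose network, 09-10 test it.
KITTI_ODOM_TRAIN = [f"{i:02d}" for i in range(9)]     # "00" ... "08"
KITTI_ODOM_TEST = [f"{i:02d}" for i in range(9, 11)]  # "09", "10"

def split_sequences(sequences):
    """Partition a list of KITTI odometry sequence IDs into train/test."""
    train = [s for s in sequences if s in KITTI_ODOM_TRAIN]
    test = [s for s in sequences if s in KITTI_ODOM_TEST]
    return train, test
```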
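Likewise, here is a minimal PyTorch sketch of the hyperparameters quoted in the Experiment Setup row. The loss-term names are an assumption (the paper's Eqn. 1 weights three terms by α, β, and γ), and `depth_net`/`pose_net` are placeholders rather than the authors' code.

```python
import torch

# Values quoted in the Experiment Setup row.
ALPHA, BETA, GAMMA = 1.0, 0.1, 0.5    # loss weights in Eqn. 1
BATCH_SIZE = 4
LEARNING_RATE = 1e-4                  # "10^-4" in the paper
EPOCHS = 200
BATCHES_PER_EPOCH = 1000              # randomly sampled batches per epoch

def total_loss(l_photo, l_smooth, l_geo):
    # Weighted sum of the three loss terms of Eqn. 1; the term names
    # here are an assumption, not quoted from the paper.
    return ALPHA * l_photo + BETA * l_smooth + GAMMA * l_geo

def make_optimizer(depth_net, pose_net):
    # ADAM [29] over the joint parameters of both networks.
    params = list(depth_net.parameters()) + list(pose_net.parameters())
    return torch.optim.Adam(params, lr=LEARNING_RATE)
```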