Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency
Authors: Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon
AAAI 2021, pp. 1863–1872 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments conducted on the KITTI and Cityscapes dataset, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code, dataset, and models are publicly available. |
| Researcher Affiliation | Collaboration | Seokju Lee¹, Sunghoon Im², Stephen Lin³, In So Kweon¹ — ¹Korea Advanced Institute of Science and Technology (KAIST), ²Daegu Gyeongbuk Institute of Science and Technology (DGIST), ³Microsoft Research |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code, dataset, and models are publicly available: https://github.com/SeokjuLee/Insta-DM |
| Open Datasets | Yes | Through extensive experiments conducted on the KITTI and Cityscapes dataset, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code, dataset, and models are publicly available. |
| Dataset Splits | No | The paper mentions using the "KITTI Eigen split" and the Cityscapes dataset for testing, but it does not explicitly state train/validation split sizes or percentages, referring only to standard benchmarks. |
| Hardware Specification | Yes | We train our networks using the ADAM optimizer (Kingma and Ba 2015) with β1 = 0.9 and β2 = 0.999 on 4 Nvidia RTX 2080 GPUs. |
| Software Dependencies | No | The paper states "Our system is implemented in PyTorch (Paszke et al. 2019)" but does not specify the version number for PyTorch or any other software libraries or dependencies. The reference year (2019) is not a version number. |
| Experiment Setup | Yes | The image resolution is set to 832 × 256 and the video data is augmented with random scaling, cropping, and horizontal flipping. We set the mini-batch size to 4 and train the networks over 200 epochs with 1,000 randomly sampled batches in each epoch... The initial learning rate is set to 10^-4 and is decreased by half every 50 epochs. The loss weights are set to λp = 2.0, λg = 1.0, λs = 0.1, λt = 0.1, and λh = 0.02. (A minimal configuration sketch follows the table.) |
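As a reading aid, here is a minimal PyTorch sketch of the reported training configuration. The optimizer settings, learning-rate schedule, batch size, epoch counts, image resolution, and loss weights are taken from the paper; the tiny networks and stand-in loss terms are illustrative assumptions, not the authors' Insta-DM implementation.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper; everything else below is a placeholder.
BATCH_SIZE, EPOCHS, BATCHES_PER_EPOCH = 4, 200, 1000
LAMBDAS = {"p": 2.0, "g": 1.0, "s": 0.1, "t": 0.1, "h": 0.02}

depth_net = nn.Conv2d(3, 1, 3, padding=1)   # placeholder for the depth network
motion_net = nn.Conv2d(6, 6, 3, padding=1)  # placeholder for the motion network

# ADAM with beta1 = 0.9, beta2 = 0.999 and an initial learning rate of 1e-4,
# halved every 50 epochs, as reported in the paper.
optimizer = torch.optim.Adam(
    list(depth_net.parameters()) + list(motion_net.parameters()),
    lr=1e-4, betas=(0.9, 0.999),
)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(EPOCHS):
    for _ in range(BATCHES_PER_EPOCH):
        # Random tensors at the paper's 832x256 resolution stand in for the
        # augmented KITTI/Cityscapes frame pairs.
        tgt = torch.rand(BATCH_SIZE, 3, 256, 832)
        ref = torch.rand(BATCH_SIZE, 3, 256, 832)
        depth = depth_net(tgt)
        motion = motion_net(torch.cat([tgt, ref], dim=1))
        # Stand-in loss terms; the real photometric/geometric/smoothness/
        # translation/height losses rely on warping and instance masks.
        losses = {
            "p": (depth - tgt.mean()).abs().mean(),
            "g": depth.var(),
            "s": depth.diff(dim=-1).abs().mean(),
            "t": motion.abs().mean(),
            "h": motion.var(),
        }
        total = sum(LAMBDAS[k] * losses[k] for k in LAMBDAS)
        optimizer.zero_grad()
        total.backward()
        optimizer.step()
    scheduler.step()
```

The `StepLR(step_size=50, gamma=0.5)` schedule mirrors the paper's statement that the learning rate "is decreased by half every 50 epochs"; the weighted sum over `LAMBDAS` reflects how the five reported loss weights would combine the per-term losses.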