Semantically-Guided Representation Learning for Self-Supervised Monocular Depth
Authors: Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTAL RESULTS: We use the standard KITTI benchmark (Geiger et al., 2013) for self-supervised training and evaluation. |
| Researcher Affiliation | Collaboration | 1Toyota Research Institute (TRI) 2University of Michigan {first.last}@tri.global rayhou@umich.edu |
| Pseudocode | No | The paper describes its methods in prose and diagrams, but no structured pseudocode or algorithm blocks are explicitly present. |
| Open Source Code | Yes | Source code and pretrained models are available on https://github.com/TRI-ML/packnet-sfm |
| Open Datasets | Yes | We use the standard KITTI benchmark (Geiger et al., 2013) for self-supervised training and evaluation. ... Following common practice, we pretrain our depth and pose networks on the Cityscapes dataset (Cordts et al., 2016), consisting of 88250 unlabeled images. |
| Dataset Splits | Yes | This results in 39810 images for training, 4424 for validation, and 697 for evaluation. |
| Hardware Specification | No | The paper mentions training with a 'batch size of 4 per GPU' but does not provide specific details on the GPU models, CPU models, or any other hardware specifications used for experiments. |
| Software Dependencies | No | The paper states 'We implement our models with PyTorch (Paszke et al., 2017)' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The initial training stage is conducted on the Cityscapes dataset for 50 epochs, with a batch size of 4 per GPU and initial depth and pose learning rates of 2×10⁻⁴ and 5×10⁻⁴ respectively, that are halved every 20 epochs. Afterwards, the depth and pose networks are fine-tuned on KITTI for 30 epochs, with the same parameters and halving the learning rates after every 12 epochs. ... we use a ResNet-50 backbone with ImageNet (Deng et al., 2009) pretrained weights and optimize the network for 48k iterations on the Cityscapes dataset with a learning rate of 0.01, momentum of 0.9, weight decay of 10⁻⁴, and a batch size of 1 per GPU. Random scaling between (0.7, 1.3), random horizontal flipping, and a crop size of 1000×2000 are used for data augmentation. We decay the learning rate by a factor of 10 at iterations 36k and 44k. |
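The two learning-rate schedules quoted in the Experiment Setup row (epoch-wise halving for the depth/pose networks, milestone step decay for the segmentation backbone) can be sketched as plain functions. This is a minimal illustration of the stated hyperparameters only; the function names and structure are hypothetical and not taken from the paper's released code.

```python
def depth_pose_lr(epoch, base_lr=2e-4, half_every=20):
    """Depth-network schedule from the paper's Cityscapes stage:
    the initial learning rate is halved every `half_every` epochs.
    (For the pose network, use base_lr=5e-4; for KITTI fine-tuning,
    the paper halves every 12 epochs instead.)"""
    return base_lr * 0.5 ** (epoch // half_every)

def segmentation_lr(iteration, base_lr=0.01, milestones=(36_000, 44_000), gamma=0.1):
    """Segmentation-backbone schedule: decay by a factor of 10
    at iterations 36k and 44k, over 48k total iterations."""
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= gamma
    return lr
```

For example, `depth_pose_lr(0)` returns 2e-4, `depth_pose_lr(20)` returns 1e-4, and `segmentation_lr(40_000)` is 1e-3, matching the schedule described in the quote.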