Semantically-Guided Representation Learning for Self-Supervised Monocular Depth

Authors: Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTAL RESULTS: We use the standard KITTI benchmark (Geiger et al., 2013) for self-supervised training and evaluation.
Researcher Affiliation | Collaboration | ¹Toyota Research Institute (TRI) ²University of Michigan {first.last}@tri.global rayhou@umich.edu
Pseudocode | No | The paper describes its methods in prose and diagrams; no structured pseudocode or algorithm blocks are present.
Open Source Code | Yes | Source code and pretrained models are available on https://github.com/TRI-ML/packnet-sfm
Open Datasets | Yes | We use the standard KITTI benchmark (Geiger et al., 2013) for self-supervised training and evaluation. ... Following common practice, we pretrain our depth and pose networks on the Cityscapes dataset (Cordts et al., 2016), consisting of 88250 unlabeled images.
Dataset Splits | Yes | This results in 39810 images for training, 4424 for validation, and 697 for evaluation.
Hardware Specification | No | The paper mentions training with a 'batch size of 4 per GPU' but does not specify GPU or CPU models or any other hardware used for the experiments.
Software Dependencies | No | The paper states 'We implement our models with PyTorch (Paszke et al., 2017)' but does not provide version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The initial training stage is conducted on the Cityscapes dataset for 50 epochs, with a batch size of 4 per GPU and initial depth and pose learning rates of 2×10⁻⁴ and 5×10⁻⁴ respectively, which are halved every 20 epochs. Afterwards, the depth and pose networks are fine-tuned on KITTI for 30 epochs with the same parameters, halving the learning rates every 12 epochs. ... we use a ResNet-50 backbone with ImageNet (Deng et al., 2009) pretrained weights and optimize the network for 48k iterations on the Cityscapes dataset with a learning rate of 0.01, momentum of 0.9, weight decay of 10⁻⁴, and a batch size of 1 per GPU. Random scaling between (0.7, 1.3), random horizontal flipping, and a crop size of 1000×2000 are used for data augmentation. We decay the learning rate by a factor of 10 at iterations 36k and 44k.
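The two-stage schedule quoted above amounts to a simple step decay of the learning rate: halve every 20 epochs during Cityscapes pretraining, then halve every 12 epochs during KITTI fine-tuning. A minimal pure-Python sketch of that decay rule (the function name and epoch-indexed convention are illustrative assumptions, not taken from the authors' released code):

```python
def stepped_lr(base_lr, epoch, step, gamma=0.5):
    """Learning rate at a given epoch under step decay:
    multiply base_lr by `gamma` once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

# Cityscapes pretraining stage (50 epochs): depth LR 2e-4, pose LR 5e-4,
# both halved every 20 epochs, as described in the paper.
depth_lrs = [stepped_lr(2e-4, e, 20) for e in range(50)]
pose_lrs = [stepped_lr(5e-4, e, 20) for e in range(50)]

# KITTI fine-tuning stage (30 epochs): same initial rates,
# halved every 12 epochs.
ft_depth_lrs = [stepped_lr(2e-4, e, 12) for e in range(30)]
```

In PyTorch this corresponds to wrapping each optimizer in `torch.optim.lr_scheduler.StepLR` with `gamma=0.5` and `step_size=20` (pretraining) or `step_size=12` (fine-tuning).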