DeepV2D: Video to Depth with Differentiable Structure from Motion
Authors: Zachary Teed, Jia Deng
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we demonstrate the effectiveness of DeepV2D across a variety of datasets and tasks, and outperform strong methods such as DeepTAM (Zhou et al., 2018), DeMoN (Ummenhofer et al., 2017), BANet (Tang & Tan, 2018), and MVSNet (Yao et al., 2018). |
| Researcher Affiliation | Academia | Zachary Teed Princeton University zteed@cs.princeton.edu Jia Deng Princeton University jiadeng@cs.princeton.edu |
| Pseudocode | No | The paper provides figures illustrating network architectures but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/princeton-vl/DeepV2D. |
| Open Datasets | Yes | Our primary experiments are on NYU (Silberman et al., 2012), ScanNet (Dai et al., 2017), SUN3D (Xiao et al., 2013), and KITTI (Geiger et al., 2013)... |
| Dataset Splits | Yes | We experiment on NYU using the standard train/test split (Eigen et al., 2014)... We use the train/test split proposed by Tang & Tan (2018) [for ScanNet]... We follow the Eigen train/test split (Eigen et al., 2014) [for KITTI]... |
| Hardware Specification | No | The paper mentions 'Peak GPU Memory' usage in Table 6 but does not specify the exact GPU model, CPU, or any other hardware components used for the experiments. |
| Software Dependencies | No | DeepV2D is implemented in TensorFlow (Abadi et al., 2016). (The framework is named and cited, but no specific version number is provided.) |
| Experiment Setup | Yes | When training on NYU and ScanNet, we train with 4 frame video clips. On KITTI, we use 5 frame video clips... Stage I: We train the Motion Module using the Lmotion loss with RMSProp (Tieleman & Hinton, 2012) and a learning rate of 0.0001... Stage II: ... The initial learning rate is set to 0.001 and decayed to 0.0002 after 100k training steps. ...We train Stage II for a total of 120k iterations with a batch size of 2. ...We perform data augmentation by adjusting brightness, gamma, and performing random scaling of the image channels. We also randomly perturb the input camera pose to the Motion Module by sampling small perturbations. (A hedged training-schedule sketch follows the table.) |
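
For concreteness, here is a minimal TensorFlow sketch of the two-stage training schedule and augmentation quoted in the Experiment Setup row above. It is an illustration under assumptions, not the authors' code: the optimizer objects, the `augment` helper, and all augmentation magnitudes and the additive pose-noise model are hypothetical; consult the released repository for the exact implementation.

```python
import tensorflow as tf

# Hedged sketch of the two-stage schedule described in the paper; names are
# placeholders, not the authors' code (see github.com/princeton-vl/DeepV2D).

# Stage I: the Motion Module is trained alone with the L_motion loss,
# using RMSProp with a learning rate of 0.0001.
stage1_optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4)

# Stage II: learning rate 0.001, decayed to 0.0002 after 100k steps;
# 120k iterations total, batch size 2, with 4-frame clips (NYU/ScanNet)
# or 5-frame clips (KITTI).
stage2_lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[100_000], values=[1e-3, 2e-4])
stage2_optimizer = tf.keras.optimizers.RMSprop(learning_rate=stage2_lr)

def augment(images, pose):
    """Data augmentation as described: brightness and gamma jitter, random
    scaling of the image channels, and small random perturbations of the
    input camera pose. All magnitudes below are assumptions; the paper does
    not specify them. Assumes images of shape [batch, H, W, 3]."""
    images = tf.image.random_brightness(images, max_delta=0.2)
    images = tf.image.adjust_gamma(images, gamma=tf.random.uniform([], 0.8, 1.2))
    images = images * tf.random.uniform([1, 1, 1, 3], 0.9, 1.1)   # per-channel scaling
    pose = pose + tf.random.normal(tf.shape(pose), stddev=0.01)   # small pose perturbation
    return images, pose
```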