DeepV2D: Video to Depth with Differentiable Structure from Motion
Authors: Zachary Teed, Jia Deng
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we demonstrate the effectiveness of DeepV2D across a variety of datasets and tasks, and outperform strong methods such as DeepTAM (Zhou et al., 2018), DeMoN (Ummenhofer et al., 2017), BANet (Tang & Tan, 2018), and MVSNet (Yao et al., 2018). |
| Researcher Affiliation | Academia | Zachary Teed Princeton University zteed@cs.princeton.edu Jia Deng Princeton University jiadeng@cs.princeton.edu |
| Pseudocode | No | The paper provides figures illustrating network architectures but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/princeton-vl/DeepV2D. |
| Open Datasets | Yes | Our primary experiments are on NYU (Silberman et al., 2012), ScanNet (Dai et al., 2017), SUN3D (Xiao et al., 2013), and KITTI (Geiger et al., 2013)... |
| Dataset Splits | Yes | We experiment on NYU using the standard train/test split (Eigen et al., 2014)... We use the train/test split proposed by Tang & Tan (2018) [for ScanNet]... We follow the Eigen train/test split (Eigen et al., 2014) [for KITTI]... |
| Hardware Specification | No | The paper mentions 'Peak GPU Memory' usage in Table 6 but does not specify the exact GPU model, CPU, or any other hardware components used for the experiments. |
| Software Dependencies | No | DeepV2D is implemented in TensorFlow (Abadi et al., 2016). (The framework is named and cited, but no specific version number is provided.) |
| Experiment Setup | Yes | When training on NYU and ScanNet, we train with 4 frame video clips. On KITTI, we use 5 frame video clips... Stage I: We train the Motion Module using the Lmotion loss with RMSProp (Tieleman & Hinton, 2012) and a learning rate of 0.0001... Stage II: ... The initial learning rate is set to 0.001 and decayed to 0.0002 after 100k training steps. ...We train Stage II for a total of 120k iterations with a batch size of 2. ...We perform data augmentation by adjusting brightness, gamma, and performing random scaling of the image channels. We also randomly perturb the input camera pose to the Motion Module by sampling small perturbations. (A hedged training-schedule sketch follows the table.) |
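
For concreteness, here is a minimal TensorFlow sketch of the two-stage training schedule and augmentation quoted in the Experiment Setup row above. It is an illustration under assumptions, not the authors' code: the optimizer objects, the `augment` helper, and all augmentation magnitudes and the additive pose-noise model are hypothetical; consult the released repository for the exact implementation.

```python
import tensorflow as tf

# Hedged sketch of the two-stage schedule described in the paper; names are
# placeholders, not the authors' code (see github.com/princeton-vl/DeepV2D).

# Stage I: the Motion Module is trained alone with the L_motion loss,
# using RMSProp with a learning rate of 0.0001.
stage1_optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4)

# Stage II: learning rate 0.001, decayed to 0.0002 after 100k steps;
# 120k iterations total, batch size 2, with 4-frame clips (NYU/ScanNet)
# or 5-frame clips (KITTI).
stage2_lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[100_000], values=[1e-3, 2e-4])
stage2_optimizer = tf.keras.optimizers.RMSprop(learning_rate=stage2_lr)

def augment(images, pose):
    """Data augmentation as described: brightness and gamma jitter, random
    scaling of the image channels, and small random perturbations of the
    input camera pose. All magnitudes below are assumptions; the paper does
    not specify them. Assumes images of shape [batch, H, W, 3]."""
    images = tf.image.random_brightness(images, max_delta=0.2)
    images = tf.image.adjust_gamma(images, gamma=tf.random.uniform([], 0.8, 1.2))
    images = images * tf.random.uniform([1, 1, 1, 3], 0.9, 1.1)   # per-channel scaling
    pose = pose + tf.random.normal(tf.shape(pose), stddev=0.01)   # small pose perturbation
    return images, pose
```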