Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions

Authors: Ayan Chakrabarti, Jingyu Shao, Greg Shakhnarovich

NeurIPS 2016

Each entry below gives a reproducibility variable, its result, and the LLM response quoting the supporting evidence from the paper.
Research Type: Experimental
LLM Response: "We evaluate our approach on the NYUv2 depth data set [11], and find that it achieves state-of-the-art performance. We train and evaluate our method on the NYU v2 depth dataset [11]. To construct our training and validation sets, we adopt the standard practice of using the raw videos corresponding to the training images from the official train/test split. We randomly select 10% of these videos for validation, and use the rest for training our network. Table 2 reports the quantitative performance of our method, along with other state-of-the-art approaches over the entire test set, and we find that the proposed method yields superior performance on most metrics."

Researcher Affiliation: Academia
LLM Response: "Ayan Chakrabarti, TTI-Chicago, Chicago, IL, ayanc@ttic.edu; Jingyu Shao, Dept. of Statistics, UCLA, Los Angeles, CA, shaojy15@ucla.edu; Gregory Shakhnarovich, TTI-Chicago, Chicago, IL, gregory@ttic.edu"

Pseudocode: No
LLM Response: "The paper does not contain any structured pseudocode or algorithm blocks."

Open Source Code: Yes
LLM Response: "The source code for implementation, along with a pre-trained network model, are available at http://www.ttic.edu/chakrabarti/mdepth."

Open Datasets: Yes
LLM Response: "We train and evaluate our method on the NYU v2 depth dataset [11]."

Dataset Splits: Yes
LLM Response: "To construct our training and validation sets, we adopt the standard practice of using the raw videos corresponding to the training images from the official train/test split. We randomly select 10% of these videos for validation, and use the rest for training our network. Our training set is formed by sub-sampling video frames uniformly, and consists of roughly 56,000 color image-depth map pairs."

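A minimal sketch of that split procedure is given below, assuming the raw training videos are available as a mapping from video id to an ordered list of image/depth frame pairs. The function name, data structure, and frame-sampling stride are hypothetical and not taken from the paper; only the 10% video-level hold-out and the uniform frame sub-sampling follow the quoted description.

import random

def make_splits(raw_train_videos, val_fraction=0.10, frame_stride=10, seed=0):
    # Hypothetical helper: hold out a random 10% of the raw training videos
    # for validation and sub-sample frames uniformly from every video.
    # `raw_train_videos` maps a video id to an ordered list of
    # (image_path, depth_path) frame pairs.
    rng = random.Random(seed)
    video_ids = sorted(raw_train_videos)
    rng.shuffle(video_ids)

    n_val = int(round(val_fraction * len(video_ids)))
    val_ids = set(video_ids[:n_val])

    train_pairs, val_pairs = [], []
    for vid, frames in raw_train_videos.items():
        # Uniform frame sub-sampling; the stride of 10 is a guess chosen
        # only to land near the roughly 56,000 pairs reported in the paper.
        sampled = frames[::frame_stride]
        (val_pairs if vid in val_ids else train_pairs).extend(sampled)
    return train_pairs, val_pairs
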
Hardware Specification: Yes
LLM Response: "Our overall inference method (network predictions and globalization) takes 24 seconds per-image when using an NVIDIA Titan X GPU. AC and GS thank NVIDIA Corporation for donations of Titan X GPUs used in this research."

Software Dependencies: No
LLM Response: "The paper mentions using the VGG-19 network and ReLU activations, but does not provide specific version numbers for programming languages, libraries, or other software dependencies."

Experiment Setup: Yes
LLM Response: "We use a fully convolutional version of our architecture during training with a stride of 8 pixels, yielding nearly 4000 training patches per image. We train the network using SGD for a total of 14 epochs, using a batch size of only one image and a momentum value of 0.9. We begin with a learning rate of 0.01, and reduce it after the 4th, 8th, 10th, 12th, and 13th epochs, each time by a factor of two."
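For concreteness, the quoted optimizer settings can be written as a short PyTorch-style sketch. This is only an illustration of the schedule, not the authors' implementation: the network, data, and loss below are trivial placeholders, and only SGD with momentum 0.9, a batch size of one image, the initial learning rate of 0.01, and the halving after epochs 4, 8, 10, 12, and 13 (14 epochs total) come from the paper.

import torch

# Placeholders: the actual depth network, NYUv2 data loader, and loss are
# not specified in this excerpt, so trivial stand-ins are used here.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
train_loader = [(torch.randn(1, 3, 240, 320), torch.randn(1, 1, 240, 320))
                for _ in range(4)]              # dummy (image, depth) pairs
loss_fn = torch.nn.MSELoss()                    # placeholder loss

# Quoted settings: SGD with momentum 0.9, batch size of one image, learning
# rate 0.01 halved after epochs 4, 8, 10, 12 and 13, for 14 epochs in total.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[4, 8, 10, 12, 13], gamma=0.5)

for epoch in range(14):
    for image, depth in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(image), depth)
        loss.backward()
        optimizer.step()
    scheduler.step()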